At some point in their careers, most SQL Server developers will have faced the challenge of getting the data from an XML source file into SQL Server. Despite the ubiquity of the CSV format, and the rising star of JSON, a vast quantity of data is still exchanged as XML files.
For bulk loading XML data, the standard tools in the Extract-Transform-Load (ETL) toolkit tend to be the venerable OPENXML, or slightly more modern OPENROWSET/XQUERY, for the T-SQL purist, or for SSIS devotees the XML task.
Any of these solutions work well enough for small-to-medium sized files and even quite complex XML structures, but they often don’t scale to larger loads. Anything over 2GB will crash OPENXML or SSIS in my experience. In some cases, even if they can load a large file, the time taken could prove a threat to your SLAs.
In these circumstances, SQLXML Bulk Loader is your friend. It can import extremely large XML source data files into SQL Server at amazing speed, while preserving referential integrity, if required. Unfortunately, it’s not installed with SQL Server, by default, and it’s poorly explained in the SQL Server documentation. As a result, most developers either ignore it completely, or implement it poorly and it is unfairly perceived as “hard to use”.
This article attempts to put the record straight. It will describe briefly how to find and install SQLXML Bulk Loader and then how to use it to load three “classic” XML file structures:
- A simple XML format
- Multiple tables stored as XML
- A nested, moderately complex XML structure that maps to a relational table structure
To work through the examples, you’ll need to download the sample XML files from the “article rating” box at the top of this article. I’ve provided the kind of “semi-structured” source files that SQLXML Bulk Loader can digest with ease. You’ll also need to create a database in which to practice. In the examples, I’ve assumed you’ve created a directory called C:\SQLXMLArticle
for the files, and created a database called CarSales_Staging
. All techniques described will work with all versions of the database from 2005 through 2014.
In a subsequent article, I will explain some of the more advanced features of SQLXML Bulk Loader, from handling attribute-centric XML to optimizing the process for bulk loads via validating the XML and integrating SQLXML Bulk Loader with SSIS, and other tricks and techniques to help you get the most out of this tool.
SQLXML Bulk Loader Use Cases
SQLXML Bulk Loader is best used in the following situations:
- When the source XML file is large. What is large? Well, if you were planning on using OPENXML, then that means any file over 2 gigabytes. The same upper limit is true of XML loaded into a variable and shredded using XQuery. With SSIS, it depends on the memory available, but I have experienced practical difficulties way below the 2 GB limit. In contrast, I have loaded XML files of tens of gigabytes using SQLXML Bulk Load.
- When the XML source structure is relatively simple. In reality, this means a file consisting of tables and fields where the nested XML structure is not too complex. The source data cannot be too intricate, in XML terms, or it will not load. It is not for nothing that this is called “semi-structured” XML data.
- When you wish to load multiple tables from the same source file.
- When speed loading the data is important. In my tests, SQLXML Bulk Load loaded data at about 90 percent of native BCP speeds, for separate tables without relational links.
Be aware from the start that SQLXML Bulk Loader can only handle a certain level of complexity in the source file. Should you be faced with a file that SQLXML Bulk Loader refuses to load, your only solution will be one of the following:
- Negotiate a simpler structure from the data supplier
- Use one of the other available technologies mentioned above
- Use XSLT to simplify or ‘flatten’ the file structure, to produce a file amenable to processing by SQLXML. XML transformation is a vast and completely separate topic and one that I will not be explaining it in this article
Preparing SQLXML Bulk Loader
SQLXML 4.0 was installed by default as far as SQL Server 2005 but from SQL Server 2008 onwards, it has to be installed as part of the Feature Pack, for SQL Server 2008, or downloaded separately. You can obtain version 4.0 SP1 at the following URL:
http://www.microsoft.com/en-gb/download/details.aspx?id=30403
SQLXML Bulk Loader exists in both 32- and 64-bit versions (there is even a 64-bit IA version, but I expect that few people will be needing this). The installation is simple and I will not waste your time explaining it here. Once complete, you will be ready to start using SQLXML Bulk Loader to load XML files into SQL Server.
Importing an XML File into SQL Server
SQLXML Bulk Loader is a COM (Component Object Model) object that allows you to load semi-structured XML data into SQL Server tables. To make it work, you will need the following:
- An XML source file.
- A destination table in a SQL Server database.
- An XSD schema file that maps the XML to the destination table – this is the interesting part of the process
- A Visual Basic Script file to execute the XML load process.
Listing 1 shows the XML source, which in this simple example will be used to “bulk import” 2 rows in a SQL Server table. Throughout the article, the code listing caption provides the name of the relevant file in the code download bundle.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
<?xml version = "1.0" encoding = "UTF-8"?> <CarSales> <Client> <ID >3</ID> <ClientName>John Smith</ClientName> <Address1>4, Grove Drive</Address1> <Town>Uttoxeter</Town> <County>Staffs</County> <Country>1</Country> </Client> <Client> <ID>7</ID> <ClientName>Slow Sid</ClientName> <Address1>2, Rue des Bleues</Address1> <Town>Avignon</Town> <County>Vaucluse</County> <Country>3</Country> </Client> </CarSales> |
An XML document is checked for being well-formed, but it is not validated. If the XML document is not well-formed, processing is cancelled.
This example uses XML elements, but attribute-centric XML source data can also be loaded with SQLXML Bulk Loader. I give a short example of this in a subsequent article.
Inline schemas are not supported
If you have an inline schema in the
source XML document, SQLXML will ignore it.
Listing 2 creates the SQL Server table, the destination for this sample data load.
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE CarSales_Staging.dbo.Client_XMLBulkLoad ( ID INT NULL , ClientName NVARCHAR(1000) NULL , Address1 NVARCHAR(1000) NULL , Town NVARCHAR(1000) NULL , County NVARCHAR(1000) NULL , Country NUMERIC(18, 0) NULL ); GO |
Listing 3 shows the XSD schema file that maps the source XML to the destination table. I will explain its workings once the data is loaded.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema"> <xsd:element name = ""CarSales"" sql:is-constant = "1" > <xsd:complexType> <xsd:sequence> <xsd:element name = "Client" sql:relation = "Client_XMLBulkLoad" maxOccurs = "unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name = "ID" type = "xsd:integer" sql:field = "ID" /> <xsd:element name = "ClientName" type = "xsd:string" sql:field = "ClientName" /> <xsd:element name = "Address1" type = "xsd:string" sql:field = "Address1" /> <xsd:element name = "Town" type = "xsd:string" sql:field = ""Town"" /> <xsd:element name = "County"" type = "xsd:string" sql:field = "County" /> <xsd:element name = "Country" type = "xsd:decimal" sql:field = "Country" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> |
Finally, Listing 4 shows the VBScript file used to invoke SQLXML Bulk Load and load the data.
1 2 3 4 5 |
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0") objBL.ConnectionString = "provider = SQLOLEDB;data source=MySQLServer;database=CarSales_Staging; integrated security = SSPI" objBL.ErrorLogFile = "C:\SQLXMLArticle\SQLXMLBulkLoadImporterror.log" objBL.Execute "C:\SQLXMLArticle\SQLXMLBulkLoadImport_Simple.xsd","C:\SQLXMLArticle\Clients_Simple.xml" Set objBL = Nothing |
…or in PowerShell …
1 2 3 4 5 |
$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad' $objBL.ConnectionString = 'provider = SQLOLEDB;data source=YourServer;database=CarSales_Staging; integrated security = SSPI' $objBL.ErrorLogFile ='C:\SQLXMLArticle\SQLXMLBulkLoadImporterror.log' $objBL.Execute('C:\SQLXMLArticle\SQLXMLBulkLoadImport_Simple.xsd','C:\SQLXMLArticle\Clients_Simple.xml') $objBL = $null |
If the .xml and .xsd files are in the same directory as the .vbs file then you will not need to add the full paths after the .Execute
command in the VBScript file. However it is probably a good habit to use full path details, as a matter of principle.
Double-click the SQLXMLBulkload.vbs file to run the bulk load. If all goes well, you should be able to query the Client_XMLBulkLoad table and see the data from the XML source file correctly loaded.
A Closer Look at the Schema File
The schema (XSD) file contains a few extra tidbits, over and above the Microsoft mapping schema, that allow it to perform its job so efficiently. Essentially, it extends the basic schema with the attributes that allow SQLXML to channel the source data into the correct tables and fields.
Perhaps the easiest way to understand what is special about the XSD that SQLXML bulk loader uses is to compare it with a “plain vanilla” XSD for the same XML source file. Listing 6 shows the XSD without the extensions that SQLXML bulk loader needs.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name = "CarSales"> <xsd:complexType> <xsd:sequence> <xsd:element name = "Client"> <xsd:complexType> <xsd:sequence> <xsd:element name = "ID" type = "xsd:integer"/> <xsd:element name = "ClientName" type = "xsd:string"/> <xsd:element name = "Address1" type = "xsd:string/> <xsd:element name = "Town" type = "xsd:string"/> <xsd:element name = "County" type = "xsd:string"/> <xsd:element name = "Country" type = "xsd:decimal"/> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> |
As you can see this version is much simpler. So what was added to the basic schema to make the data load work? Only four annotations were necessary to apply all the required constraints on the data. They are:
xmlns:sql = "urn:schemas-microsoft-com:mapping-schema"
– This extension to the namespace specifies how the schema is extended.Sql:relation
– This extension indicates to SQLXML the table into which the data is to be loaded for this XML element. As you can see from Listing 3, this is added to the<Client>
element to indicate that this particular element maps to the Client_XMLBulkLoad table in the SQL Server database.sql:field
– This extension indicates to SQLXML the field in the destination table into which the data is loaded. This is added to each of the nested elements in the<Client>
element to indicate which field the XML element is mapped to in the Client_XMLBulkLoad table in SQL Serversql:is-constant = "1"
– This extension tells SQLXML that the<CarSales>
element is a container and not a data element – and so does not need to be loaded into a data table. This annotation has to be added to all “parent container” elements.
Getting the schema file right is the hard part of using SQLXML bulk loader. You will probably spend more time on crafting this file than with any other part of the SQLXML bulk load process. Consequently it is worth ensuring that you have understood the XSD extensions before attempting a complex data load. You may even find that practicing on a simple XML file to start with can reap dividends. You can then move on to tackling a large and more complex XML source file once you are confident that you have mastered the basics.
However, you do not need to hand-craft the entire schema file from scratch. There are many tools available that can analyze an XML file and produce a basic XSD file that you can then extend by adding the Microsoft mapping extensions. One tool that is available with any Visual Studio installation is xsd.exe
. It will take an XML source file and return a schema file in all but the most arcane cases. You can find out more about this at: XML Schema Definition Tool (Xsd.exe). If this tool does not meet your needs, then you can try a PowerShell alternative that is described at: Loading Data With PowerShell.
How the VBScript File Works
The .vbs file that invokes SQLXML Bulk Loader can be extremely simple, and only needs the following items:
- The object creation statement – to invoke the SQLXML Bulk Load COM object.
- A standard OLEDB connection string, containing, at a minimum:
- The server and, if required, instance name (
Data Source
in this example). - The destination database name (
CarSales_Staging
in this example). - The SQL Server security information (Windows integrated security or SQL Server security).
- The server and, if required, instance name (
- The
Execute
command, which provides paths to the XSD and XML files – in that order. - An ErrorLog file. While not compulsory, this is invaluable for debugging the process.
- A
SET
command to dispose of the COM object.
This simple VB script assumes that the destination table already exists (although you can create the table as part of the load as you can see in a companion article). It also presumes integrated security. It is interesting that, somewhat counter-intuitively, the XML schema (.xsd) file is passed as the first parameter to the XML Bulk Loader. The XML file itself is passed in as the second parameter.
Troubleshooting
Two things will tell you that the XML data load has worked:
- There is no ErrorLog file, or if the old ErrorLog file is removed (assuming that you have requested a log file)
- The fact that the data loaded correctly into the destination table(s)
Should the data not load, then your first port of call is the ErrorLog file that SQLXML Bulk Load created, assuming that you added the objBL.ErrorLogFile = "Log File and path"
parameter. Fortunately, these error log files are very explicit, and always prove to be an invaluable source of debugging information if you encounter load issues.
There are a few precautions that you can take to prevent persistent recourse to the error log file becoming a way of life. Nearly all your attention should probably focus on the schema file. In my experience, this is the most common source of problems, because it generally involves a certain amount of hand-crafting. Potential problems include:
- Making sure that there are no spaces immediately inside all the double-quotes that are used in the
xsd:element
definition. - Ensuring that you do not inadvertently close elements such as
xsd:sequence
andxsd:complextype
– or even some of thexsd:element
definitions. - Respecting case-sensitivity. So in the preceding example, having an element called “Country”, and a schema mapping element named “country” would cause the process to fail.
An interesting “feature” concerns the VBScript file. You have to be sure that you do not leave any spaces in the database=DatabaseName
part of the script file. Spaces here, even if they are allowed elsewhere in the connection string, will cause a cryptic error message complaining that the table object cannot be found.
Loading Multiple Tables at Once from a Single XML Source File
One advantage that an XML file can have over a CSV file is the capacity to store multiple unrelated tables in a single source file. There will doubtless be occasions when you will want to load data from an XML source file that contains data for several separate tables. With the SQLXML Bulk Load executable and a suitably-crafted XSD file, you can load data into multiple tables from a single XML file in a single process. SQLXML Bulk Loader allows you, in effect, to transfer an entire subset of data in a single file, ready to be loaded simultaneously into multiple tables in a destination database. The data load can also be blindingly fast.
This technique is predicated on the fact that the source XML is very simple in layout. It also presumes that you are essentially loading data from a “table dump” in an XML file, where each table is stored sequentially in the XML file. You are also nearly always best advised to drop any indexes for the duration of the load, and recreate them afterwards when loading multiple tables. Moreover it is also best to disable foreign key constraints during the data load and reapply them afterwards.
To emulate the “table dump” scenario, the data file used in this example, shown in Listing 7, contains two tables in XML format, Invoice
and Invoice_Lines
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
<?xml version = "1.0" encoding = "UTF-8" ?> <ROOT> <Invoice> <ID> 3</ID> <InvoiceNumber> AA/1/2014-07-25</InvoiceNumber> <DeliveryCharge> 250</DeliveryCharge> </Invoice> <Invoice_Lines> <InvoiceID> 3</InvoiceID> <SalePrice> 5000</SalePrice> </Invoice_Lines> <Invoice_Lines> <InvoiceID>3</InvoiceID> <SalePrice> 12500</SalePrice> </Invoice_Lines> </ROOT> |
The file only has one record for Invoice
and two for Invoice_Lines
, but in reality there could be dozens of tables, each containing millions of records. Note that there is no presumption in the XML of any relational link between the Invoice
and Invoice_Lines
elements.
Listing 8 shows the XSD file that maps the source XML data to two tables in a SQL Server database, which are also called Invoice
and Invoice_Lines
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema"> <xsd:element name = "ROOT" sql:is-constant = "1" > <xsd:complexType> <xsd:sequence> <!-- Invoice --> <xsd:element name = "Invoice" sql:relation = "Invoice"> <xsd:complexType> <xsd:sequence> <xsd:element name = "ID" type = "xsd:integer" sql:field = "ID" /> <xsd:element name = "InvoiceNumber" type = "xsd:string" sql:field = "InvoiceNumber" /> <xsd:element name = "DeliveryCharge" type = "xsd:decimal" sql:field = "DeliveryCharge" /> </xsd:sequence> </xsd:complexType> </xsd:element> <!-- Invoice Lines --> <xsd:element name = "Invoice_Lines" sql:relation = "Invoice_Lines" > <xsd:complexType> <xsd:sequence> <xsd:element name = "InvoiceID" type = "xsd:integer" sql:field = "InvoiceID" /> <xsd:element name = "SalePrice" type = "xsd:string" sql:field = "SalePrice" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> |
This schema file uses the same extensions as the previous one that you saw, only this time there are two “tables” of data – each mapped using the sql:relation
annotation. Each of the XML elements containing data inside the XML container element is mapped to the corresponding database field with the sql:field
annotation.
As you can see, each XML element that maps to a table is a separate XML node, and the order of the elements is unimportant. What is important, however, is to have a single root node, just as you would for a conventionally well-formed XML file.
The only potential issue with this approach is that the XSD file can become extremely complex if you have to handle dozens of tables where each one can have potentially hundreds of columns. It is not that the XSD file is necessarily difficult to create, just arduous. So arm yourself with patience, and be prepared for some repetitive testing.
The penultimate stage is to create the destination tables, as shown in Listing 8.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE CarSales_Staging.dbo.Invoice ( ID INT NULL , InvoiceNumber VARCHAR(50) NULL , DeliveryCharge SMALLMONEY NULL ); GO CREATE TABLE CarSales_Staging.dbo.Invoice_Lines ( InvoiceID INT NULL , SalePrice MONEY NULL, ); GO |
Once the destination tables exist, you can create and execute a VBScript file just as you did before. Here is the script to use.
1 2 3 4 5 6 7 |
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0") objBL.ConnectionString = "provider = SQLOLEDB;data source=YourServer; database=CarSales_Staging; integrated security = SSPI" objBL.ErrorLogFile = "C:\SQLXMLArticle\SQLXMLBulkLoadImporterror.log" objBL.Execute "C:\SQLXMLArticle\SQLXMLBulkLoadImportMultipleTables.xsd", "C:\SQLXMLArticle\SQLXMLSourceDataMultipleTables.xml" Set objBL = Nothing |
… or in PowerShell
1 2 3 4 5 6 |
$objBL = new-object -comobject 'SQLXMLBulkLoad.SQLXMLBulkLoad' $objBL.ConnectionString = 'provider = SQLOLEDB;data source=YourServer;database=CarSales_Staging; integrated security = SSPI' $objBL.ErrorLogFile ='C:\SQLXMLArticle\SQLXMLBulkLoadImporterror.log' $objBL.Execute('C:\SQLXMLArticle\SQLXMLBulkLoadImportMultipleTables.xsd', ` 'C:\SQLXMLArticle\SQLXMLSourceDataMultipleTables.xml') $objBL = $null |
Once again the source data is loaded into the destination database, this time loading multiple tables simultaneously, from a single file in a single operation.
There is nothing to stop you from enforcing referential integrity through the addition of primary and foreign key constraints once the load has finished, so this technique can become a method for transferring data between databases. Certainly, it is not as simple as a backup and restore, but it can prove useful in certain circumstances, such as when:
- You need to transfer data between multiple versions of SQL server, and frequently into older versions of the database.
- The environment that you are using does not allow for direct connections between databases.
- CSV files prove unreliable for the data that you are using.
- You need to use the same data for loading into multiple different databases (Oracle, MySQL or DB2 for instance) – and XML is the most appropriate “Lingua Franca”.
- You need a rapid and reliable data load process.
On another note, it is worth remarking that the XML data file can contain more “tables” and “records” than are mapped in the schema file. However, only those entities that are specified in the XSD file will be loaded.
Loading and Shredding Relational Tables from an XML Source File
XML is nothing if not infinitely varied. Frequently, the XML source file that will be more intricate than the simple “table dump” structure of the previous example. Often, the data that makes up several tables is presented in a hierarchical way inside a single XML element.
Listing 12 illustrates a simple version of this nested XML pattern.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
<?xml version = "1.0" encoding = "UTF-8" ?> <ROOT> <Invoice> <ID >3</ID> <InvoiceNumber >3A9271EA-FC76-4281-A1ED-714060ADBA30</InvoiceNumber> <DeliveryCharge >250</DeliveryCharge> <Invoice_Lines> <ID >1</ID> <InvoiceID >3</InvoiceID> <SalePrice >5000</SalePrice> </Invoice_Lines> <Invoice_Lines> <ID >2</ID> <InvoiceID >5</InvoiceID> <SalePrice >12500</SalePrice> </Invoice_Lines> </Invoice> </ROOT> |
Although this file only contains a small amount of data it maps to a classic Invoice header / Invoice Item pattern, where the <Invoice>
element contains the <Invoice_Lines>
element. We can load this kind of data directly into a relational table structure using SQLXML Bulk Loader.
Drop the two tables used in the previous example and re-create them with a FOREIGN KEY
constraint, as shown in Listing 11.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
DROP TABLE CarSales_Staging.dbo.Invoice; DROP TABLE CarSales_Staging.dbo.Invoice_Lines; CREATE TABLE CarSales_Staging.dbo.Invoice ( ID INT NOT NULL PRIMARY KEY , InvoiceNumber VARCHAR(50) NULL , DeliveryCharge SMALLMONEY NULL ); GO CREATE TABLE CarSales_Staging.dbo.Invoice_Lines ( ID INT NOT NULL PRIMARY KEY , InvoiceID INT NOT NULL FOREIGN KEY REFERENCES Invoice ( ID ) , SalePrice MONEY NULL, ); GO |
Create an appropriate XML schema, as shown in Listing 12 (I’ll explain this file in a little more detail shortly).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema"> <xsd:annotation> <xsd:appinfo> <sql:relationship name = "InvoiceToLine" parent= "Invoice" parent-key = "ID" child= "Invoice_Lines" child-key= "InvoiceID" /> </xsd:appinfo> </xsd:annotation> <xsd:element name = "Invoice" sql:relation = "Invoice" > <xsd:complexType> <xsd:sequence> <xsd:element name = "ID" type = "xsd:integer" sql:field = "ID" /> <xsd:element name = "InvoiceNumber" type = "xsd:string" sql:field = "InvoiceNumber" /> <xsd:element name = "DeliveryCharge" type = "xsd:decimal" sql:field = "DeliveryCharge" /> <xsd:element name = "Invoice_Lines" sql:relation = "Invoice_Lines" sql:relationship = "InvoiceToLine" > <xsd:complexType> <xsd:sequence> <xsd:element name = "ID" type = "xsd:integer" sql:field = "ID" /> <xsd:element name = "InvoiceID" type = "xsd:integer" sql:field = "InvoiceID" /> <xsd:element name = "SalePrice" type = "xsd:string" sql:field = "SalePrice" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> |
Modify the objBL.Execute
statement in the existing SQLXMLBulkloadMulti.vbs file so that it refers to the new .XML and .XSD files, as shown in Listing 13. Save this file as SQLXMLBulkloadNested.vbs.
1 2 3 |
... objBL.Execute "C:\SQLXMLArticle\SQLXMLBulkLoadImportReferential.xsd" C:\SQLXMLArticle\SQLXMLSourceDataRelatedTables.xml ... |
This example builds on the approach from the previous example. However, it has the following prerequisites:
- Destination tables with
PRIMARY KEY
toFOREIGN KEY
relationships. This is why I suggested that you re-create the destination tables, as the full DDL makes the key relationships clearer. - Source data where the “parent-child” data relationship is directly expressed in the XML structure. This is the case in the source data where the
<Invoice>
element contains one or more<Invoice_Lines>
elements.
More generally, if the source XML is hierarchically structured to map to a relational source, then SQLXML Bulk Load can shred the data into SQL Server tables while maintaining foreign key relationships.
The Schema File for Nested XML Structures
Here, as was the case in the previous example, both the key to success and any potential complexity reside in the accurate definition of the XSD file. Indeed, everything that is essential to a successful load is found in the schema file:
- Firstly, note the
<xsd:annotation>
and<xsd:appinfo>
elements at the start of the schema file. The foreign key mapping is specified inside these elements. It must be present – and correctly defined – for the load to work. - Inside these elements, you have the
sql:relationship name = "InvoiceToLine"
element. This identifies each relationship between tables as part of theappinfo
, as it will then be used as part of thesql:element
definition for a “table” in the schema file.
The relationship definitions are as follows:
parent = "Invoice"
: The “parent” table.parent-key = "ID"
: The primary key column of the “parent” table.child = "Invoice_Lines"
: The “child” table.child-key = " InvoiceID "
: The foreign key column of the “child” table.sql:relationship = "InvoiceToLine"
: This attribute Inside the XML element, which contains the child elements specifies the relationship to use that was defined previously in the schema file.
As you can see from this information, the ID field is now considered as a PRIMARY KEY
in the Invoice
table, and the InvoiceID
is taken to be the FOREIGN KEY
in the Invoice_Lines
table. Executing the .vbs script loads the data into the two tables, while maintaining the ID as the FOREIGN KEY
constraint for the Invoice_Lines
.
You also need to know that SQLXML Bulk Load generates records as their nodes enter into scope, and sends those records to SQL Server as their nodes exit scope. This means that all the data for the record must be present within the scope of the node. Consequently, the data for the “child” table cannot precede the data for the “parent” table in the source file. To make the point in another way, if you have a mapping schema that defines a PRIMARY KEY
/ FOREIGN KEY
relationship between two tables (such as between Invoice
and Invoice_Lines
), the table with the PRIMARY KEY
must be described first in the schema. The table with the FOREIGN KEY
column must appear later in the schema.
Performance
An important practical point about loading XML data using this particular design pattern is speed, or rather lack of it. You will never attain the turbo-charged load times with nested XML that you can attain with the “table dump” approach that you saw previously. This is not to say that you cannot load gigabytes of data in reasonable timescales, just that you will never break any speed limits using this approach.
Equally, there are limits to the complexity of the XML source files that SQLXML Bulk Loader can load gracefully (or at all, in some cases). It is well-nigh impossible to predict exactly when you will hit a ceiling on the data ingestion capacities of this tool, but you do need to be aware that there are files that appear to be too intricate for it to handle.
Conclusion
Welcome to the world of SQLXML Bulk Loader! It’s a valuable tool that has been available for many years and yet is still underused, in my opinion. You have seen how to load multiple types of XML data files, from the simple table, via multiple separate tables through to more complex, nested XML structures. While this tool will never win any prizes for making developer’s lives easier, it certainly makes up for this in its raw power and speed. If you suspect that your XML files will ever grow to any appreciable size, then you could well be advised to get to know this versatile addition to your ETL toolkit sooner rather than later. It just could save your reputation.
Look out for the next installment, covering some of the more advanced features of SQLXML Bulk Loader!
SQL Server 2012 Data Integration Recipes
If you like what you read in this article and want to learn more about data integration using SQL Server, then please take a look at my book, SQL Server 2012 Data Integration Recipes (Apress, December 2012)
Load comments