{"id":1595,"date":"2013-03-05T00:00:00","date_gmt":"2013-03-05T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/ad-hoc-xml-file-querying\/"},"modified":"2021-08-24T13:39:57","modified_gmt":"2021-08-24T13:39:57","slug":"ad-hoc-xml-file-querying","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/ad-hoc-xml-file-querying\/","title":{"rendered":"Ad-Hoc XML File Querying"},"content":{"rendered":"<div class=\"article-content\">\n<p class=\"start\">XML data is often stored in files, outside of a database. One obvious use for XML file storage is to hold configuration data that is read by web servers and other systems. If ad-hoc querying of XML files becomes necessary, it&#8217;s important to find accurate and efficient ways to extract just the data you require from the XML, and return the results as a table. In this post, we&#8217;ll look into finding the best ways to perform ad-hoc queries against XML files using SQL Server&#8217;s <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms345122(v=sql.90).aspx\">XQuery<\/a> implementation and end up with the results in a table. <\/p>\n<p>Let&#8217;s use Microsoft&#8217;s sample <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/windows\/desktop\/ms762271(v=vs.85).aspx\">Books.xml<\/a> file, which contains 12 book entity records that include element and attribute values for book id, author, title, genre, publication date, and description. A sample <strong>book<\/strong> node looks like this: <\/p>\n<pre class=\"lang:xhtml theme:github\">&lt;book id=\"bk109\"&gt;\n\t&lt;author&gt;Kress, Peter&lt;\/author&gt;\n\t&lt;title&gt;Paradox Lost&lt;\/title&gt;\n\t&lt;genre&gt;Science Fiction&lt;\/genre&gt;\n\t&lt;price&gt;6.95&lt;\/price&gt;\n\t&lt;publish_date&gt;2000-11-02&lt;\/publish_date&gt;\n\t&lt;description&gt;After an inadvertant trip through a Heisenberg Uncertainty Device, James Salway discovers the problems of being quantum.&lt;\/description&gt;\n&lt;\/book&gt;<\/pre>\n<p>For our purposes, I&#8217;ve created a local copy of Books.xml on C:. Let&#8217;s assume that we want to query this local file from SQL Server, returning the results of our query in a relational format &#8211; this can be referred to as &#8216;<a href=\"https:\/\/www.simple-talk.com\/sql\/database-administration\/manipulating-xml-data-in-sql-server\/?utm_source=simpletalk&amp;utm_medium=weblink&amp;utm_content=adhocxml&amp;utm_campaign=simpletalk\">shredding<\/a>&#8216; XML node values. <\/p>\n<h2>Importing XML data using a CTE <\/h2>\n<p>Our first step will be to find a way to access the XML file from SQL Server. One way to do this is to use the versatile&#160;<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190312.aspx\">OPENROWSET<\/a> function.&#160;It can be used to import many different formats of bulk data. We can use OPENROWSET in a <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190766(v=sql.105).aspx\">CTE<\/a> (Common Table Expression) to directly access the XML data from the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/bb895234.aspx\">BLOB<\/a> (Binary Large OBject) that OPENROWSET converts it into, for each time we run a query against it: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--query the XML Blob using a CTE (pulling from the XML file each time)\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT *\nFROM XmlFile\nGO<\/pre>\n<p>The &#8216;SINGLE_BLOB&#8217; argument of OPENROWSET designates the XML data into a Blob of data type <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188362.aspx\">VARBINARY(MAX)<\/a> &#8211; this type also is limited to a size of 2GB. The CTE creates a table expression named &#8216;XMLFile&#8217; and its singlular column to hold the data &#8211; giving us the ability to reference the XML data properly while running XQuery statements against it.<\/p>\n<h2>Importing XML data using a function<\/h2>\n<p>Another way we can access the XML data from the file on demand is by creating a function that returns an XML variable. A variable of the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187339.aspx\">XML data type<\/a> can hold up to 2 GB of XML data &#8211; for context, our entire Books.xml file is only 5KB. We can easily create a SQL Server scalar function to do this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--create function to return XML data from file as XML variable\nUSE AdventureWorks2012\nGO\nCREATE FUNCTION fn_BooksXML()\nRETURNS XML\nAS\nBEGIN\n\n\tDECLARE @books XML;\n\tSELECT @books = CONVERT (XML, BulkColumn)\n\tFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData;\n\n\tRETURN @books;\nEND\nGO<\/pre>\n<p>We&#8217;ve created this function in the AdventureWorks2012 database. It can be called like this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--call function to return all XML file data\nSELECT AdventureWorks2012.dbo.fn_BooksXML() AS [Books XML Data];<\/pre>\n<p>Obviously, creating a function to pull XML data is more of a permanent way to access an XML file, as opposed to an ad-hoc script. <\/p>\n<h2>Filtering XML data using FLWOR <\/h2>\n<p>Now that we have found ways to access our XML file data from SQL Server, we can start looking for the best way to run queries against it. To start with, let&#8217;s say we are looking for a query that will return author names for books in the &#8216;Computer&#8217; genre category. An obvious first possible solution might be to use the iterative power of the XQuery <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190945.aspx\">FLWOR statement<\/a>. FLWOR stands for For, Let, Where, Order by, and Return. FLWOR is an integral part of XQuery. A possible FLWOR solution to our problem might look like this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use a FLWOR statement to find all authors in Computer genre\nSELECT AdventureWorks2012.dbo.fn_BooksXML().query('for $Book in \/catalog\/book\n\t\t\t\tlet $Genre := $Book\/genre\n\t\t\t\tlet $Author := $Book\/author\n\t\t\t    where $Genre = \"Computer\"\n\t\t\t    return $Author\n\t\t\t   ') AS [Computer Book Authors]\nGO<\/pre>\n<p>We&#8217;ve used a FLWOR statement inside of an XQuery query() method, accessing the XML data from the file using our function This <em>does<\/em> return the values that we want: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-bd72eb0e-705f-42a0-95f1-f75dde42ad36.png\" alt=\"1756-bd72eb0e-705f-42a0-95f1-f75dde42ad3\" \/><\/p>\n<pre class=\"lang:xhtml theme:github\">&lt;author&gt;Gambardella, Matthew&lt;\/author&gt;\n&lt;author&gt;O'Brien, Tim&lt;\/author&gt;\n&lt;author&gt;O'Brien, Tim&lt;\/author&gt;\n&lt;author&gt;Galos, Mike&lt;\/author&gt;<\/pre>\n<p>But it definitely <em>does not<\/em> present the results in the relational format we are looking for &#8211; the data is returned in its native format, an XML fragment, instead. We could try stripping the FLWOR results of the XML tags using the XQuery <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187038.aspx\">data() function<\/a>: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use a FLWOR statement to find all authors in Computer genre (remove XML tags)\nSELECT AdventureWorks2012.dbo.fn_BooksXML().query('for $Book in \/catalog\/book\n\t\t\t\tlet $Genre := $Book\/genre\n\t\t\t\tlet $Author := $Book\/author\n\t\t\t    where $Genre = \"Computer\"\n\t\t\t    return data($Author)\n\t\t\t   ') AS [Computer Book Authors]\nGO<\/pre>\n<p>However, this method still returns the results contained in one record &#8211; only now without the XML tags: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-5efccd48-ed98-4930-ac44-b0e08ccc4c00.png\" alt=\"1756-5efccd48-ed98-4930-ac44-b0e08ccc4c0\" \/><\/p>\n<p>Incidentally, using the data() method makes it easy to return a delimited list. A few changes to our script, which includes the introduction of the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190972.aspx\">concat()<\/a> XQuery function, produces a delimited result set: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use a FLWOR statement to find all authors in Computer genre (remove XML tags and delimit)\nSELECT AdventureWorks2012.dbo.fn_BooksXML().query('for $Book in \/catalog\/book\n\t\t\t\tlet $Genre := $Book\/genre\n\t\t\t\tlet $Author := $Book\/author\n\t\t\t    where $Genre = \"Computer\"\n\t\t\t    return concat(\"[\",data($Author)[1],\"]\")\n\t\t\t   ') AS [Computer Book Authors]\nGO<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-601d0e91-e1c4-4f54-aee0-7bbd099f9901.png\" alt=\"1756-601d0e91-e1c4-4f54-aee0-7bbd099f990\" \/><\/p>\n<p>Although FLWOR is a powerful way to produce iterative results, it returns values in an XML format by default. <\/p>\n<h2>Filtering XML data using XQuery value() method <\/h2>\n<p>Another option we can try is to do our search using a path inside of an XQuery <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms178030.aspx\">value()<\/a> method. The value() method simply pulls the value data from an XML node element or attribute. Let&#8217;s use value() for our situation, this time using the CTE method to pull the XML data from the file: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use XQuery value() method within a CTE\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT Contents.value('(\/catalog\/book\/genre[. = \"Computer\"]\/..\/author)[1]', 'varchar(50)') AS [Computer Book Authors]\nFROM   XmlFile\nGO<\/pre>\n<p>Within the value() method&#8217;s XQuery path, we are indicating here that only <strong>genre<\/strong> nodes with a value of &#8216;Computer&#8217; should be examined. Then, we use the <a href=\"http:\/\/msdn.microsoft.com\/en-US\/library\/ms345122(v=SQL.90).aspx\">parent node axis abbreviation<\/a> (&#8216;..&#8217;) path step to back up to the <strong>book<\/strong> node &#8211; where its <strong>genre<\/strong> node meets that requirement &#8211; and then look at its <strong>author<\/strong> node. This works because the <strong>genre<\/strong> and <strong>author<\/strong> nodes are on the same level inside of the <strong>book<\/strong> node. Our results for the above query are: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-8a4e7d42-f8a6-4eee-9f76-66adb6d589aa.png\" alt=\"1756-8a4e7d42-f8a6-4eee-9f76-66adb6d589a\" \/><\/p>\n<p>We can immediately see a major problem with using this solution for our purposes &#8211; it returns matches for just the first <strong>book <\/strong>node; The Value() method will only return one node at a time. This is due to the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms178030.aspx\">singleton requirement of the value() method<\/a>. The singleton requirement forces us to indicate which <strong>book<\/strong> node instance we are referring to &#8211; even if there is only one instance. However, the XML data contains multiple <strong>book<\/strong> nodes, and we need to look at each one in order to get an accurate answer to our request. We&#8217;ve indicated that we want the results from the first <strong>book<\/strong> node instance returned &#8211; by using the &#8216;[1]&#8217; as a singleton. Therefore, if we change that to &#8216;[2],&#8217; we get: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-0dae9f7f-051c-4382-9622-882f195b8922.png\" alt=\"1756-0dae9f7f-051c-4382-9622-882f195b892\" \/><\/p>\n<p>This type of value() query will work fine when using XML fragments having only one node instance, or where only one instance&#8217;s values were required. In our case, however, this is not what we want. <\/p>\n<h2>Filtering XML data using XQuery nodes() method <\/h2>\n<p>We have another possible way to get the results we want: using the XQuery <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188282.aspx\">nodes()<\/a> method. The nodes() method is designed to shred XML node values into a relational format &#8211; nodes() actually returns a rowset based on the shredded XML. This sounds like exactly what we need. Let&#8217;s try an XQuery nodes() query: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use XQuery nodes() method within a CTE\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT c.value('(author)[1]', 'varchar(50)') AS [Computer Book Authors]\nFROM   XmlFile CROSS APPLY Contents.nodes ('(\/\/book\/genre[. = \"Computer\"]\/..)') AS t(c);<\/pre>\n<p>This query uses <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms345118(v=sql.90).aspx\">CROSS APPLY<\/a> to create a kind of self join back to the <strong>XmlFile<\/strong> table expression. The CROSS APPLY requires table (t) and column (c) aliases. You may notice that we&#8217;ve used a basic XQuery statement in the value() method, but most of the real filtering is done in the nodes() method. Again, we have indicated that the values should be looked for in the <strong>genre<\/strong> nodes, and we&#8217;ve used the parent::node() axis abbreviation again to point back to the <strong>genre<\/strong> node&#8217;s parent <strong>book<\/strong> node. You&#8217;ll also notice that we are again able to employ the value() method, despite its singleton requirement. This is possible because the nodes() method shreds the XML nodes into a relational structure &#8211; the value() method is returning values for <em>each<\/em> record in a rowset generated by nodes(). So, the rowset generated by nodes()contains all <strong>book<\/strong> nodes where the genre is &#8216;Computer.&#8217; The value() method indicates that the <strong>author<\/strong> node from the given <strong>book<\/strong> node is where the value should be taken from. <\/p>\n<p>The query returns the following: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-85416852-1819-40b8-bf11-52c8de2af187.png\" alt=\"1756-85416852-1819-40b8-bf11-52c8de2af18\" \/><\/p>\n<p><strong>We&#8217;ve finally returned the results in the relational format we are looking for<\/strong>. We can eliminate redundant authors by adding the following change: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use XQuery nodes() method within a CTE to get distinct authors\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT DISTINCT x.[Computer Book Authors] FROM(\n\tSELECT c.value('(author)[1]', 'varchar(50)') AS [Computer Book Authors]\n\tFROM   XmlFile CROSS APPLY Contents.nodes ('(\/\/book\/genre[. = \"Computer\"]\/..)') AS t(c)\n)x;<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-2fbcd3f0-20c6-4fc3-8cf0-a58a454a4606.png\" alt=\"1756-2fbcd3f0-20c6-4fc3-8cf0-a58a454a460\" \/><\/p>\n<h2>Working around Case-sensitivity in XQuery <\/h2>\n<p>Our query is case-sensitive. <strong>Genre<\/strong> node matches for the value &#8216;Computer&#8217; are not the same as matches for the value &#8216;computer.&#8217; To work around case-sensitivity in XQuery statements, you can use either the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/cc645590.aspx\">upper-case()<\/a> or <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/cc645589.aspx\">lower-case()<\/a> functions: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\"> --use XQuery nodes() method within a CTE to get distinct authors, resolving case-sensitivity\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT DISTINCT x.[Computer Book Authors] \nFROM\n(\n\tSELECT c.value('(author)[1]', 'varchar(50)') AS [Computer Book Authors]\n\tFROM   XmlFile CROSS APPLY Contents.nodes ('(\/\/book\/genre[lower-case(.) = \"computer\"]\/..)') AS t(c)\n)x;<\/pre>\n<p>The lower-case() XQuery function forces the genre node value to be lower-case, and then compares it with the lower-case value &#8216;computer.&#8217; <\/p>\n<h2>Using the XQuery contains() function to find substrings <\/h2>\n<p>Our query is also currently limited to finding <em>exact<\/em> matches to the lower-case value &#8216;computer&#8217; in <strong>genre<\/strong> nodes. This means that node values with any accidental spaces, for example, will be excluded from our results. Also, if we wanted to search for text in the <strong>description<\/strong> nodes, we would have to type the entire lengthy text in our XQuery statement in order to find matches. To allow for substring searches, XQuery provides the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms178026.aspx\">contains()<\/a> function. We&#8217;ll use this in conjunction with our current query by making the following changes: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--use XQuery nodes() method within a CTE to get distinct authors, resolve case-sensitivity - returning substring matches\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT DISTINCT x.[Computer Book Authors] \nFROM\n(\n\tSELECT c.value('(author)[1]', 'varchar(50)') AS [Computer Book Authors]\n\tFROM   XmlFile CROSS APPLY Contents.nodes ('(\/\/book\/genre[contains(lower-case(.), \"comp\")]\/..)') AS t(c)\n)x;<\/pre>\n<p>We have modified the way the lower-case() function is used so that we could nest it inside of the contains() function. Even though we are only searching for a substring of the <strong>genre<\/strong> node value (&#8216;comp&#8217;), we still get the correct results: <\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-67dfdda4-b2a0-40ab-bc4c-52df68c8e177.png\" alt=\"1756-67dfdda4-b2a0-40ab-bc4c-52df68c8e17\" \/><\/p>\n<h2>Importing and filtering XML data using a view <\/h2>\n<p>If we want to establish a more complete, permanent database-oriented solution, we can create a view that encapsulates some of the work we did earlier, allowing for very easy SQL querying against a result set returned by XQuery: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--create view to return all XML data from file\nUSE AdventureWorks2012\nGO\nCREATE VIEW v_BooksXML\nAS\n\tWITH XmlFile (Contents) AS (\n\tSELECT CONVERT (XML, BulkColumn) \n\tFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n\t)\n\tSELECT\t\n\t\t\tc.value('(@id)', 'varchar(10)') AS [ID],\n\t\t\tc.value('(author)[1]', 'varchar(100)') AS [Author],\n\t\t\tc.value('(title)[1]', 'varchar(100)') AS [Title],\n\t\t\tc.value('(genre)[1]', 'varchar(50)') AS [Genre],\n\t\t\tc.value('(price)[1]', 'decimal(8,2)') AS [Price],\n\t\t\tc.value('(publish_date)[1]', 'date') AS [Date],\n\t\t\tc.value('(description)[1]', 'varchar(200)') AS [Description]\n\tFROM   XmlFile CROSS APPLY Contents.nodes ('(\/\/book)') AS t(c)\n\nGO<\/pre>\n<p>Now we can simple query the view, without using XQuery at all: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--query view\nSELECT DISTINCT Author FROM AdventureWorks2012..v_BooksXML\nWHERE Genre LIKE 'comp%'\nGO<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-ae2ac41e-a88b-4956-a68c-70fafbf873ce.png\" alt=\"1756-ae2ac41e-a88b-4956-a68c-70fafbf873c\" \/><\/p>\n<h2>Efficiency comparisons <\/h2>\n<p>How do all of these different methods of accessing XML file data compare in terms of query speed? To check this, we&#8217;ll run the different methods together as a batch (making changes to WHERE criteria in the FLWOR statements to provide a better comparison to the other queries): <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--XQuery nodes() method within a CTE\nWITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT DISTINCT x.[Computer Book Authors] \nFROM\n(\n\tSELECT c.value('(author)[1]', 'varchar(50)') AS [Computer Book Authors]\n\tFROM   XmlFile CROSS APPLY Contents.nodes ('(\/\/book\/genre[contains(lower-case(.), \"comp\")]\/..)') AS t(c)\n)x\n\n--XQuery nodes() method using function\nSELECT DISTINCT x.[Computer Book Authors] \nFROM\n(\n\tSELECT c.value('(author)[1]', 'varchar(50)') AS [Computer Book Authors]\n\tFROM   (SELECT AdventureWorks2012.dbo.fn_BooksXML()) AS y(z) \n\tCROSS APPLY z.nodes ('(\/\/book\/genre[contains(lower-case(.), \"comp\")]\/..)') AS t(c)\n)x\n\n--use a FLWOR statement within CTE\n;WITH XmlFile (Contents) AS (\nSELECT CONVERT (XML, BulkColumn) \nFROM OPENROWSET (BULK 'C:\\Books.xml', SINGLE_BLOB) AS XmlData\n)\nSELECT Contents.query('for $Book in \/catalog\/book\n\t\t\t\tlet $Genre := $Book\/genre\n\t\t\t\tlet $Author := $Book\/author\n\t\t\t    where $Genre [contains(lower-case(.), \"comp\")]\n\t\t\t    return $Author\n\t\t\t   ') AS [Computer Book Authors]\nFROM XmlFile\nGO;\n\n--FLWOR statement using function\nSELECT AdventureWorks2012.dbo.fn_BooksXML().query('for $Book in \/catalog\/book\n\t\t\t\tlet $Genre := $Book\/genre\n\t\t\t\tlet $Author := $Book\/author\n\t\t\t    where $Genre [contains(lower-case(.), \"comp\")]\n\t\t\t    return $Author\n\t\t\t   ') AS [Computer Book Authors]\nGO\n\n--view\nSELECT DISTINCT Author FROM AdventureWorks2012..v_BooksXML\nWHERE Genre LIKE 'comp%'\nGO<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1756-a40d9e5f-ed99-46ee-ba2d-be4885e70820.png\" alt=\"1756-a40d9e5f-ed99-46ee-ba2d-be4885e7082\" \/><\/p>\n<p><strong><\/strong>The view is the most expensive method by far, and at 71% of the total batch cost, its query cost is roughly 10x either of the others. The differences between using a CTE versus a function seem negligible, as both the nodes() methods and the FLWOR statements cost around 7% of the batch. A minor exception is the FLWOR statement that uses a function &#8211; costing slightly more than its CTE counterpart at 8%. <\/p>\n<h2>Summary <\/h2>\n<p>We started out with an XML file that we wanted to filter for specific results, returning the values in a relational format. We first found a way to pull the XML data into a form that SQL Server could easily access and run queries against, using the OPENROWSET function. A more permanent extension of that idea was to create a function that encapsulated the same logic. We made several attempts to find the proper query solution, at first using an iterative FLWOR statement, but that did not return results relationally. We then tried using a plain XQuery value() method query, but discovered that the value() method&#8217;s singleton requirement restricted us to returning only one record at a time. We ended up finding a good resolution by using the XQuery nodes() method &#8211; the advantage that nodes() has is that it returns a rowset of the XML data, shredded and organized relationally. We also made a few modification to our nodes() query, eliminating the case-sensitivity and exact-value-match restrictions. We then incorporated some of that logic into a view that can be filtered using a SQL Server WHERE clause, rather than using XQuery criteria. The view solution is arguably the most straightforward solution, providing the creation of a permanent object is allowed. However, a batch query cost comparison showed us that the view method is by far the most costly. <\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>When you need to shred just part of the data within a large XML file into a SQL Server table, the most efficient way is to just select what you need via XQuery or by using XPath, before shredding it into a table. But precisely how would you do that?&hellip;<\/p>\n","protected":false},"author":221920,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143527],"tags":[4170,5829,4150,4151,4217,5830],"coauthors":[],"class_list":["post-1595","post","type-post","status-publish","format-standard","hentry","category-database-administration-sql-server","tag-database-administration","tag-file-querying","tag-sql","tag-sql-server","tag-xml","tag-xml-in-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1595","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221920"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1595"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1595\/revisions"}],"predecessor-version":[{"id":92240,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1595\/revisions\/92240"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1595"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1595"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1595"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1595"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}