{"id":1410,"date":"2012-09-25T00:00:00","date_gmt":"2012-09-25T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/getting-started-with-xml-indexes\/"},"modified":"2021-08-24T13:40:05","modified_gmt":"2021-08-24T13:40:05","slug":"getting-started-with-xml-indexes","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/getting-started-with-xml-indexes\/","title":{"rendered":"Getting Started With XML Indexes"},"content":{"rendered":"<div id=\"pretty\">\n<h2>Introduction<\/h2>\n<p class=\"start\">Today, many organizations store and transmit data in an XML data format. Because of the ubiquity and readability of the XML format, it has become convenient and popular to both consume and provide data using XML. Because XML data is used so widely, it is increasingly important to be able to query XML values efficiently. XML indexes were designed to complement the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms345117(v=sql.90).aspx\">XML data type<\/a> and have been available since SQL Server 2005. When these indexes are used properly, they can dramatically reduce execution time in queries made against XML columns.<\/p>\n<h2>How Do XML Indexes Work?<\/h2>\n<p>When no indexes are present, an XML field must be &#8216;shredded&#8217; when the query is executed. This means that the XML data is peeled apart from the XML tags, and organized into a relational format. An XML index does this work ahead of time, representing the XML data in an already-shredded version, thereby allowing easy filtering.<\/p>\n<h3>Some XML Index Ground Rules<\/h3>\n<p>XML indexes can be applied only to columns of the XML data type. In order to use XML indexes on a specific table, the table <strong>must<\/strong> have a <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa933131(v=SQL.80).aspx\">clustered index<\/a> on the primary key column (a primary key constraint includes a clustered index upon creation by default).<\/p>\n<p>There are 2 types of XML indexes &#8211; <strong>primary<\/strong> and <strong>secondary<\/strong>.<\/p>\n<p>A <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/bb500237.aspx\">primary XML index<\/a> essentially contains one row of information for each node in the XML column. This information is made up of XML node values, types, and paths. A primary index is a &#8216;pre-shredded&#8217; representation of the XML blob &#8211; an easy-to-filter copy of the XML data.<\/p>\n<p><a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/bb522562.aspx\">Secondary XML indexes<\/a> are dependent on primary XML indexes &#8211; you cannot create any secondary indexes without first having created a primary index.<\/p>\n<p>Secondary XML indexes come in 3 types: <strong>PATH<\/strong>, <strong>VALUE<\/strong>, and <strong>PROPERTY<\/strong> indexes. Secondary indexes are designed to provide additional optimization for certain types of XML queries.<\/p>\n<ul>\n<li><strong>PATH<\/strong> secondary indexes often help to optimize queries that make use of a path. An <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms189869.aspx\">exist()<\/a> method in the query usually indicates that a <strong>PATH<\/strong> index may improve the query execution.<\/li>\n<li><strong>PROPERTY<\/strong> secondary indexes are designed to be helpful with queries where the primary key field is used, and where multiple values are retrieved from a single XML instance.<\/li>\n<li><strong>VALUE<\/strong> secondary indexes are good for queries that use wildcards in a path to find known values &#8211; if the full path is not known, but the value being filtered IS known, a <strong>VALUE<\/strong> index will probably optimize the query execution.<\/li>\n<\/ul>\n<p>The basic syntax to <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/bb934097.aspx\">create<\/a> an XML index is:<\/p>\n<pre>CREATE [ PRIMARY ] XML INDEX index_name\r\n\u00a0\u00a0\u00a0\u00a0ON &lt;object&gt; ( xml_column_name )\r\n\u00a0\u00a0\u00a0\u00a0[ USING XML INDEX xml_index_name\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0[ FOR { VALUE | PATH | PROPERTY } ] ]\r\n\u00a0\u00a0\u00a0\u00a0[ WITH ( &lt;xml_index_option&gt; [ ,...n ] ) ]\r\n[ ; ] <\/pre>\n<h3>Using a Primary XML Index<\/h3>\n<p>Let&#8217;s jump right in and start experimenting with XML indexes.<\/p>\n<p>We have an existing Employee table with <strong>500000<\/strong> employee records. A sampling of the table is shown below:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--get sample from employee data table\r\nSELECT TOP 1 * FROM Employee\r\nGO<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-97bd54b1-f2aa-4c49-becb-88898ad6f621.png\" alt=\"1567-1-97bd54b1-f2aa-4c49-becb-88898ad6f\" \/><\/p>\n<p>We will need to generate XML data from the Employee table for the purposes of testing the benefits of XML indexes. To do this, we&#8217;ll create a script that builds out an XML representation of the Employee data fields &#8211; but first, we&#8217;ll need to create a table to hold the XML data:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--create table to hold xml data\r\nCREATE TABLE EMP_XML\r\n(\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ID INT PRIMARY KEY, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EMP_DETAILS XML\r\n)\r\nGO <\/pre>\n<p>We&#8217;ve created the table with a primary key, which by default creates a clustered index on the primary key column &#8211; a necessity if we plan on creating XML indexes on this new table. We can now run a script that populates the new <strong>EMP_XML<\/strong> table. Our script will do this via the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms178107.aspx\"><strong>FOR XML<\/strong><\/a> clause, using the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms189885.aspx\"><strong>PATH<\/strong><\/a> mode to create the &#8216;Employee&#8217; root node:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--run script to convert employee records to xml data\r\nDECLARE @counter INT = 1\r\nDECLARE @row VARCHAR(MAX) \r\n\r\nWHILE @counter &lt;= 500000\r\n\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET @row = (\u00a0\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0\u00a0\u00a0\u00a0SSN\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS '@ssn',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0DOB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS '@dob',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0FirstName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS 'FirstName',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0MiddleName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'MiddleName',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0LastName\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'LastName',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0[State]\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Address\/@state',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ZipCode\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Address\/@zip',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Street\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Address\/Street',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0City\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Address\/City',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ID\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Internal\/@ID',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Salary\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Internal\/Salary',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0HireDate\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Internal\/Hiredate',\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Department\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS 'Internal\/Department'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Employee\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE ID = @counter\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FOR XML PATH ('Employee')\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INSERT EMP_XML VALUES(@counter,@row)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET @counter += 1\r\n\u00a0\u00a0 END <\/pre>\n<p>Notice that we&#8217;ve designated some of the source table&#8217;s fields as nodes or nested nodes, and others as node attributes in the XML table. We now have a table containing all 500000 records of employee data, in XML format:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--get sampling of new xml records\r\nSELECT TOP 5 * FROM EMP_XML\r\nGO<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-726509a0-b434-4cd4-9164-3fc15d2d8e60.png\" alt=\"1567-1-726509a0-b434-4cd4-9164-3fc15d2d8\" \/><\/p>\n<p>The first record&#8217;s EMP_DETAILS field, expanded:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-3f688bfe-f38c-4ff0-93f5-96f3c6d2cd9b.png\" alt=\"1567-1-3f688bfe-f38c-4ff0-93f5-96f3c6d2c\" \/><\/p>\n<p>We can see that all fields from the source table are now represented as XML nodes and attributes. We&#8217;ll use this table to test the performance benefits of the XML index.<\/p>\n<h3>Comparing Query Execution Times<\/h3>\n<p>Let&#8217;s now look at some basic filtering operations that we may want to perform on our new employee XML data.<\/p>\n<p>Before running any queries, let&#8217;s turn on <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190287.aspx\"><strong>STATISTICS TIME<\/strong><\/a> to measure the query execution time:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--turn on statistics time\r\nSET STATISTICS TIME ON\r\nGO <\/pre>\n<p>We&#8217;ll now use a simple example to demonstrate the benefits of using a <strong>primary XML index<\/strong>. To locate the street address for an employee having a Social Security Number of <strong>574582264<\/strong>, we can use the following XQuery statement:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--find employee address by ssn\r\nSELECT EMP_DETAILS.value('(\/Employee\/Address\/Street)[1]','varchar(50)') AS StreetAddress \r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/Employee[@ssn=574582264]') = 1\r\nGO<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-31a4d35d-d925-4219-ae36-9fba9cb562f9.png\" alt=\"1567-1-31a4d35d-d925-4219-ae36-9fba9cb56\" \/><\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-c5574b08-fde0-44e0-abbe-33429cc36762.png\" alt=\"1567-1-c5574b08-fde0-44e0-abbe-33429cc36\" \/><\/p>\n<p>The <strong>elapsed time,<\/strong> under <strong>SQL Server Execution Times<\/strong>in the <strong>Messages<\/strong> tab<strong>,<\/strong> represents the actual query execution time in milliseconds. We can see that the query time was <strong>2463<\/strong> ms. Now let&#8217;s add a <strong>primary XML index<\/strong> to the table:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--create primary xml index\r\nCREATE PRIMARY XML INDEX [IX_EMP_DETAILSx] \r\nON EMP_XML(EMP_DETAILS)\r\nGO<\/pre>\n<p>We&#8217;ve <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/bb934097.aspx\">created<\/a> a primary index on the <strong>EMP_DETAILS<\/strong> column of the table &#8211; the only XML field. Creating the primary XML index on a table with this many records may take a few minutes. XML data is being collected and stored during the primary XML index creation, including node values, node types, and the paths from each node to the root.<\/p>\n<p>Now that a primary XML index is on the <strong>EMP_DETAILS<\/strong> field, we&#8217;ll run the same query again:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--find employee address by ssn\r\nSELECT EMP_DETAILS.value('(\/Employee\/Address\/Street)[1]','varchar(50)') AS StreetAddress \r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/Employee[@ssn=574582264]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-7d85f9ad-973f-4deb-9129-740c6b4feff0.png\" alt=\"1567-1-7d85f9ad-973f-4deb-9129-740c6b4fe\" \/><\/p>\n<p>We see that the elapsed time <strong>and<\/strong> CPU time were both <strong>reduced<\/strong> by about <strong>60%<\/strong> as a result of creating the primary XML index! We can tell for sure that the XML index was used by examining the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms178071(v=SQL.105).aspx\">graphical execution plan<\/a>. To see this in <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms174173.aspx\">SSMS<\/a>, go to the <strong>Query<\/strong> menu before running the query and then select <strong>Include Actual Execution Plan<\/strong>. After running the query, look at the results in the Execution Plan tab:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-83d0415d-2173-4512-bdf9-795dd6e27f5a.png\" alt=\"1567-1-83d0415d-2173-4512-bdf9-795dd6e27\" \/><\/p>\n<p>A node that is displayed for a <strong>PrimaryXML<\/strong> index <strong>seek<\/strong> or <strong>scan<\/strong> in the graphical execution plan indicates that the XML index was indeed used for the query.<\/p>\n<h3>Disabling XML Indexes<\/h3>\n<p>Let&#8217;s assume we have received a request to find <strong>all address information<\/strong> for employees who:<\/p>\n<ol>\n<li>Live in Michigan<\/li>\n<li>Have a first name of &#8216;Desiree&#8217;<\/li>\n<\/ol>\n<p>Before we create a query for this request, let&#8217;s <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188388.aspx\">disable<\/a> the primary XML index, so we can compare the before and after time statistics. By disabling an index, we can retain the index definition whilst clearing out the stored data about the table. A disabled index is not used by the query execution plan. The current status of XML indexes can be checked by using:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--check disabled status for xml indexes\r\nSELECT name, is_disabled \r\nFROM sys.indexes \r\nWHERE OBJECT_ID = OBJECT_ID('EMP_XML')\r\nAND type_desc = 'XML'\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-9ce1b681-d33d-4690-ac78-2c0d223640f4.png\" alt=\"1567-1-9ce1b681-d33d-4690-ac78-2c0d22364\" \/><\/p>\n<p>To disable our primary index, we&#8217;ll run the following <strong>ALTER INDEX<\/strong> statement, with the <strong>DISABLE <\/strong>argument:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--disable primary xml index\r\nALTER INDEX IX_EMP_DETAILSx ON EMP_XML DISABLE\r\nGO <\/pre>\n<p>The primary XML index is now rendered ineffective.<\/p>\n<p>To return the results we need for the &#8216;addresses&#8217; request, we can include a <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190945.aspx\"><strong>FLWOR<\/strong><\/a> expression in our query, within an XQuery <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms191474.aspx\">query()<\/a> method:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--get all address data for specific employees\r\nSELECT \r\nEMP_DETAILS.value('(\/Employee\/FirstName)[1]', 'varchar(50)') AS FirstName,\r\nEMP_DETAILS.value('(\/Employee\/LastName)[1]', 'varchar(50)') AS LastName,\u00a0\u00a0\r\nEMP_DETAILS.query(' \r\nfor $Address in \/Employee\/Address\r\nreturn $Address\r\n') AS [Address]\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/Employee\/FirstName[.=\"Desiree\"]') = 1\r\nAND EMP_DETAILS.exist('\/Employee\/Address[@state=\"MI\"]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-5a3d504b-e46a-4d6a-9ad4-92c5e35bb08c.png\" alt=\"1567-1-5a3d504b-e46a-4d6a-9ad4-92c5e35bb\" \/><\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-b90ecb03-dc0f-4f8c-b4ac-666875ef693a.png\" alt=\"1567-1-b90ecb03-dc0f-4f8c-b4ac-666875ef6\" \/><\/p>\n<p>This query gives us the results we are looking for; every element and attribute value under the Address node, and it also includes<strong> FirstName<\/strong> and<strong> LastName<\/strong> values for clarity. The FLWOR expression iterated the Address node and returned the State, Zip, Street, and City values. After running this query a few times, the <strong>elapsed execution time<\/strong> consistently shows a time of around <strong>9539<\/strong>ms. Now that we&#8217;ve tested this query without using any XML indexes, let&#8217;s re-enable our primary XML index.<\/p>\n<h3>Rebuilding XML Indexes<\/h3>\n<p>An XML index, like any index, can be re-enabled by simply using the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188388.aspx\">REBUILD<\/a> option in an <strong>ALTER INDEX<\/strong> statement. To rebuild our primary XML index, we&#8217;ll run the following:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--rebuild primary xml index\r\nALTER INDEX IX_EMP_DETAILSx ON EMP_XML REBUILD\r\nGO <\/pre>\n<p>The primary XML index is now re-enabled. Be aware that rebuilding an index can have serious performance implications, limiting access to the table from other processes. Other types of indexes accept the<strong> REBUILD<\/strong> argument with the <strong>WITH (ONLINE = ON)<\/strong> hint to reduce this performance hit, <strong>but not XML indexes<\/strong>. If we were to attempt a <strong>REBUILD WITH (ONLINE = ON)<\/strong> on our primary XML index, we&#8217;d get the following error:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--attempt primary xml index rebuild with online=on\r\nALTER INDEX IX_EMP_DETAILSx ON EMP_XML REBUILD WITH (ONLINE = ON)\r\nGO\u00a0\u00a0<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-a03a2fe5-d2bc-4845-aa76-ba8164820c6f.png\" alt=\"1567-1-a03a2fe5-d2bc-4845-aa76-ba8164820\" \/><\/p>\n<p>XML indexes can only use the <strong>WITH (ONLINE = OFF)<\/strong> option.<\/p>\n<p>Now that we have re-enabled our primary XML index, it will be considered in the query execution plan. Let&#8217;s re-run our previous query and examine the statistics time again:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--get all address data for specific employees\r\nSELECT \r\nEMP_DETAILS.value('(\/Employee\/FirstName)[1]', 'varchar(50)') AS FirstName,\r\nEMP_DETAILS.value('(\/Employee\/LastName)[1]', 'varchar(50)') AS LastName,\u00a0\u00a0\r\nEMP_DETAILS.query(' \r\nfor $Address in \/Employee\/Address\r\nreturn $Address\r\n') AS [Address]\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/Employee\/FirstName[.=\"Desiree\"]') = 1\r\nAND EMP_DETAILS.exist('\/Employee\/Address[@state=\"MI\"]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-6e3cf8a4-ff4d-4f3a-a6f1-676b1ea35d1d.png\" alt=\"1567-1-6e3cf8a4-ff4d-4f3a-a6f1-676b1ea35\" \/><\/p>\n<p>Our previous execution time was <strong>9539<\/strong> milliseconds. We&#8217;ve reduced the time by <strong>90%<\/strong> by using just a primary XML index! Although this is a great optimization gain, let&#8217;s see if we can increase efficiency for this query even more, using a secondary XML index.<\/p>\n<h3>Using Secondary XML Indexes<\/h3>\n<p>How can we decide which <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/bb522562(v=SQL.100).aspx\">secondary index<\/a> to use for our query?<\/p>\n<p><strong>PATH<\/strong> secondary indexes are designed to help improve queries that contain a good amount of path expressions. Queries that use the <strong>exist()<\/strong> method are usually good candidates for a <strong>PATH<\/strong> index. Our last query uses 2 instances of the <strong>exist()<\/strong> method in its <strong>WHERE<\/strong> clause, and also includes path expressions in the select list. The <strong>PATH<\/strong> index seems promising for our query.<\/p>\n<p><strong>PROPERTY<\/strong> secondary indexes are intended for queries where the primary key is a known value, and multiple return values are required from the same XML instance. We are returning multiple return values from the same XML instance. A <strong>PROPERTY<\/strong> index also seems like it would be helpful with our query.<\/p>\n<p><strong>VALUE<\/strong> secondary indexes are useful for searching for known element or attribute values, without necessarily knowing the element or attribute names, or the full path to the value. Queries that use wildcards for portions of a path would probably benefit from a <strong>VALUE<\/strong> index. In our query, we <strong>do<\/strong> know the paths to the elements and attributes we are filtering, as well as the values. A <strong>VALUE<\/strong> index could help, but probably not as much as a <strong>PATH<\/strong> index would.<\/p>\n<h3>Using PATH Secondary XML Indexes<\/h3>\n<p>The <strong>PATH<\/strong> secondary XML index is known to help with queries that have an <strong>exist()<\/strong> expression in the <strong>WHERE<\/strong> clause. Since our query uses 2 instances of the <strong>exist()<\/strong> method, let&#8217;s try a<strong>PATH<\/strong> secondary index, and see if it improves performance. We&#8217;ll create the<strong>PATH<\/strong> index as follows:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--create secondary xml path index\r\nCREATE XML INDEX [IX_EMP_DETAILS_pathx] ON EMP_XML(EMP_DETAILS)\r\nUSING XML INDEX [IX_EMP_DETAILSx]\r\nFOR PATH\u00a0\u00a0\u00a0\u00a0\r\nGO <\/pre>\n<p>Note that we need to reference the primary XML index (<strong>IX_EMP_DETAILSx<\/strong>) in the index create statement with the<strong> USING<\/strong> clause, to specify that it will be a secondary XML index. A primary XML index is required in order to implement any secondary index. The &#8216;<strong>FOR PATH&#8217; <\/strong>option is what designates the index as a <strong>PATH <\/strong>index. Now that we have a <strong>PATH<\/strong> secondary index, let&#8217;s re-run our query and view the execution time results:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--get all address data for specific employees\r\nSELECT \r\nEMP_DETAILS.value('(\/Employee\/FirstName)[1]', 'varchar(50)') AS FirstName,\r\nEMP_DETAILS.value('(\/Employee\/LastName)[1]', 'varchar(50)') AS LastName,\u00a0\u00a0\r\nEMP_DETAILS.query(' \r\nfor $Address in \/Employee\/Address\r\nreturn $Address\r\n') AS [Address]\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/Employee\/FirstName[.=\"Desiree\"]') = 1\r\nAND EMP_DETAILS.exist('\/Employee\/Address[@state=\"MI\"]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-6c35585e-8778-4e9a-8235-3599f16bd69d.png\" alt=\"1567-1-6c35585e-8778-4e9a-8235-3599f16bd\" \/><\/p>\n<p>We see another huge performance gain by using the <strong>PATH<\/strong> secondary XML index &#8211; another reduction in execution time by <strong>more than 60%<\/strong>.<\/p>\n<h3>Using PROPERTY Secondary XML Indexes<\/h3>\n<p>With our &#8216;addresses&#8217; query, a <strong>PROPERTY<\/strong> secondary XML index could be slightly more advantageous than a <strong>PATH<\/strong> index, because in addition to using paths, our query also returns multiple values. Let&#8217;s experiment and see if adding a <strong>PROPERTY<\/strong> index will increase optimization.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--create secondary xml property index\r\nCREATE XML INDEX [IX_EMP_DETAILS_propertyx] \r\nON EMP_XML(EMP_DETAILS)\r\nUSING XML INDEX [IX_EMP_DETAILSx]\r\nFOR PROPERTY\u00a0\u00a0\u00a0\u00a0\r\nGO\r\n\r\n--get all address data for specific employees\r\nSELECT \r\nEMP_DETAILS.value('(\/Employee\/FirstName)[1]', 'varchar(50)') AS FirstName,\r\nEMP_DETAILS.value('(\/Employee\/LastName)[1]', 'varchar(50)') AS LastName,\u00a0\u00a0\r\nEMP_DETAILS.query(' \r\nfor $Address in \/Employee\/Address\r\nreturn $Address\r\n') AS [Address]\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/Employee\/FirstName[.=\"Desiree\"]') = 1\r\nAND EMP_DETAILS.exist('\/Employee\/Address[@state=\"MI\"]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-7d55d440-fb39-468d-bede-aaeb72bceab0.png\" alt=\"1567-1-7d55d440-fb39-468d-bede-aaeb72bce\" \/><\/p>\n<p>After running the &#8216;addresses&#8217; query again, this time with a <strong>PROPERTY<\/strong> index in place, we see yet another reduction in execution time &#8211; from <strong>332<\/strong> ms. to <strong>190<\/strong> ms. It seems pretty obvious that the <strong>PROPERTY<\/strong> index was the reason for the performance gain here, but how can we tell for sure? We now have 2 different secondary indexes on our table, <strong>PATH<\/strong> and <strong>PROPERTY<\/strong>, and their nodes both look similar in the graphical execution plan:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-d6a8654d-d69c-439c-9bc7-783ac89720fa.png\" alt=\"1567-1-d6a8654d-d69c-439c-9bc7-783ac8972\" \/><\/p>\n<p>To find out, all we have to do is hover over the <strong>SecondaryXML<\/strong> node and look in the &#8216;Object&#8217; section of the popup &#8211; OR right-click the node, select Properties, and examine the &#8216;Object&#8217; section of the pane:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-6cf8db62-5162-4662-8043-ad62d47c58e5.png\" alt=\"1567-1-6cf8db62-5162-4662-8043-ad62d47c5\" \/><\/p>\n<p>We can clearly see the name of the index (<strong>IX_EMP_DETAILS_propertyx<\/strong>) that was used in the query. The <strong>PROPERTY<\/strong> index was utilized for our query. This is a good example of how experimenting with secondary XML indexes can result in highly optimized queries.<\/p>\n<h3>Using VALUE Secondary XML Indexes<\/h3>\n<p>We have received another request: write a query that returns all details for all employees living in zip code <strong>98679<\/strong>. Let&#8217;s imagine that we aren&#8217;t familiar with the XML structure, but still need to query it without bothering to look up schema details. We are not sure of the node attribute name &#8211; is it &#8216;<strong>zip<\/strong>&#8216;, &#8216;<strong>zipcode<\/strong>&#8216;, or &#8216;<strong>zip code<\/strong>&#8216;? We <strong>do<\/strong> know that it is listed as a node attribute, not a node element. We&#8217;ll choose a query that reflects what we know about the XML structure:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--get all employee details for specific zipcode\r\nSELECT ID, EMP_DETAILS\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/\/@*[.=\"98679\"]') = 1\r\nGO <\/pre>\n<p>We&#8217;ve used the &#8216;<strong>\/\/<\/strong>&#8216; path wildcard to indicate that we don&#8217;t know the full path to the attribute, and the &#8216;<strong>*<\/strong>&#8216; attribute wildcard to allow for the value to be found in <strong>any<\/strong> element attribute. The &#8216;<strong>@<\/strong>&#8216; shows that we know that the zip code value is an element attribute value.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-d6ae2520-11bd-469d-a5cc-c5a505726949.png\" alt=\"1567-1-d6ae2520-11bd-469d-a5cc-c5a505726\" \/><\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-d68aa08a-8c32-4d73-b997-721d03371d1f.png\" alt=\"1567-1-d68aa08a-8c32-4d73-b997-721d03371\" \/><\/p>\n<p>The results are processed in <strong>1809<\/strong> ms. If we look at the graphical execution plan, we see that the <strong>PATH<\/strong> secondary index was used almost exclusively:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-617d04db-afad-44e8-bde8-e11d85097e38.png\" alt=\"1567-1-617d04db-afad-44e8-bde8-e11d85097\" \/><\/p>\n<p>Since our &#8216;zipcode&#8217; query uses an incomplete path AND a wildcard, it&#8217;s a good candidate for a <strong>VALUE<\/strong> secondary index. Let&#8217;s create a <strong>VALUE<\/strong> index, and test the execution time from a second query run:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--create secondary xml value index\r\nCREATE XML INDEX [IX_EMP_DETAILS_valuex] ON EMP_XML(EMP_DETAILS)\r\nUSING XML INDEX [IX_EMP_DETAILSx]\r\nFOR VALUE\u00a0\u00a0\u00a0\u00a0\r\nGO\r\n\r\n--get all employee details for specific zipcode\r\nSELECT ID, EMP_DETAILS\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/\/@*[.=\"98679\"]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-a252421d-eb68-44d1-aa29-7eeb0ad8dc4d.png\" alt=\"1567-1-a252421d-eb68-44d1-aa29-7eeb0ad8d\" \/><\/p>\n<p>The <strong>VALUE<\/strong>index reduced the query execution time to <strong>less than a millisecond<\/strong>!<\/p>\n<p>We can see the <strong>SecondaryXML<\/strong> index node in the graphical execution plan:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-5a6db675-2a27-40b8-b36a-0ca2199f3e41.png\" alt=\"1567-1-5a6db675-2a27-40b8-b36a-0ca2199f3\" \/><\/p>\n<p>Let&#8217;s check the &#8216;Object&#8217; section of the node&#8217;s Properties pane to confirm that the <strong>VALUE<\/strong> index was used:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-6c0d2aea-4c03-4fc3-ac34-ad44d6b08142.png\" alt=\"1567-1-6c0d2aea-4c03-4fc3-ac34-ad44d6b08\" \/><\/p>\n<p>We see that the <strong>VALUE<\/strong> index (<strong>IX_EMP_DETAILS_valuex<\/strong>) was used for the query.<\/p>\n<p>If we had run this query without <strong>any<\/strong> XML indexes (by disabling the primary index and all secondary indexes), the query execution time results would have been as follows:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--disable all 3 secondary xml indexes\r\nALTER INDEX [IX_EMP_DETAILS_valuex] ON EMP_XML DISABLE\r\nGO\r\nALTER INDEX [IX_EMP_DETAILS_propertyx] ON EMP_XML DISABLE\r\nGO\r\nALTER INDEX [IX_EMP_DETAILS_pathx] ON EMP_XML DISABLE\r\nGO\r\n\r\n--disable primary xml index\r\nALTER INDEX IX_EMP_DETAILSx ON EMP_XML DISABLE\r\nGO\u00a0\u00a0 \r\n\r\n--get all employee details for specific zipcode\r\nSELECT ID, EMP_DETAILS\r\nFROM EMP_XML\r\nWHERE EMP_DETAILS.exist('\/\/@*[.=\"98679\"]') = 1\r\nGO <\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-4937bc4a-0848-401f-a0a8-2e8bf674f1a5.png\" alt=\"1567-1-4937bc4a-0848-401f-a0a8-2e8bf674f\" \/><\/p>\n<p>So, a query that originally cost more than 7 seconds has now been optimized so that it takes <strong>less than 1 millisecond<\/strong>.<\/p>\n<h3>Disadvantages<\/h3>\n<p>Some disadvantages to using XML indexes include:<\/p>\n<ul>\n<li>As with any index, data modifications are more expensive. Updates and inserts to an XML field that is indexed will take longer, as the index(es) must be maintained and updated.<\/li>\n<li><a href=\"https:\/\/social.msdn.microsoft.com\/Forums\/sqlserver\/en-US\/947282c5-762a-4f76-a460-14176730a7d0\/database-size-dramatically-increases-with-the-addition-of-an-xml-primary-index?forum=sqlxml\">XML indexes can claim massive amounts of disk space<\/a>. To get an idea of how our XML indexes have affected disk space, consider how the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188776.aspx\"><strong>sp_spaceused<\/strong><\/a> statistics have changed with each index addition:\n<pre class=\"theme:ssms2012 lang:tsql\">--get disk space used info for table\r\nEXEC sp_spaceused 'EMP_XML'\r\nGO<\/pre>\n<ul>\n<li>With no XML indexes:<\/li>\n<li><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-3125d89f-8b09-4b6a-bb80-e111a9970cd3.png\" alt=\"1567-1-3125d89f-8b09-4b6a-bb80-e111a9970\" \/><\/li>\n<li>After adding the primary XML index:<\/li>\n<li><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-505cc84a-7470-4873-a449-3bf602650f02.png\" alt=\"1567-1-505cc84a-7470-4873-a449-3bf602650\" \/><\/li>\n<li>After adding the <strong>PATH<\/strong> secondary index:<\/li>\n<li><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-49f31374-8e68-4528-9ada-00698f74bb87.png\" alt=\"1567-1-49f31374-8e68-4528-9ada-00698f74b\" \/><\/li>\n<li>After adding the <strong>VALUE<\/strong> secondary index:<\/li>\n<li><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-3a085bbc-c8d5-41a5-b759-9df079985c8e.png\" alt=\"1567-1-3a085bbc-c8d5-41a5-b759-9df079985\" \/><\/li>\n<li>Ater adding the <strong>PROPERTY<\/strong> secondary index:<\/li>\n<li><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1567-1-a8c6bd9e-3841-4e16-bca4-00f84c75e5e9.png\" alt=\"1567-1-a8c6bd9e-3841-4e16-bca4-00f84c75e\" \/><\/li>\n<li>The cumulative index space used for the table was increased by <strong>more than 2GB<\/strong> after adding the primary XML index and the 3 secondary indexes. That is a substantial disk space cost overhead for the XML indexes , considering that the data size of the 500000-row table is only about 300MB.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Conclusion<\/h2>\n<p>We&#8217;ve looked at some of the reasons that using XML indexes can increase a query&#8217;s performance, and have gone over some of the basic rules for using them.<\/p>\n<p>We&#8217;ve delved into some specific test cases in order to prove the performance benefits of XML indexes on the XML data type. We found by comparing time statistics that using a primary XML index can reduce query execution times dramatically. We looked at the different types of secondary XML indexes, and saw that by implementing the correct secondary index, a query&#8217;s performance can be optimized even further.<\/p>\n<p>We also reviewed some disadvantages to using XML indexes, such as increased table modification times due to index maintenance, and increased disk space consumption. However, when XML indexes are used correctly to support specific query demands, any increased disk and index maintenance costs may be seen as a small price for such large query performance gains.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>XML Indexes make a huge difference to the speed of XML queries, as Seth Delconte explains; and demonstrates by running queries against half a million XML employee records. The execution time of a query is reduced from two seconds to being too quick to measure, purely by creating the right type of secondary index for the query.&hellip;<\/p>\n","protected":false},"author":221920,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143527],"tags":[4170,5722,4150,4217],"coauthors":[48345],"class_list":["post-1410","post","type-post","status-publish","format-standard","hentry","category-database-administration-sql-server","tag-database-administration","tag-indexing","tag-sql","tag-xml"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1410","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221920"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1410"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1410\/revisions"}],"predecessor-version":[{"id":73071,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1410\/revisions\/73071"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1410"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1410"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1410"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}