Precision Indexing: Basics of Selective XML Indexes in SQL Server 2012



SQL Server 2012 Service Pack 1 introduced a new feature called ‘Selective XML Indexes’. Selective XML indexes extend the existing XML indexing options and can be a much more efficient indexing solution. They offer the benefits of standard XML indexes with dramatically reduced storage and maintenance requirements, and they can be customized at creation time to serve specific types of XML queries.

How Are Selective XML Indexes Different?

One of the main drawbacks of standard XML indexes is the huge amount of disk space they are likely to consume. When a standard (primary) XML index is created, the entire XML document or fragment is shredded into a relational representation: every node in the document becomes a row in an internal index table. This index table is usually many times larger than the original XML document. Even if you only need to query certain nodes or paths, the whole document is shredded. Selective XML indexes, by contrast, shred only the node paths that are explicitly specified when the index is created, which greatly reduces index storage space and creation time. The main advantages of a selective XML index over standard XML indexes are therefore:

  • Greatly decreased storage requirements
  • Greatly decreased index creation times
  • Greatly decreased index maintenance costs

Selective XML Index Basics

To demonstrate the advantages of selective XML indexes, let’s run a script that generates an XML representation of the SalesOrderHeader and SalesOrderDetail tables from the AdventureWorks2012 sample database:
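The original generation script isn’t reproduced here. As a rough sketch of one way to build such a table, the following assumes a hypothetical XML column named SalesOrderXML and a hand-picked set of header and detail columns; only the SalesOrders/SalesOrder element names and the SalesPersonID and PurchaseOrderNumber elements are taken from the paths used later in the article.

```sql
USE AdventureWorks2012;
GO

-- Hypothetical target table: ID supplies the clustered primary key that
-- selective XML indexes require; SalesOrderXML is an assumed column name.
CREATE TABLE dbo.Sales_XML
(
    ID            INT IDENTITY(1, 1) NOT NULL
                  CONSTRAINT PK_Sales_XML PRIMARY KEY CLUSTERED,
    SalesOrderXML XML NOT NULL
);
GO

-- One XML fragment per sales order: selected header values plus
-- every related detail row nested inside the SalesOrder element.
INSERT INTO dbo.Sales_XML (SalesOrderXML)
SELECT
    (
        SELECT
            h.SalesOrderID,
            h.OrderDate,
            h.SalesPersonID,          -- NULL values produce no element
            h.PurchaseOrderNumber,
            (
                SELECT d.SalesOrderDetailID, d.ProductID, d.OrderQty, d.UnitPrice
                FROM Sales.SalesOrderDetail AS d
                WHERE d.SalesOrderID = h.SalesOrderID
                FOR XML PATH('SalesOrderDetail'), TYPE
            )
        FOR XML PATH('SalesOrder'), ROOT('SalesOrders'), TYPE
    )
FROM Sales.SalesOrderHeader AS h
ORDER BY h.SalesOrderID;
GO
```

Because FOR XML PATH omits NULL columns by default, orders without a salesperson or purchase order number simply have no corresponding element in this sketch.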

We’ve added the ‘ID’ column to our new Sales_XML table because selective XML indexes require a clustered primary key on the table. After running the script, let’s take a look at a random XML fragment:
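Assuming the XML column is named SalesOrderXML (a hypothetical name, since the original script isn’t shown), either of the following queries returns a fragment to inspect: the first picks one at random, the second fetches the order discussed below.

```sql
-- SalesOrderXML is a hypothetical column name.
-- Pick an arbitrary fragment.
SELECT TOP (1) ID, SalesOrderXML
FROM dbo.Sales_XML
ORDER BY NEWID();

-- Fetch the fragment for SalesOrderID 67180.
SELECT ID, SalesOrderXML
FROM dbo.Sales_XML
WHERE SalesOrderXML.exist('/SalesOrders/SalesOrder/SalesOrderID[. = 67180]') = 1;
```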

You’ll notice that the XML fragment contains relevant data from the SalesOrderHeader record, combined with data from all associated SalesOrderDetail records. In this case, there are two associated SalesOrderDetail records. We can quickly verify that this is an accurate count by checking the originating table:
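A minimal version of that check against the source table:

```sql
-- Number of detail rows behind the fragment for SalesOrderID 67180.
SELECT COUNT(*) AS DetailCount
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = 67180;
```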


There are indeed two SalesOrderDetail records for SalesOrderID 67180. We will also check that a record has been generated for every sales order (SalesOrderHeader record):
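One way to run this check is to compare the row counts of the two tables side by side; if every order produced exactly one XML row, the numbers match.

```sql
-- Both counts should be equal: one Sales_XML row per sales order.
SELECT
    (SELECT COUNT(*) FROM Sales.SalesOrderHeader) AS HeaderRows,
    (SELECT COUNT(*) FROM dbo.Sales_XML)          AS XmlRows;
```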


This verifies that there exists an XML fragment (containing all related SalesOrderDetail records) in the new Sales_XML table for every SalesOrderID in the SalesOrderHeader table.

Now that we have XML data, we can start experimenting with a selective XML index. Although we are only allowed to have one selective XML index per XML column, we can customize a single index to work with several types of XML queries. Initially, let’s plan on writing queries that filter the XML fragments by SalesPersonID, like the following:
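The article’s exact query isn’t shown. A sketch using the exist() method, assuming a hypothetical XML column named SalesOrderXML, could look like this:

```sql
-- Count the fragments whose SalesPersonID is 276.
-- SalesOrderXML is a hypothetical column name.
SELECT COUNT(*) AS SalesOrderCount
FROM dbo.Sales_XML
WHERE SalesOrderXML.exist('/SalesOrders/SalesOrder/SalesPersonID[. = 276]') = 1;
```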

This query returns the number of sales orders handled by the salesperson with ID 276. Let’s run it before we add a selective XML index, but first we’ll SET STATISTICS TIME ON so we can compare the query time before and after creating the index:
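Turning on the timing statistics is a one-line session setting; SQL Server then reports parse/compile and execution times in the Messages tab.

```sql
-- Report CPU and elapsed times for each subsequent statement in this session.
SET STATISTICS TIME ON;
```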

Now we’ll run the above ‘sales order count’ query:


In the Messages tab, we see that the query execution time was 428 ms.


To see if we can speed up this type of query, we’ll create a basic selective XML index:
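A sketch of the statement, assuming hypothetical names for the index (sxi_SalesOrderXML) and the XML column (SalesOrderXML); the path name and path come from the text. No data type or hint is given, so the path uses default mapping.

```sql
-- sxi_SalesOrderXML and SalesOrderXML are hypothetical names.
CREATE SELECTIVE XML INDEX sxi_SalesOrderXML
ON dbo.Sales_XML (SalesOrderXML)
FOR
(
    -- Only this path is shredded into the index.
    pathSalesPersonID = '/SalesOrders/SalesOrder/SalesPersonID'
);
```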

Note that we’ve customized this index for a specific XQuery path (‘/SalesOrders/SalesOrder/SalesPersonID’) and given that path a name (pathSalesPersonID). However, running the create index statement fails with an error.


This is because selective XML indexes must first be enabled for each database. To enable them, use the sys.sp_db_selective_xml_index system stored procedure:
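For example, assuming the table lives in AdventureWorks2012, the call below switches the feature on; the same procedure accepts 'off' to switch it off again.

```sql
-- Enable selective XML indexes for the database (AdventureWorks2012 assumed).
EXEC sys.sp_db_selective_xml_index
     @dbname = N'AdventureWorks2012',
     @selective_xml_index = 'on';
```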


Interestingly, to disable selective XML indexes again, the database must be set to the simple recovery model. Under any other recovery model, the call returns an error message instead.
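A sketch of the disable sequence, assuming AdventureWorks2012 and that the database should return to the FULL recovery model afterwards (substitute whichever model it used before). Running it would undo the setup that the rest of this walkthrough depends on.

```sql
-- Disabling selective XML indexes requires the SIMPLE recovery model.
ALTER DATABASE AdventureWorks2012 SET RECOVERY SIMPLE;
GO
EXEC sys.sp_db_selective_xml_index
     @dbname = N'AdventureWorks2012',
     @selective_xml_index = 'off';
GO
-- Restore the previous recovery model (FULL is only an assumed example).
ALTER DATABASE AdventureWorks2012 SET RECOVERY FULL;
GO
```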


Now that selective XML indexes are enabled for the database, we can run the create index script again and check that the index was created:
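The check can be done with catalog views. This sketch lists the XML indexes on Sales_XML along with their type, then the paths mapped by any selective XML index on the table.

```sql
-- XML indexes on the table; xml_index_type distinguishes
-- primary, secondary and selective XML indexes.
SELECT xi.name, xi.xml_index_type, xi.xml_index_type_description
FROM sys.xml_indexes AS xi
WHERE xi.object_id = OBJECT_ID(N'dbo.Sales_XML');

-- Paths promoted by selective XML indexes on the table.
SELECT p.name, p.path
FROM sys.selective_xml_index_paths AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Sales_XML');
```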


The index exists. We can also verify that it is a selective XML index by expanding the Indexes node for the table in Object Explorer.


Let’s run our query again:

We now see a greatly reduced query execution time.


That’s a reduction in query execution time by more than 90%!

Selective XML Index Optimizations

For untyped XML, selective XML indexes allow path specifications to be fine-tuned with either XQuery data types or SQL Server data types. Specifying a data type and optional optimization hints in addition to the XQuery path is called ‘user-specified mapping mode’; in default mapping mode, only the path is specified. SQL Server data types are designed to efficiently handle return values from the XQuery value() method.

Specifying data types and optimization hints on untyped XML can improve query performance, but it introduces a risk of failed cast operations: the index assumes that data of the declared type will be present in the path, which may not always be the case. Default mapping mode (no data types or optimization hints) guarantees that there will be no cast failures. For typed XML there is no reason to specify data types at all, because they are discovered from the schema.

We’re using untyped XML, so let’s try to improve our query execution time by adding an optimization hint. Before we specify an appropriate hint, we will need to define an XQuery data type for the SalesPersonID node.

The available XQuery types for untyped XML are:

  • xs:boolean
  • xs:double
  • xs:string
  • xs:date
  • xs:time
  • xs:dateTime

Notice that there is no xs:integer type. We’ll use the closest available type, xs:double. Also, since we are familiar with our data, we know that there is always only one SalesPersonID node in its parent (SalesOrder node). This will allow us to implement the SINGLETON optimization hint. The SINGLETON hint expects that a node appears only once inside its parent node. Again, as with data type casting, there is a slight risk of failure if the SINGLETON hint is used where there may be more than one node (of its kind) in a parent node.

Before we recreate our index to include our optimization hint, let’s record the current size of the index:
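The article doesn’t say how the size was measured. One possible approach, assuming the index’s node table appears in sys.internal_tables with parent_id pointing at Sales_XML (as XML index node tables do), is to sum the reserved pages of those internal tables. Figures obtained this way may differ somewhat from the ones quoted here.

```sql
-- Reserved space (KB) of the internal tables that back XML indexes on Sales_XML.
SELECT it.name AS internal_table,
       it.internal_type_desc,
       SUM(ps.reserved_page_count) * 8 AS reserved_kb
FROM sys.internal_tables AS it
JOIN sys.dm_db_partition_stats AS ps
     ON ps.object_id = it.object_id
WHERE it.parent_id = OBJECT_ID(N'dbo.Sales_XML')
GROUP BY it.name, it.internal_type_desc;
```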


The current index size is 1432 KB. We’ll now drop the selective index and recreate it, this time using an XQuery type and optimization hint:
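A sketch of the drop-and-recreate, using hypothetical index and column names (sxi_SalesOrderXML, SalesOrderXML); the path, the xs:double type and the SINGLETON hint follow the text.

```sql
-- sxi_SalesOrderXML and SalesOrderXML are hypothetical names.
DROP INDEX sxi_SalesOrderXML ON dbo.Sales_XML;
GO

CREATE SELECTIVE XML INDEX sxi_SalesOrderXML
ON dbo.Sales_XML (SalesOrderXML)
FOR
(
    -- xs:double is the closest available XQuery type to an integer ID;
    -- SINGLETON declares that SalesPersonID occurs only once per SalesOrder.
    pathSalesPersonID = '/SalesOrders/SalesOrder/SalesPersonID'
        AS XQUERY 'xs:double' SINGLETON
);
GO
```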

Does this optimization improve our query speed? Let’s check it by running our sales order count query once again:


This shows a further reduction in query time of almost 90%, down to 4 ms! Clearly, tuning the index with appropriate optimization hints matters. A quick index size check reveals only a tiny storage cost increase (8 KB), which is far outweighed by the performance gain.


Adding Additional XQuery Paths to the Index

Suppose that we must also support slightly more complex queries that find PurchaseOrderNumbers for a given salesperson, using our existing selective index. We’ll be dealing with queries that look like this:
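A plausible form of such a query, assuming a hypothetical XML column named SalesOrderXML: exist() filters on the salesperson, and value() extracts the PurchaseOrderNumber from each matching fragment.

```sql
-- PurchaseOrderNumbers for orders handled by salesperson 276.
-- SalesOrderXML is a hypothetical column name.
SELECT SalesOrderXML.value(
           '(/SalesOrders/SalesOrder/PurchaseOrderNumber)[1]',
           'VARCHAR(13)') AS PurchaseOrderNumber
FROM dbo.Sales_XML
WHERE SalesOrderXML.exist('/SalesOrders/SalesOrder/SalesPersonID[. = 276]') = 1;
```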

Before we change our existing selective index, let’s see how this query performs with the current index configuration: it runs in 22 ms.


Remember how we mentioned that you can handle XQuery value() method return values effectively by using SQL Server data types in a selective index path? Let’s try using a SQL Server data type here, instead of an XQuery data type. We can still use an optimization hint in conjunction with the SQL Server data type. We don’t have to start over with our index; all we have to do is add a new path:
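Adding a path uses ALTER INDEX with an ADD clause. In this sketch, the index name sxi_SalesOrderXML and the path name pathPurchaseOrderNumber are hypothetical; the SQL Server type and SINGLETON hint follow the text.

```sql
-- Add a second promoted path to the existing selective XML index.
-- sxi_SalesOrderXML and pathPurchaseOrderNumber are hypothetical names.
ALTER INDEX sxi_SalesOrderXML
ON dbo.Sales_XML
FOR
(
    ADD pathPurchaseOrderNumber = '/SalesOrders/SalesOrder/PurchaseOrderNumber'
        AS SQL VARCHAR(13) SINGLETON
);
```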

Note that we defined the SQL Server data type as VARCHAR(13), because the longest existing PurchaseOrderNumber is 13 characters. (If we expected future data to contain longer PurchaseOrderNumber values, we would allow for this by using a larger VARCHAR size.)
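The maximum length can be confirmed against the source table:

```sql
-- Length of the longest PurchaseOrderNumber currently stored.
SELECT MAX(LEN(PurchaseOrderNumber)) AS MaxLength
FROM Sales.SalesOrderHeader;
```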


Since there is always just one PurchaseOrderNumber node in its parent, we can use the SINGLETON hint once again in this new path. Now that we’ve added a second path to our selective XML index, let’s run our query again and see whether the new path specification reduces the current query execution time (22 ms).


The addition of the second path specification has reduced the execution time by more than 50%.

Storage Savings

Our current index storage cost, which includes two path specifications and associated hints, is 1536 KB.


Let’s take a look at what our storage cost would have been if we had used a primary XML index instead of a selective XML index. Remember, standard XML indexes shred and index the entire XML document, instead of only the specific paths that will be used for queries, as selective indexes do. Let’s replace our selective index with a primary XML index to see what our hypothetical storage savings are:
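A sketch of the swap, with hypothetical index and column names (sxi_SalesOrderXML, pxi_SalesOrderXML, SalesOrderXML). The primary XML index relies on the same clustered primary key on ID.

```sql
-- Remove the selective index (hypothetical name).
DROP INDEX sxi_SalesOrderXML ON dbo.Sales_XML;
GO

-- A primary XML index shreds every node of every fragment.
CREATE PRIMARY XML INDEX pxi_SalesOrderXML
ON dbo.Sales_XML (SalesOrderXML);
GO
```

The size of the new index can then be measured the same way as before.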


The primary XML index takes up vastly more storage. By using a selective XML index instead of an ordinary primary XML index, we’ve reduced our storage cost by over 98%.

Requirements/Limitations

Some prerequisites and limitations regarding selective XML indexes include:

  • A selective XML index requires a clustered primary key on the table it is created on. Attempting to create a selective XML index on a table with no primary key results in an error.


  • Each XML column can have at most one selective XML index, and a single selective XML index cannot span multiple XML columns. However, a table can contain up to 249 selective indexes.

When Not to Use Selective XML Indexes

Microsoft recommends using selective XML indexes for most XML query operations. However, if you find that a large number of node paths need to be mapped in the selective index, you may benefit more from standard XML indexes than selective indexes. Also, selective indexes are not recommended if your queries search for unknown elements or unknown node locations.

Summary

We’ve gone over some basic uses of selective XML indexes. Our examples showed the advantages of selective XML indexes over standard XML indexes: index storage and creation time costs are greatly reduced. We also implemented optimization hints and saw how considerably they can improve query performance. Finally, we looked at some of the basic prerequisites and limitations of selective indexes, such as the clustered primary key requirement and the one-selective-index-per-column limit.

About the author

Seth Delconte


Seth Delconte is a SQL Server Developer. In addition to SQL Server database development and administration, he enjoys C#.NET and Linux development.
