Using the FOR XML Clause to Return Query Results as XML

The FOR XML clause in SQL Server causes a lot of difficulty, mainly because it is rather poorly explained in Books Online. We challenged Bob Sheldon to make it seem simple. Here is his sublime response.

SQL Server lets you retrieve data as XML by supporting the FOR XML clause, which can be included as part of your query. You can use the FOR XML clause in the main (outer) query as well as in subqueries. The clause supports numerous options that let you define the format of the XML data.

When you include the FOR XML clause in your query, you must specify one of the four supported modes: RAW, AUTO, EXPLICIT, or PATH. The options available vary by mode, although many of them are shared among the modes. In this article, I explain how to use each of these modes to retrieve data as XML and provide examples that demonstrate how they use the various options.

The RAW Mode

The RAW mode generates a single XML element for each row in the result set returned by the query.

To use the FOR XML clause in RAW mode, you simply append the clause and RAW keyword to your SELECT statement, as shown in the following example:
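A minimal sketch of such a query is shown here. It assumes the SQL Server 2005 version of the AdventureWorks sample database, where HumanResources.Employee joins to Person.Contact on ContactID. The WHERE filter is an arbitrary choice that keeps the output short, and the query is an illustration rather than the author's original:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
-- Employee IDs come from HumanResources.Employee; names come from Person.Contact.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter to keep the output short
FOR XML RAW;                  -- one <row> element per row, columns as attributes
```

In AdventureWorks2008 and later, the name columns moved to Person.Person and EmployeeID was replaced by BusinessEntityID, so the join would need adjusting there.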

Notice that the SELECT statement itself is a very basic query. (The statement pulls data from the AdventureWorks sample database.) Without the FOR XML clause, the statement would return the following results:

With the addition of the FOR XML clause, the statement returns the data as the following XML:

As you can see, each <row> element maps to a row that is returned by the SELECT statement, and each column, by default, is treated as an attribute of that element.

Note: At the top level, you can include a FOR XML clause only in a SELECT statement. However, a subquery that uses FOR XML can appear inside INSERT, UPDATE, and DELETE statements.
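As a sketch of the subquery case, the following INSERT statement stores the XML produced by a nested FOR XML query. It assumes the SQL Server 2005 AdventureWorks sample database; the @Docs table variable is hypothetical, and the TYPE option makes the subquery return the xml data type:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
-- @Docs is a hypothetical table variable used only for this illustration.
DECLARE @Docs TABLE (EmployeeID int, Doc xml);

INSERT INTO @Docs (EmployeeID, Doc)
SELECT e.EmployeeID,
  (SELECT c.FirstName, c.LastName
   FROM Person.Contact AS c
   WHERE c.ContactID = e.ContactID
   FOR XML RAW ('Employee'), TYPE)  -- FOR XML inside a subquery of an INSERT
FROM HumanResources.Employee AS e
WHERE e.EmployeeID IN (1, 4);       -- arbitrary filter

SELECT EmployeeID, Doc FROM @Docs;
```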

In the preceding example, each element in the XML is named <row> by default. However, you can override the default behavior by providing a name for the element, as the following example shows:
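A sketch of the renamed row element, assuming an AdventureWorks (SQL Server 2005) join of HumanResources.Employee and Person.Contact that is illustrative rather than the original query; the element name is passed to RAW in parentheses:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML RAW ('Employee');     -- rows are named <Employee> instead of <row>
```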

Now the element associated with each row returned by the query will be named <Employee>, rather than the default <row>:

In addition to being able to provide a name for the row element, you can also specify that a root element be created to wrap all other elements. To create a root element, add the ROOT keyword to your FOR XML clause:

Notice that you must include a comma when adding an option such as ROOT in order to separate the options. As the following results show, a <root> element is now included in the XML:

As with the row element, you can also provide a specific name for the root element:
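Sketched against an assumed AdventureWorks (SQL Server 2005) join of HumanResources.Employee and Person.Contact, a named root element takes its name as an argument to ROOT:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)                   -- arbitrary filter
FOR XML RAW ('Employee'), ROOT ('Employees');  -- wrap all rows in <Employees>
```

Writing ROOT with no argument produces the default <root> element instead.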

In this case, I’ve named the root element <Employees>, as shown in the following results:

Up to this point, the examples I’ve shown you have added column values as attributes to each row element. This is the default behavior of the RAW mode. However, you can instead specify that the column values be added as child elements to the row element by including the ELEMENTS option in the FOR XML clause:
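Assuming the SQL Server 2005 AdventureWorks sample database and an arbitrary filter, a sketch of the ELEMENTS option simply appends it as one more comma-separated option:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML RAW ('Employee'), ROOT ('Employees'), ELEMENTS;  -- columns become child elements
```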

Once again, I’ve added a comma to separate the options. As you can see in the following results, each <Employee> element now includes a set of child elements that correspond to the columns returned by the query:

Now the <Employee> elements no longer include any attributes and all data is rendered through individual child elements.

If you refer back to the XML returned by the previous example, you’ll notice that the data for employee 4 (Rob Walters) does not include a middle name. This is because that MiddleName value is null in the source data, and by default, no elements are created for a column whose value is null. However, you can override this behavior by adding the XSINIL keyword to the ELEMENTS option:
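A sketch of the XSINIL variant, assuming the SQL Server 2005 AdventureWorks sample database (where, as noted above, employee 4 has a null MiddleName). XSINIL follows ELEMENTS after a space, not a comma:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter; includes employee 4
-- Null columns still produce an element, marked xsi:nil="true".
FOR XML RAW ('Employee'), ROOT ('Employees'), ELEMENTS XSINIL;
```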

Now the results will include an element for the MiddleName column and will include the xsi:nil attribute with a value of true when a value is null, as shown in the following XML:

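Notice that the xmlns:xsi attribute has also been added to the root element. It binds the xsi prefix to the XML Schema instance namespace (http://www.w3.org/2001/XMLSchema-instance), which is the namespace that defines the nil attribute.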

Another important option that is supported by the RAW mode is XMLSCHEMA, which specifies that an inline W3C XML Schema (XSD) be included in the XML data. You add the XMLSCHEMA option in the same way you add other options:
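As a sketch, assuming the SQL Server 2005 AdventureWorks sample database, the XMLSCHEMA option can be added like this (ROOT is left out here to keep the option list minimal):

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)          -- arbitrary filter
FOR XML RAW ('Employee'), XMLSCHEMA;  -- prepend an inline XSD schema to the result
```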

As you can see in the following results, the schema is fully defined and is incorporated in the XML results:

When you specify that a schema be created, you can also specify the name of the target namespace. For example, the following FOR XML clause includes the XMLSCHEMA option, followed by the name of the target namespace (urn:schema_example.com):
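A sketch of that clause in a complete query, assuming the SQL Server 2005 AdventureWorks sample database; only the namespace URI comes from the text:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
-- The argument to XMLSCHEMA becomes the schema's target namespace.
FOR XML RAW ('Employee'), XMLSCHEMA ('urn:schema_example.com');
```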

The statement will return the same results as the previous example, except that the schema now uses urn:schema_example.com as its target namespace.

The SELECT statements shown in the preceding examples have retrieved data from non-XML columns (in this case, integer and string columns). However, your queries might also retrieve data from XML columns. In such cases, the FOR XML clause will incorporate the data retrieved from an XML column into the XML result set.

For example, the following SELECT statement uses the XML query() method to retrieve education-related data from the Resume column in the JobCandidate table:
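The following sketch assumes the SQL Server 2005 AdventureWorks sample database, where HumanResources.JobCandidate.Resume is an xml column whose documents use the namespace http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume. The join to JobCandidate on EmployeeID and the choice of RAW ('Employee') with ELEMENTS are illustrative assumptions:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.LastName,
  -- query() returns the <ns:Education> fragment as the xml data type
  jc.Resume.query('
    declare namespace ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
    /ns:Resume/ns:Education')
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
  INNER JOIN HumanResources.JobCandidate AS jc
    ON jc.EmployeeID = e.EmployeeID  -- only candidates who are also employees
FOR XML RAW ('Employee'), ELEMENTS;
```

Because the query() column has no alias, FOR XML inserts its XML content directly inside each <Employee> element.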

The query() method itself retrieves the following data from the Resume column:

This data is incorporated into the rest of the result set when you use the FOR XML clause, as shown in the following results:

As you can see, the <ns:Education> element and its child elements have been added to the XML data. The namespace defined on the source data in the XML column is also included.

The AUTO Mode

The AUTO mode in a FOR XML clause is slightly different from the RAW mode in the way that it generates the XML result set. The AUTO mode generates the XML by using heuristics based on how the SELECT statement is defined. The best way to understand how this works is to look at an example. The following SELECT statement, as in the previous examples, retrieves employee data from the AdventureWorks database:
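A sketch of such a query, assuming the SQL Server 2005 AdventureWorks sample database. The table aliases Employee and ContactInfo match the element names discussed next; the WHERE filter is arbitrary:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
-- In AUTO mode the table aliases become the element names.
SELECT Employee.EmployeeID, ContactInfo.FirstName,
  ContactInfo.MiddleName, ContactInfo.LastName
FROM HumanResources.Employee AS Employee
  INNER JOIN Person.Contact AS ContactInfo
    ON ContactInfo.ContactID = Employee.ContactID
WHERE Employee.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML AUTO;
```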

Notice that I’ve provided meaningful alias names to the tables (Employee and ContactInfo). These names are used to define the XML element names, so you’ll want to construct your SELECT statements accordingly. Now take a look at the results returned by this query:

As you can see, the <Employee> element has been named automatically based on the table alias name. Notice too that the <ContactInfo> element is a child element of <Employee>. The structure of the elements is based on the order in which the columns are defined in the SELECT list and the tables that are specified in the FROM clause. In this case, because EmployeeID is the first column in the SELECT list and it comes from the Employee table, the first element is <Employee>. And because the remaining columns, which come from the ContactInfo table, appear next in the SELECT list, they are added to a <ContactInfo> child element. If columns from an additional table were listed after these columns in the SELECT list, that table's element would appear as a child element of <ContactInfo>.

In addition, the columns and their values are added as attributes to the table-related elements. This structure is similar to what you saw in the RAW mode examples. And in the same way, you can override the default behavior by using the ELEMENTS option:
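A sketch with the ELEMENTS option, assuming the SQL Server 2005 AdventureWorks sample database and the table aliases Employee and ContactInfo described above:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT Employee.EmployeeID, ContactInfo.FirstName,
  ContactInfo.MiddleName, ContactInfo.LastName
FROM HumanResources.Employee AS Employee
  INNER JOIN Person.Contact AS ContactInfo
    ON ContactInfo.ContactID = Employee.ContactID
WHERE Employee.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML AUTO, ELEMENTS;              -- column values become child elements
```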

As you can see in the following XML result set, the column values are now included as child elements, rather than attributes:

Notice that the <ContactInfo> element also contains child elements, one for each column.

If you want to include an element for columns with null values, you can use the XSINIL option, as you saw when using the RAW mode:
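As in RAW mode, XSINIL follows ELEMENTS after a space. A sketch, assuming the SQL Server 2005 AdventureWorks sample database:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT Employee.EmployeeID, ContactInfo.FirstName,
  ContactInfo.MiddleName, ContactInfo.LastName
FROM HumanResources.Employee AS Employee
  INNER JOIN Person.Contact AS ContactInfo
    ON ContactInfo.ContactID = Employee.ContactID
WHERE Employee.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML AUTO, ELEMENTS XSINIL;       -- null values get xsi:nil="true" elements
```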

Now the results will include all elements. That means, if a value is null, the xsi:nil attribute is included:

As you’ve seen in these examples, the XML is based on how the columns are listed in the SELECT list. However, as I mentioned earlier, the XML is also based on the tables listed in the FROM clause. In the preceding examples, every column in the SELECT list came from a table referenced in the FROM clause. If a column is not directly associated with a table in the FROM clause (as in a computed or aggregate column), the column is added at the deepest nesting level in effect at the point where it appears in the SELECT list.

For example, the following SELECT statement includes the FullName computed column, which concatenates the first and last names:
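A sketch, assuming the SQL Server 2005 AdventureWorks sample database. Including EmailAddress after FullName and using the ELEMENTS option (which renders FullName as a child element rather than an attribute) are assumptions of this illustration:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT Employee.EmployeeID,
  -- Computed column: not tied to a table, so it nests at the current level (<Employee>)
  (ContactInfo.FirstName + ' ' + ContactInfo.LastName) AS FullName,
  ContactInfo.EmailAddress
FROM HumanResources.Employee AS Employee
  INNER JOIN Person.Contact AS ContactInfo
    ON ContactInfo.ContactID = Employee.ContactID
WHERE Employee.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML AUTO, ELEMENTS;
```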

Because the FullName column appears in the SELECT list after the EmployeeID column, the FullName column is added as a child element of <Employee>, as shown in the following XML:

As I’ve mentioned, the placement of columns in the SELECT list impacts the resulting XML. This is also the case with computed columns. For example, in the following SELECT statement, I’ve added the FullName column after the EmailAddress column:
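A sketch of the reordered SELECT list, under the same illustrative assumptions (the SQL Server 2005 AdventureWorks sample database and the ELEMENTS option):

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT Employee.EmployeeID, ContactInfo.EmailAddress,
  -- Listed after a ContactInfo column, so it nests inside <ContactInfo>
  (ContactInfo.FirstName + ' ' + ContactInfo.LastName) AS FullName
FROM HumanResources.Employee AS Employee
  INNER JOIN Person.Contact AS ContactInfo
    ON ContactInfo.ContactID = Employee.ContactID
WHERE Employee.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML AUTO, ELEMENTS;
```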

Now the FullName column will be added as a child element to the <ContactInfo> element, as the following XML demonstrates:

As these results show, you must be aware of the order in which you place columns when you define your SELECT list.

Now let’s take a look at another aspect of the AUTO mode. One of the limitations of this mode (as well as the RAW mode) is that the column data is added as either attributes or child elements, depending on whether you specify the ELEMENTS option. However, there might be times when you want to return some of the data as attributes and some as child elements. One method you can use with the AUTO mode is to return some of the data in a subquery. For example, the following SELECT statement includes a subquery that returns the employee’s first and last names:
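A sketch of that approach, assuming the SQL Server 2005 AdventureWorks sample database. The EmployeeName alias on the inner table, which names the wrapper element produced by the subquery, is hypothetical:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT Employee.EmployeeID, Employee.LoginID,  -- become attributes of <Employee>
  (SELECT EmployeeName.FirstName, EmployeeName.LastName
   FROM Person.Contact AS EmployeeName
   WHERE EmployeeName.ContactID = Employee.ContactID
   FOR XML AUTO, TYPE, ELEMENTS)                -- returned as xml, with child elements
FROM HumanResources.Employee AS Employee
WHERE Employee.EmployeeID IN (1, 4)              -- arbitrary filter
FOR XML AUTO;
```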

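Notice that the subquery includes a FOR XML clause that uses AUTO mode and includes the ELEMENTS option. The FOR XML clause also includes the TYPE option, which specifies that the data returned by the subquery be returned as the XML type. You must include the TYPE option to preserve the data as XML in the outer SELECT statement; without it, the subquery returns its result as a string, and the outer FOR XML clause treats that string as text (entitizing characters such as <) rather than as nested XML.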

The outer SELECT statement also includes a FOR XML clause, but the ELEMENTS option is not included. As a result, only the first and last names will be returned as child elements, but the employee ID and login ID will be returned as attributes, as shown in the following XML:

As you can see, subqueries let you maintain some control over the output. However, the AUTO mode (and the RAW mode, for that matter) provides little control over the XML returned by your query. For greater control, you’ll want to use the EXPLICIT mode or the PATH mode.

The EXPLICIT Mode

The EXPLICIT mode provides very specific control over your XML, but this mode is much more complex to use than the RAW or AUTO modes. To use this mode, you must build your SELECT statements in such a way as to define the XML hierarchy and structure. In addition, you must create a SELECT statement for each level of that hierarchy and use the UNION ALL operator to combine those statements.

There are a number of rules that describe how to define your SELECT statements when using the EXPLICIT mode, and reviewing all of them is beyond the scope of this article. Be sure to refer to the topic “Using EXPLICIT Mode” in SQL Server Books Online for the details about how to construct your SELECT statements. In the meantime, let’s take a look at a few examples that demonstrate some of the basic elements of the EXPLICIT mode.

When constructing your SELECT statement, you must include two columns in your SELECT list that describe the XML hierarchy. The first column, Tag, is assigned a numerical value for each level of the hierarchy. For instance, the first SELECT statement should include a Tag column with a value of 1. This is the top level of the hierarchy. The second SELECT statement should include a Tag column with a value of 2, and so on.

The second column that you should include in your SELECT statement is Parent. Again, this is a numerical value; it identifies the parent of the current element by that parent’s Tag value. In the first SELECT statement, the Parent value should be null to indicate that this is the top level of the hierarchy.

Your first SELECT statement should also include a reference to all the columns that will make up the XML structure. The columns must also include aliases that define that structure. Let’s look at an example to help understand how this all works. The following SELECT statements return results similar to what you’ve seen in previous examples; however, the SELECT statements themselves are more detailed:
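A sketch of such a pair of statements, assuming the SQL Server 2005 AdventureWorks sample database. The second-level element name ContactInfo and the WHERE filter are illustrative choices; the column layout follows the description in the next few paragraphs:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
-- Level 1 (Tag 1): one row per employee; level-2 columns are NULL placeholders.
SELECT 1 AS Tag,
  NULL AS Parent,
  e.EmployeeID AS [Employee!1!EmployeeID],
  NULL AS [ContactInfo!2!FirstName!ELEMENT],
  NULL AS [ContactInfo!2!MiddleName!ELEMENT],
  NULL AS [ContactInfo!2!LastName!ELEMENT]
FROM HumanResources.Employee AS e
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
UNION ALL
-- Level 2 (Tag 2): contact details, nested under their Tag 1 parent.
SELECT 2 AS Tag,
  1 AS Parent,
  e.EmployeeID,
  c.FirstName,
  c.MiddleName,
  c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)
-- Sort so each parent row immediately precedes its child row (NULLs sort first).
ORDER BY [Employee!1!EmployeeID], [ContactInfo!2!FirstName!ELEMENT]
FOR XML EXPLICIT;
```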

In the first SELECT statement, I begin by defining the Tag column and assigning a value of 1 to that column. Next I define the Parent column and assign a null value. I then define the EmployeeID column and assign an alias to that column. Notice that I use a very specific structure to define the alias name:

<ElementName>!<TagNumber>!<AttributeName>[!<OptionalDirective>]

As the syntax shows, the first three components are required, and the last is optional:

  • <ElementName>: The name of the element that the value should be assigned to.
  • <TagNumber>: The tag number associated with the hierarchy that the value should be assigned to, as defined in the Tag column.
  • <AttributeName>: The name of the attribute associated with the column value, unless an optional directive is specified. For example, if the ELEMENT directive is specified, <AttributeName> is the name of the child element.
  • <OptionalDirective>: Additional information for how to construct the XML.

For example, because the EmployeeID column is given the alias Employee!1!EmployeeID, the EmployeeID attribute will be associated with the <Employee> element on the first level of the hierarchy.

Because the next three columns in the SELECT list are associated with the second level of the XML hierarchy, which is defined in the second SELECT statement, each of those columns is given its alias but assigned a null value. This gives the first SELECT statement the same column structure as the second, so the two statements can be combined with UNION ALL.

The second SELECT statement is much simpler, but it still includes the Tag and Parent columns in the SELECT list. The remaining columns in the SELECT list are defined as you would normally define columns in your query.

The result set for the two SELECT statements is then ordered by the EmployeeID and FirstName columns. This ordering places each first-level row directly before its second-level rows: because null values sort first, the first-level row for each employee (where FirstName is null) precedes that employee’s contact row, which ensures that the XML is properly nested. The FOR XML clause is then appended to the end of the statement in order to generate the following XML:

The EmployeeID column has now been added as an attribute to the <Employee> element. However, you can change the EmployeeID column to a child element simply by adding the ELEMENT directive, as I did with the other columns:
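A sketch of that change, assuming the SQL Server 2005 AdventureWorks sample database and the same illustrative column layout (a ContactInfo element on level 2). Only the EmployeeID alias gains !ELEMENT, and the ORDER BY must use the new alias:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT 1 AS Tag,
  NULL AS Parent,
  e.EmployeeID AS [Employee!1!EmployeeID!ELEMENT],  -- now a child element
  NULL AS [ContactInfo!2!FirstName!ELEMENT],
  NULL AS [ContactInfo!2!MiddleName!ELEMENT],
  NULL AS [ContactInfo!2!LastName!ELEMENT]
FROM HumanResources.Employee AS e
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
UNION ALL
SELECT 2 AS Tag,
  1 AS Parent,
  e.EmployeeID,
  c.FirstName,
  c.MiddleName,
  c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)
ORDER BY [Employee!1!EmployeeID!ELEMENT], [ContactInfo!2!FirstName!ELEMENT]
FOR XML EXPLICIT;
```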

Now the EmployeeID value will be displayed as a child element of <Employee>, the first-level element:

You can also ensure that columns with null values still generate an element by changing the ELEMENT directive to ELEMENTXSINIL, as shown in the following SELECT statement:
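A sketch, assuming the SQL Server 2005 AdventureWorks sample database and the same illustrative layout; here only the MiddleName column (the one the text says can be null) switches to ELEMENTXSINIL:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT 1 AS Tag,
  NULL AS Parent,
  e.EmployeeID AS [Employee!1!EmployeeID!ELEMENT],
  NULL AS [ContactInfo!2!FirstName!ELEMENT],
  NULL AS [ContactInfo!2!MiddleName!ELEMENTXSINIL],  -- null -> xsi:nil="true"
  NULL AS [ContactInfo!2!LastName!ELEMENT]
FROM HumanResources.Employee AS e
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
UNION ALL
SELECT 2 AS Tag,
  1 AS Parent,
  e.EmployeeID,
  c.FirstName,
  c.MiddleName,
  c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)
ORDER BY [Employee!1!EmployeeID!ELEMENT], [ContactInfo!2!FirstName!ELEMENT]
FOR XML EXPLICIT;
```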

Now the results will include the xsi:nil attribute where values are null in the MiddleName column, as shown in the following XML:

As you can see from these examples, the EXPLICIT mode can cause your SELECT statements to become quite complex, especially if you want to add more levels to the hierarchy or want to create more intricate SELECT statements. Fortunately, most of what you can do with the EXPLICIT mode, you can do with the PATH mode, and do it in a much simpler way.

The PATH Mode

When you specify the PATH mode in the FOR XML clause, column names (or their aliases) are treated as XPath expressions that determine how the data values will be mapped to the XML result set. By default, XML elements are defined based on column names. You can modify the default behavior by using the at (@) symbol to define attributes or the forward slash (/) to define the hierarchy. Let’s take a look at a few examples to demonstrate how all this works.

We’ll begin with the PATH mode’s default behavior. The following example includes a FOR XML clause that specifies only the PATH mode, with no other options:
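A sketch, assuming the SQL Server 2005 AdventureWorks sample database and an arbitrary filter:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML PATH;                 -- default <row> wrapper; each column becomes a child element
```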

Because no specific attributes or hierarchies have been defined, the query will return the following XML:

As you can see, each column is added as a child element to the <row> element. You do not have to specify the ELEMENTS directive because individual elements are returned by default, based on the column names.

You can also rename the row element and define a root element, as you’ve seen in earlier examples:
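In PATH mode the row name is passed to PATH itself. A sketch, assuming the SQL Server 2005 AdventureWorks sample database:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID, c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)                    -- arbitrary filter
FOR XML PATH ('Employee'), ROOT ('Employees');  -- <Employees> wraps <Employee> rows
```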

As the following results show, the XML now includes the <Employees> root element and the individual <Employee> row elements:

Suppose, now, that you want to include the EmployeeID value as an attribute of <Employee>. You can easily do this by adding an alias to the EmployeeID column in the SELECT clause and preceding the alias name with @, as shown in the following example:
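A sketch, assuming the SQL Server 2005 AdventureWorks sample database. The attribute column is listed first because PATH mode does not allow an attribute column to follow an element column at the same level:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID AS [@EmpID],  -- leading @ maps the column to an attribute
  c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)      -- arbitrary filter
FOR XML PATH ('Employee'), ROOT ('Employees');
```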

Now the <Employee> elements contain the EmpID attribute, whose value is the employee ID:

You can see how easy it is to return both attributes and child elements by using the PATH mode. And if you want to include elements with null values, you simply include the ELEMENTS XSINIL option in your FOR XML clause:
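A sketch, assuming the SQL Server 2005 AdventureWorks sample database; only the ELEMENTS XSINIL option is new:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID AS [@EmpID],
  c.FirstName, c.MiddleName, c.LastName
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
-- Null MiddleName values produce <MiddleName xsi:nil="true" />.
FOR XML PATH ('Employee'), ROOT ('Employees'), ELEMENTS XSINIL;
```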

Now your results include the xsi:nil attribute for those fields that contain null values:

As you can see, the xsi:nil attribute in the <MiddleName> element has been set to true.

Note: Because the PATH mode automatically returns values as individual child elements, the ELEMENTS directive has no effect when used by itself in a FOR XML clause. It is only when the XSINIL option is also specified that the ELEMENTS directive adds value to the clause.

In addition to defining attributes within your column aliases in the SELECT list, you can also define hierarchies. You define hierarchies by using the forward slash and specifying the element names. For example, the following SELECT statement defines the <EmployeeName> element and its three child elements: <FirstName>, <MiddleName>, and <LastName>:
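A sketch, assuming the SQL Server 2005 AdventureWorks sample database. Bracketed aliases are used here so the slash and @ characters work regardless of the QUOTED_IDENTIFIER setting:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID AS [@EmpID],
  -- The slash nests each name part inside a shared <EmployeeName> element
  c.FirstName AS [EmployeeName/FirstName],
  c.MiddleName AS [EmployeeName/MiddleName],
  c.LastName AS [EmployeeName/LastName]
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML PATH ('Employee'), ROOT ('Employees');
```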

The statement returns the following XML result set:

Notice that each <Employee> element now includes an <EmployeeName> element, and each of those elements includes the individual parts of the name.

Suppose that you now want to add an email address to your result set. You can simply add the column to the SELECT list after the other columns, as shown in the following example:
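A sketch, assuming the SQL Server 2005 AdventureWorks sample database (where EmailAddress is a column of Person.Contact):

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID AS [@EmpID],
  c.FirstName AS [EmployeeName/FirstName],
  c.MiddleName AS [EmployeeName/MiddleName],
  c.LastName AS [EmployeeName/LastName],
  c.EmailAddress  -- no alias, so it becomes <EmailAddress> under <Employee>
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML PATH ('Employee'), ROOT ('Employees');
```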

Because the column name is EmailAddress and no alias has been defined on that column, your XML results will now include the <EmailAddress> element as a child element to <Employee>, right after <EmployeeName>:

You must be careful about how you order your columns in the SELECT list. For example, in the following SELECT statement, I added the EmailAddress column after MiddleName but before LastName:
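A sketch of that ordering, assuming the SQL Server 2005 AdventureWorks sample database; the comment marks where the <EmployeeName> element gets split:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID AS [@EmpID],
  c.FirstName AS [EmployeeName/FirstName],
  c.MiddleName AS [EmployeeName/MiddleName],
  c.EmailAddress,                        -- closes the first <EmployeeName>
  c.LastName AS [EmployeeName/LastName]  -- opens a second <EmployeeName>
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
WHERE e.EmployeeID IN (1, 4)  -- arbitrary filter
FOR XML PATH ('Employee'), ROOT ('Employees');
```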

Because I do not list the parts of the employee names consecutively, they are separated in the XML results:

As the XML shows, there are now two instances of the <EmployeeName> child element in each <Employee> element. The way to address this issue is to make certain you list the columns in your SELECT list in the order you want the XML rendered.

In an earlier example, I demonstrated how to include an XML column in your query. You can also include an XML column when using the PATH mode. The XML data returned by the column is incorporated into the XML that is returned by the query. For instance, the following SELECT statement adds education data to the result set:
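A sketch, assuming the SQL Server 2005 AdventureWorks sample database, where HumanResources.JobCandidate.Resume is an xml column in the http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume namespace. The join on EmployeeID is an illustrative assumption, and in this sketch the inserted element keeps its ns: prefix and namespace declaration from the source data:

```sql
-- Assumes the SQL Server 2005 AdventureWorks sample database.
SELECT e.EmployeeID AS [@EmpID],
  c.FirstName AS [EmployeeName/FirstName],
  c.MiddleName AS [EmployeeName/MiddleName],
  c.LastName AS [EmployeeName/LastName],
  -- Unnamed xml column: its content is inserted as-is under <Employee>
  jc.Resume.query('
    declare namespace ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
    /ns:Resume/ns:Education')
FROM HumanResources.Employee AS e
  INNER JOIN Person.Contact AS c
    ON c.ContactID = e.ContactID
  INNER JOIN HumanResources.JobCandidate AS jc
    ON jc.EmployeeID = e.EmployeeID
FOR XML PATH ('Employee'), ROOT ('Employees');
```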

The <Education> element and its child elements are now included in the XML result set:

As these preceding examples demonstrate, the PATH mode provides a relatively easy way to define elements and attributes in your XML result set. However, the PATH mode, like the other FOR XML modes, supports additional options. For that reason, be sure to check out SQL Server Books Online for more information about each mode and about the FOR XML clause in general. Despite how basic the clause itself might seem, it provides numerous options for returning exactly the type of XML data you need.