The Performance of Traversing a SQL Hierarchy

Dwain Camps show that, depending on the size and characteristics of some hierarchical data, six different methods of traversal can each be the fastest at some point. He illustrates convincingly that It is dangerous to generalize from just one set of test data, and it is foolish to assume that, just because SQL code looks neat, it will perform well.

The Performance of Traversing a SQL Hierarchy

It is well known, and blogged to death, that the way to traverse a hierarchy that is stored in a T-SQL table as an adjacency list, is to use a recursive Common Table Expression (rCTE). But what if there was a faster way of traversing an adjacency list? Would you use it?

The adjacency list approach to saving a hierarchy is quite common and can represent:

  • Reporting levels between managers and their successive employees.
  • A parts hierarchy, particularly the case of parts in a kit with the kit/assembly having multiple levels. Another example is parts succession, where older parts are replaced by newer ones.
  • A Bill of Materials (BOM).

Today we’ll look at various ways a hierarchy represented as an adjacency list can be traversed and compare the elapsed times for each method.

Data Setup – A Simple Hierarchy as an Adjacency List

Let’s create an adjacency list (ManagersDirectory) and insert a simple hierarchy into it.

The final SELECT returns all 39 elements we’ve created in our adjacency list. We put a primary key on the most common place you’d expect to find one: ManagerID and EmployeeID.

Pictorially, this hierarchy looks like this (three children reporting to each parent):

1980-13a87884-0fa3-4171-94b9-3e6a34447f9

Traverse the Hierarchy Using a Traditional rCTE

When we generate the traversal, we add a level counter and a string that represents the manager-to-employee reporting relationship for this hierarchy. It uses the same rCTE that you’ve seen many times before. You’ll have to excuse me for complaining about everyone rehashing this over and over, while I’m doing the same, but hopefully in the end I’ll add a little value.

This produces the fairly familiar results (39 rows), abbreviated below:

A Non-traditional Hierarchy Traversal in T-SQL

It has been said that T-SQL only supports recursion through a recursive CTE. Or does it? Let’s see if we can persuade SQL Server to let us create a recursive FUNCTION:

Note where we reference dbo.HierarchyTraversal (the FUNCTION we are creating) in the JOIN. Thanks to the miracle of deferred name resolution we gleefully receive the message back from SQL Server Management Studio (SSMS) that our “Command(s) completed successfully.”

When we run this FUNCTION to traverse our adjacency list:

We behold with wonder that the returned results are identical to those returned by the rCTE!

Note that the FUNCTION cannot be given the property WITH SCHEMABINDING, nor can it be an in-line Table Valued FUNCTION (TVF). It only works with a multi-line TVF. Another limitation is that if you ever exceed 32 levels, this function is going to throw an error.

The Method of Repeating JOINs

Another way to do this is with a series of repeating JOINs, using UNION ALL operators to combine each successive level. For our tiny hierarchy, this can be achieved as follows:

This too returns the same results as our prior attempts. Of course, it only works if you reconstruct the query based on the number of levels you want to traverse, unless you construct the query with dynamic SQL.

A look at the printed SQL query should show that it is identical to the first but will grow in size and complexity depending on the number of levels (@NumLevels) you’ve set. Note the line we commented out in the middle, which we’ll use in our test harness later to dump the results to a temporary table when we do our timings.

The Set-based Loop

Since any recursive CTE can always be rewritten as a set-based loop, even though I don’t particularly like loops, we’ll give it a go nonetheless.

Once again, a check against our expected results shows they match. Note that, since we’re using #Temp1 to JOIN back to the ManagersDirectory table, having an index (the primary key) on the temporary table should improve the performance over the case where we have only a heap.

All of the queries we’ve shown so far, including the CREATE for our function, and the next one are in the attached resource file: 01 Example Queries.sql.

You may have noticed the OPTION (RECOMPILE) on the query within the loop. We were concerned about parameter sniffing impacting the performance of the query. So, of course, we tested that using 02 Loop With and Without RECOMPILE.sql and concluded that the query generally runs faster with the OPTION than without it. You can try it yourself at various levels to confirm our conclusion.

The WHILE Loop Revisited

Since we’ve taken our WHILE loop on a test drive, let’s drive it all the way to Memphis and see if we can meet Elvis.  As perversely absurd as that sounds, the question that follows may seem even more so.  Is it possible that our WHILE loop hierarchy traversal may be impacted by the Halloween Problem?  I first learned about this and the SQL Optimizer’s Halloween protection in SQL MVP Itzik Ben-Gan’s article: Identifying a Subsequence in a Sequence, Part 1, where he implemented a method of avoiding SQL’s Halloween protection in his third solution.  I found the problem interesting, so I was testing that solution and was frankly amazed at the lightning speed that it processed his ten million rows of test data, literally trouncing the first set-based solutions that I could come up with.  The similarity of what he was doing (INSERTs into an indexed temporary table within the loop) got me to thinking that it may be applicable here.   So I read the outstanding articles he recommended by SQL MVP Paul White, who clearly showed that Halloween protection can impact INSERTs (and DELETEs and MERGEs) in addition to UPDATEs.  More reading led me to another article by Itzik Ben-Gan: Divide and Conquer Halloween, where he addressed the hierarchy traversal problem directly. 

His solution modified to our requirements is shown below with two changes: 1) I added the OPTION(RECOMPILE) because I did a little pre-performance testing and it seemed to be marginally better and 2) his solution always traverses the entire hierarchy while I’d like mine to only go as deep as I tell it to (using @NumLevels).

First we must create two indexed temporary tables.

We then populate the two temporary tables, avoiding the SQL Optimizer’s need to provide Halloween protection by alternating our INSERTs between the two temporary tables.

Now we just need to combine the results stored into our two temporary tables to obtain our hierarchy traversal.

Since Itzik Ben-Gan and Paul White are both so much more eloquent and technically skilled than me, we’ll leave the explanation of detecting, and why avoiding, Halloween protection is a good thing.

A Bigger Test Harness

After learning rather quickly that generating a large test harness for an adjacency list can be quite annoying, we finally managed to do so as follows. This extends the hierarchy adding three nodes to each prior node for each iteration of the WHILE loop.

Be patient when you run the above script because it does take a little longer to run that the prior ones (about 3.5 minutes). To construct our hierarchy, we needed to add a Level column which we later drop off to restore the adjacency list to its original structure. The final SELECT tells us that our table now contains 7,174,452 rows.

To test the performance of this new method, we’ll run through traversals of the hierarchy from 5 to 13 levels using a batch that we’ll execute 9 times. The #Results table will capture the results for each execution of the batch. The #Sequence table is used to simulate a SQL 2012 SEQUENCE object, since we’re running our test in SQL Server 2008 R2.

The batch will start at level five, because below that all of the methods are pretty swift. The rest of the test harness is rather lengthy so we won’t include it here, however you can find it in the attached resources as: 03 Performance Test Harness.sql

After the final run of the timing batch (takes about 16 minutes), the results from our #Results table (reformatted) are shown below for elapsed time in milliseconds. Red font indicates the worst performing solution at the level while green font (often tied) represents the fastest solution.

Elapsed (ms)

rCTE

rFCN

dSQL

sbLP

sbLPwR

LAHP

LAHPwR

5 Levels
1092 Rows

16

63

63

30

76

46

16

6 Levels
3279 Rows

46

33

30

123

80

60

46

7 Levels
9840 Rows

160

93

123

123

96

110

93

8 Levels
29523 Rows

503

313

170

296

230

296

200

9 Levels
88572 Rows

1280

983

593

576

563

596

563

10 Levels
265719 Rows

3780

3113

1763

1853

1763

1843

1400

11 Levels
797160 Rows

11443

9630

5520

7913

6660

4573

4616

12 Levels
2391483 Rows

35300

29790

17510

17020

17336

21740

15646

13 Levels
7174452 Rows

108113

104420

57950

52563

81153

50156

52643

Each of our methods are abbreviated as:

  • rCTE – The traditional recursive CTE approach
  • rFCN – The non-traditional recursive FUNCTION approach
  • dSQL – The dynamic SQL constructed from repeated JOINs and UNION ALL
  • sbLP – The set-based WHILE loop
  • sbLPwR – The set-based WHILE loop with RECOMPILE option
  • LAHP – The set-based WHILE loop avoiding Halloween protection
  • LAHPwR – The set-based WHILE loop avoiding Halloween protection with RECOMPILE option

Note that you can probably drop the OPTION(RECOMPILE) whenever your intention is to traverse the entire hierarchy to its full depth.

The results are clearly illustrated when you graph them as a percentage of the worst performing solution, which is usually the rCTE.

1980-9da4e504-a22c-46c4-b8ff-923d147fdb2

Conclusions

When traversing the hierarchy we constructed between eight and thirteen levels deep, the set-based WHILE loop avoiding Halloween protection with RECOMPILE option (LAHPwR) is most often the elapsed time winner. Perhaps surprisingly, the non-traditional recursive function seems to do just a little better than the rCTE between six and twelve levels. It is exactly tied with the set-based loop at seven levels. The repeated JOINs method seems contra-indicated across the board (except at six levels) because there’s always a faster performing solution available.

We present these results with a caveat. This is that, depending on your hierarchy one or the other results may be faster for you. Our enlarged hierarchy was exactly balanced; specifically each node had exactly three nodes growing from it. You may find that when only a few of your branches extend out to the higher depths, one of the other approaches may be more suitable for traversal. They also show that sometimes a well-constructed, set-based loop can out-perform the traditional recursive CTE approach.

While adjacency lists for storing hierarchies are quite prevalent, there are other methods that offer opportunities for high performance depending on the requirements of the results from the hierarchies. Nested sets, closure tables and the data mart concept proposed by SQL MVP Jeff Moden in his Hierarchies on Steroids – Parts One and Two all have different advantages depending on circumstance. Perhaps we’ll explore these alternatives in future submissions.

Writing fast SQL is facilitated by knowing what code patterns perform well under various circumstances. We have provided you with several code patterns in this article that you can use to traverse your adjacency list hierarchies, so that you can decide which is best for your particular case.

Further Reading