Window Functions in SQL Server: Part 3: Questions of Performance

A SQL expression may look elegant on the page but it is only valuable if its performance is good. Do window functions run quicker than their conventional equivalent code? Fabiano gives a definitive answer.

Are Window Functions as fast as the equivalent alternative SQL methods? In this article I’ll be exploring the performance of the window functions. After you read this article I hope you’ll be in a better position to decide whether it is really worth changing your application code to start using window functions.

Before I start the tests, I would like to explain the most important factor that affects the performance of window functions. A window spool operator has two alternative ways of storing the frame data, with the inmemory worktable or with a diskbased worktable. You’ll have a huge difference on performance according to the way that the query processor is executing the operator.

The in-memory worktable is used when you define the frame as ROWS and it is lower than 10000 rows. If the frame is greater than 10000 rows, then the window spool operator will work with the on-disk worktable.

The on-disk based worktable is used with the default frame, that is, “range and a frame with more than 10000 rows.

It’s possible to check whether the window spool operator is using the on-disk worktable by looking at the results of the SET STATISTICS IO. Let’s see a small example of the window spool using the on-disk based worktable:

 

 

 

As you can see, the worktable has some reads when you use a “range”, but it doesn’t happen with the in-memory worktable which is used when you use “Rows”.

Optionally you also can look at the xEvent called window_spool_ondisk_warning. Fortunately on SQL Server 2012 we’ve a very nice interface that we can use to deal with xEvents. In SSMS, you can easily capture an event, and then look at the live data captured. Here you can see a screen-capture of this interface:

1403-image001.png 1403-image003.png

There are many aspects of the window spool operator that could be improved, I just opened some connect items about this and I’d appreciate it if you could take a minute or so to vote on this items.

For now, when the logic of the query allows it, it is a good practice to always define the window frame as ROWS in order to try to use the in-memory worktable.

Performance tests

I’ll now show some performance tests that I did with the most-used windows functions such as LEAD, LAG, LAST_VALUE and the OVER clause.

To execute the tests, I used the free SQLQueryStress toll created by Adam Machanic (blog | twitter), you can download it here, I also will use the sample database AdventureWorks2008R2.

Let’s start by testing the performance of the clause OVER with the running aggregation solution, let’s see how it would scale with many users demanding information from the server.

First let’s create a table with 453772 rows:

Running Aggregation, Clause OVER + Order By

Now let’s run the old solution to the running aggregation query:

Let’s look at the execution plan:

1403-image004small.png

This is a pretty straightforward plan: For each row read in the table, it executes the seek to calculate the running totals.

I’m filtering by a random ProductID to simulate many users requiring different products at the same time. Let’s run it in on SQLQueryStress:

1403-image006.png

Now let’s run it using the clause OVER with ORDER BY:

The execution plan is:

1403-image008small.png

As you can see, there is no double access to the table here. There is just a window spool doing the magic of keeping the rows to be aggregated.

The results from SQLQueryStress:

1403-image010.png

Now let’s specify the ROWS frame in the query and run again:

The results from SQLQueryStress:

1403-image012.png

As you can see in the elapsed time, the difference between the old solution and the solution using the worktable is very good.

Query

Client Seconds/Iteration (Avg)

CPU Seconds/Iteration (Avg)

Actual Seconds/Iteration (Avg)

Logical Reads/Iteration (Avg)

old solution

0,4312

1,4528

1,2097

9514,3370

RANGE query

0,1148

0,04

0,0931

9206,5540

ROWS query

0,0385

0,0075

0,0241

9,0800

What about the LEAD and LAG function?

LEAD

First let’s see the execution plan for a query to return the next row from the TransactionHistory table.

1403-image014small.png

As you can see, an easy way to do get the next row of a table is using the OUTER APPLY clause. Let’s see the performance of this query:

1403-image016.png

Now let’s see the same query but now using the LEAD function to get the next row.

Execution Plan:

1403-image018small.png

As you can see from the execution plan, we could improve the query by creating an index on the columns ProductID and TransactionID to remove the sort operator, but in order to compare the performance, let’s see the results of the execution on this query:

1403-image020.png

Again, the window function is much better than the old subquery solution.

Query

Client Seconds/Iteration (Avg)

CPU Seconds/Iteration (Avg)

Actual Seconds/Iteration (Avg)

Logical Reads/Iteration (Avg)

Subquery solution

2,4539

0,4108

1,7373

12129,2840

Window Function

0,0222

0,0077

0,0148

9,1640

Now let’s see the LAG function.

LAG

First the old approach to get the previews row:

The results from SQLQueryStress:

1403-image022.png

Now using the LAG function:

The results from SQLQueryStress:

1403-image024.png

Once again, the performance on windowing functions is much better.

Query

Client Seconds/Iteration (Avg)

CPU Seconds/Iteration (Avg)

Actual Seconds/Iteration (Avg)

Logical Reads/Iteration (Avg)

Subquery solution

2,9293

0,4106

1,9069

11001,4880

Lag Window function

0.0193

0,0062

0,0117

8,2160

Note: The LEAD and LAG functions doesn’t accept a frame, in this case the default of window spool is to use the in memory worktable.

First_Value and Last_Value

Let’s suppose I want to return all the employees from a specific department. I also want to know which employee had the largest number of hours not working because he/she was sick and who that employee was.

To write it without window functions, I could use subqueries to return the data of most hours sick and who was the employee; something like the following query:

This is the result:

1403-image026small.png

The employee that had the most time off-sick was “Brian LaMee”. I had to use a CTE to make the query a little easier to understand, and I wrote two subqueries on this CTE.

Let’s see the performance:

1403-image028.png

Now how it would be with window functions?

When you first try the functions FIRST_VALUE and LAST_VALUE, it may seem confusing and you might wonder if the functions MAX and MIN wouldn’t return the same values of FIRST_VALUE and LAST_VALUE, let’s see an example.

And the results:

1403-image030small.png

The same results were returned, and the in fact the function MAX can be used to return the LAST_VALUE of the value in a partition, in this case the partition is the department, so the MAX of a window partitioned by department with a frame that goes from UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING will return the last value based on the ORDER BY of the column specified in the MAX function.

The LAST_VALUE function will return the last value based on the ORDER BY clause specified in the OVER clause. If you didn’t understand, then try to change the LAST_VALUE for the MAX and see that the results will not be the ones you might expect.

Now let’s see the performance:

1403-image032.png

This time we can’t see a big difference between the queries, but that’s probably because the tables I used are very small.

Query

Client Seconds/Iteration (Avg)

CPU Seconds/Iteration (Avg)

Actual Seconds/Iteration (Avg)

Logical Reads/Iteration (Avg)

CTE solution

0,0268

0,0017

0,0029

411,3282

Window function

0,0253

0,0011

0,0024

209,5995

Note: The performance will vary in your SQL Server environment since the codes are all random.

Conclusion

In general the window functions will perform better than the old solutions using subqueries that we are used to writing.

I recommend you to always test both solutions the old and the new using the window function, if the new solution is not performing better than the old one you should always check whether the window spool operator is working with the worktable on-disk.

That’s all folks.