{"id":1750,"date":"2014-01-17T00:00:00","date_gmt":"2014-01-17T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/the-performance-of-the-t-sql-window-functions\/"},"modified":"2021-09-29T16:21:39","modified_gmt":"2021-09-29T16:21:39","slug":"the-performance-of-the-t-sql-window-functions","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/the-performance-of-the-t-sql-window-functions\/","title":{"rendered":"The Performance of the T-SQL Window Functions"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">SQL has always had aggregate functions such as SUM() and MAX(); without them, many routine SQL queries would be impossible. The SQL Standard introduced window functions in ANSI SQL:2003 to deal with a range of processing tasks that require aggregating over a partition of a set, and extended them in ANSI SQL:2008.<span class=\"apple-converted-space\">\u00a0<\/span> In SQL 2005, Microsoft introduced the first of these window functions in two flavors, Ranking and Aggregates, and released further functions in subsequent versions.\u00a0<\/p>\n<h2>When were Window Functions introduced in SQL Server?<\/h2>\n<p>Let&#8217;s take a quick look at the abridged history of the implementation of window functions in SQL Server.<\/p>\n<table class=\"MsoTableGrid\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>SQL<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>FUNCTION Types<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>FUNCTIONS<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Remark<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\" valign=\"top\">\n<p>2005<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Ranking<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ROW_NUMBER, DENSE_RANK, RANK, NTILE<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Supports PARTITION BY (optional) and ORDER BY (required) in the OVER clause.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td 
valign=\"top\">\n<p>Aggregates<\/p>\n<\/td>\n<td valign=\"top\">\n<p>SUM, AVG, COUNT, MIN, MAX<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Supports PARTITION BY (optional) in the OVER clause; ORDER BY is not supported.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\" valign=\"top\">\n<p>2008<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Ranking<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ROW_NUMBER, DENSE_RANK, RANK, NTILE<\/p>\n<\/td>\n<td rowspan=\"2\" valign=\"top\">\n<p>No relevant changes to supported functionality.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Aggregates<\/p>\n<\/td>\n<td valign=\"top\">\n<p>SUM, AVG, COUNT, MIN, MAX<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"3\" valign=\"top\">\n<p>2012<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Ranking<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ROW_NUMBER, DENSE_RANK, RANK, NTILE<\/p>\n<\/td>\n<td valign=\"top\">\n<p>No relevant changes to supported functionality.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Aggregates<\/p>\n<\/td>\n<td valign=\"top\">\n<p>SUM, AVG, COUNT, MIN, MAX<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Window frame capability (ROWS\/RANGE) was added.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Analytic<\/p>\n<\/td>\n<td valign=\"top\">\n<p>CUME_DIST, LEAD, FIRST_VALUE, PERCENTILE_CONT, LAG, PERCENTILE_DISC, LAST_VALUE, PERCENT_RANK<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Added<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There are plenty of excellent articles that explain the workings of the SQL window aggregate functions, for example:<\/p>\n<ul>\n<li>This recent one by Joe Celko: <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/window-functions-in-sql\/\">Window Functions in SQL<\/a>.\u00a0<\/li>\n<li>An earlier article by Microsoft Certified Master Wayne Sheffield (recently updated): <a href=\"https:\/\/www.sqlservercentral.com\/articles\/the-new-analytic-functions-in-sql-server-2012\">The new Analytic functions in SQL Server 2012<\/a><\/li>\n<\/ul>\n<h2>They&#8217;re useful, but do they perform well?<\/h2>\n<p>I&#8217;ve 
seen very few discussions of the performance characteristics of these functions, so today we will attempt to remedy that.\u00a0 We will assume that you have a basic understanding of how both SQL aggregate and window aggregate functions work.\u00a0 If you are unsure, or they are new to you, you should probably read Joe Celko&#8217;s or Wayne Sheffield&#8217;s article (or both) first.<\/p>\n<p>Before window functions existed, we had to produce quite complex code to do those tasks for which window functions were designed.\u00a0 Nowadays, we find window functions so useful and flexible that their use has become ubiquitous in SQL.<\/p>\n<p>Although there is no question that the window functions add richness to the SQL language, greatly simplifying the syntax of the queries they appear in, we&#8217;re still left with the nagging doubt as to whether they are as fast as the older methods.\u00a0 They&#8217;re more easily maintained, but are they faster? This is what we want to find out. We haven&#8217;t the space to look at every window function in detail, so we&#8217;ll limit our testing to just a few, and offer some references where you can look for additional comparisons.<\/p>\n<h2>Sample Data and a Simple Example<\/h2>\n<p>The simple table and sample data we&#8217;ll use for this example are the same as those I&#8217;ve published in a Simple Talk article in the past.\u00a0 You&#8217;ll see the reference when we get around to discussing PERCENTILE_CONT.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE TABLE #SampleTable\r\n(\r\n\u00a0\u00a0\u00a0 ID\u00a0 INT\r\n\u00a0\u00a0\u00a0 ,N\u00a0\u00a0 INT\r\n);\r\nCREATE INDEX i1 ON #SampleTable (ID, N);\r\n\r\n-- Basic test data\r\nINSERT INTO #SampleTable\r\nVALUES (1,1),(1,2),(1,3),(1,4),(1,5)\r\n\u00a0\u00a0\u00a0 ,(2,1),(2,2),(2,3),(2,4)\r\n\u00a0\u00a0\u00a0 ,(3,10),(3,2),(3,10),(3,4)\r\n\u00a0\u00a0\u00a0 ,(4,1),(4,5),(4,1),(4,3),(4,3);\r\n\r\n-- SQL 2005 window aggregate example\r\nSELECT ID, N, [Rows]=COUNT(*) 
OVER (PARTITION BY ID)\r\nFROM #SampleTable\r\n--WHERE ID = 1;\r\n<\/pre>\n<p>If we run this code with the WHERE clause uncommented, it produces this result set, showing how the COUNT of the rows within the PARTITION is appended as a column to each row.<\/p>\n<pre>ID\u00a0\u00a0\u00a0\u00a0 N\u00a0\u00a0\u00a0\u00a0\u00a0 Rows\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 5\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0 5\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0 5\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0 4\u00a0\u00a0\u00a0\u00a0\u00a0 5\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0 5\u00a0\u00a0\u00a0\u00a0\u00a0 5\r\n<\/pre>\n<p>Even though they&#8217;re named the same, the window aggregates operate differently from the original aggregate functions, in that the result adds a column across the entire row set, rather than condensing the set to the partitions that result from the GROUP BY.\u00a0 With just a little extra effort on the coding side, it is possible to use the original SQL aggregate functions to mimic what the window aggregates do.<\/p>\n<p>In SQL 2000 and earlier, the equivalent result could be obtained using the following code pattern:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- SQL 2000 equivalent\r\nSELECT a.ID, N, [Rows]\r\nFROM #SampleTable a\r\nJOIN\r\n(\r\n\u00a0\u00a0\u00a0 SELECT ID, [Rows]=COUNT(*)\r\n\u00a0\u00a0\u00a0 FROM #SampleTable b\r\n\u00a0\u00a0\u00a0 GROUP BY ID\r\n) b ON a.ID = b.ID;\r\n<\/pre>\n<p>And when SQL 2005 came along, this could also be done with a CROSS APPLY:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- SQL 2005 equivalent\r\nSELECT a.ID, N, [Rows]\r\nFROM #SampleTable a\r\nCROSS APPLY\r\n(\r\n\u00a0\u00a0\u00a0 SELECT ID, [Rows]=COUNT(*)\r\n\u00a0\u00a0\u00a0 FROM #SampleTable b\r\n\u00a0\u00a0\u00a0 WHERE a.ID = b.ID\r\n\u00a0\u00a0\u00a0 GROUP BY ID\r\n) b;\r\n<\/pre>\n<p>When I compared the SQL 2000 pattern and the CROSS APPLY pattern available in SQL 2005, 
I noted no discernable performance difference.\u00a0 Since I like CROSS APPLY and I&#8217;m the writer, ( <i>&#8230; and I&#8217;m the editor: Ed<\/i>) our initial comparison will be between the window aggregate COUNT vs. the SQL 2005 CROSS APPLY pattern.<\/p>\n<h2>The Performance Test Harness<\/h2>\n<p>If we TRUNCATE the data we put into our table initially, we can add a large number of rows to it using the following test harness.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @N INT = 10000;\u00a0 -- 10,000 = ~1,000,000 rows\r\n\r\nWITH Tally (n) AS\r\n(\r\n\u00a0\u00a0\u00a0 SELECT TOP (@N) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))\r\n\u00a0\u00a0\u00a0 FROM sys.all_columns a CROSS JOIN sys.all_columns b\r\n)\r\nINSERT INTO #SampleTable\r\nSELECT a.n, ABS(CHECKSUM(NEWID()))%@N\r\n-- Always create exactly 100 IDs from the Tally table\r\nFROM (SELECT TOP 100 n FROM Tally) a\r\nCROSS APPLY \r\n(\r\n\u00a0\u00a0\u00a0 SELECT TOP (@N) n\r\n\u00a0\u00a0\u00a0 FROM Tally \r\n)b;\r\n<\/pre>\n<p>Our initial test at 1 million rows, shunting the results into a local variable, generated these timing results for COUNT.<\/p>\n<pre>SQL 2008 Window Aggregate: COUNT\r\n\r\n\u00a0SQL Server Execution Times:\r\n\u00a0\u00a0 CPU time = 3105 ms,\u00a0 elapsed time = 2187 ms.\r\n\r\nSQL 2005 Equivalent (CROSS APPLY): COUNT\r\n\r\n\u00a0SQL Server Execution Times:\r\n\u00a0\u00a0 CPU time = 390 ms,\u00a0 elapsed time = 376 ms.\r\n<\/pre>\n<p>You can already see that this particular window aggregate seemingly cannot be classified as &#8220;high-performance&#8221; SQL.\u00a0\u00a0 But we don&#8217;t want to generalize, so let&#8217;s test each of the window aggregate functions vs. its corresponding traditional pattern and graphically demonstrate the performance results.<\/p>\n<h2>Performance of Traditional Aggregates vs. 
Window Aggregates<\/h2>\n<p>Since it is easy to replace COUNT(*) with XXX(N) (where XXX=SUM, AVG, MIN and MAX) in each of the two queries we&#8217;re testing, we can easily generate queries that compare each aggregate function against its corresponding window aggregate.\u00a0 Resource files (described at the end) are provided if you want to see each of these queries or run these performance tests yourself.<\/p>\n<p>We&#8217;ll also include a test case with all of the aggregates in one query, to determine whether the cost is cumulative or not.\u00a0 Those two queries look like this.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT ID, N\r\n\u00a0\u00a0\u00a0 ,[Rows]=COUNT(*) OVER (PARTITION BY ID)\r\n\u00a0\u00a0\u00a0 ,[Sum]=SUM(N) OVER (PARTITION BY ID)\r\n\u00a0\u00a0\u00a0 ,[Average]=AVG(N) OVER (PARTITION BY ID)\r\n\u00a0\u00a0\u00a0 ,[Minimum]=MIN(N) OVER (PARTITION BY ID)\r\n\u00a0\u00a0\u00a0 ,[Maximum]=MAX(N) OVER (PARTITION BY ID)\r\nFROM #SampleTable;\r\nSELECT a.ID, N\r\n\u00a0\u00a0\u00a0 ,[Rows], [Sum], [Average], [Minimum], [Maximum] \r\nFROM #SampleTable a\r\nCROSS APPLY\r\n(\r\n\u00a0\u00a0\u00a0 SELECT ID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Rows]=COUNT(*)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Sum]=SUM(N)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Average]=AVG(N)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Minimum]=MIN(N)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Maximum]=MAX(N)\r\n\u00a0\u00a0\u00a0 FROM #SampleTable b\r\n\u00a0\u00a0\u00a0 WHERE a.ID = b.ID\r\n\u00a0\u00a0\u00a0 GROUP BY ID\r\n) b;\r\n<\/pre>\n<p>The histograms show the comparative elapsed times for each of the aggregates, with the last column showing the case of all aggregates applied to the same partition within one query, with and without the suggested INDEX.<\/p>\n<p class=\"illustration\"><img loading=\"lazy\" decoding=\"async\" 
src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1927-clip_image001-620x314.jpg\" alt=\"1927-clip_image001-620x314.jpg\" width=\"620\" height=\"314\" \/><\/p>\n<p class=\"illustration\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1927-clip_image002-620x314.jpg\" alt=\"1927-clip_image002-620x314.jpg\" width=\"620\" height=\"314\" \/><\/p>\n<p>Many of these cases perform better without the index than with it.<\/p>\n<p>This table shows the CPU and elapsed time, indicating the &#8220;cost&#8221; of each window aggregate compared to the traditional aggregation method.<\/p>\n<table class=\"MsoNormalTable\">\n<tbody>\n<tr>\n<td colspan=\"8\" valign=\"bottom\">\n<p><b> With INDEX<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><b> Measure<\/b><\/p>\n<\/td>\n<td>\n<p><b> Query<\/b><\/p>\n<\/td>\n<td>\n<p><b> COUNT<\/b><\/p>\n<\/td>\n<td>\n<p><b> SUM<\/b><\/p>\n<\/td>\n<td>\n<p><b> AVG<\/b><\/p>\n<\/td>\n<td>\n<p><b> MIN<\/b><\/p>\n<\/td>\n<td>\n<p><b> MAX<\/b><\/p>\n<\/td>\n<td>\n<p><b> ALL<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\">\n<p>CPU (ms)<\/p>\n<\/td>\n<td>\n<p>Window Aggregate (WA)<\/p>\n<\/td>\n<td>\n<p>3039<\/p>\n<\/td>\n<td>\n<p>3136<\/p>\n<\/td>\n<td>\n<p>3167<\/p>\n<\/td>\n<td>\n<p>2901<\/p>\n<\/td>\n<td>\n<p>3028<\/p>\n<\/td>\n<td>\n<p>3556<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Traditional Aggregate (TA)<\/p>\n<\/td>\n<td>\n<p>390<\/p>\n<\/td>\n<td>\n<p>499<\/p>\n<\/td>\n<td>\n<p>531<\/p>\n<\/td>\n<td>\n<p>453<\/p>\n<\/td>\n<td>\n<p>452<\/p>\n<\/td>\n<td>\n<p>858<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><b> \u00a0\u00a0 Cost<\/b><\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>\n<p><b> 679%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 528%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 496%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 540%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 570%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 314%<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\">\n<p>Elapsed (ms)<\/p>\n<\/td>\n<td>\n<p>Window 
Aggregate (WA)<\/p>\n<\/td>\n<td>\n<p>2208<\/p>\n<\/td>\n<td>\n<p>2260<\/p>\n<\/td>\n<td>\n<p>2419<\/p>\n<\/td>\n<td>\n<p>2236<\/p>\n<\/td>\n<td>\n<p>2233<\/p>\n<\/td>\n<td>\n<p>2676<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Traditional Aggregate (TA)<\/p>\n<\/td>\n<td>\n<p>385<\/p>\n<\/td>\n<td>\n<p>491<\/p>\n<\/td>\n<td>\n<p>525<\/p>\n<\/td>\n<td>\n<p>465<\/p>\n<\/td>\n<td>\n<p>457<\/p>\n<\/td>\n<td>\n<p>855<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><b> \u00a0\u00a0 Cost<\/b><\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>\n<p><b> 474%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 360%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 361%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 381%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 389%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 213%<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>&nbsp;<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"8\" valign=\"bottom\">\n<p><b> Without INDEX<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><b> Measure<\/b><\/p>\n<\/td>\n<td>\n<p><b> Query<\/b><\/p>\n<\/td>\n<td>\n<p><b> COUNT<\/b><\/p>\n<\/td>\n<td>\n<p><b> SUM<\/b><\/p>\n<\/td>\n<td>\n<p><b> AVG<\/b><\/p>\n<\/td>\n<td>\n<p><b> MIN<\/b><\/p>\n<\/td>\n<td>\n<p><b> MAX<\/b><\/p>\n<\/td>\n<td>\n<p><b> ALL<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\">\n<p>CPU (ms)<\/p>\n<\/td>\n<td>\n<p>Window Aggregate (WA)<\/p>\n<\/td>\n<td>\n<p>4945<\/p>\n<\/td>\n<td>\n<p>5226<\/p>\n<\/td>\n<td>\n<p>5130<\/p>\n<\/td>\n<td>\n<p>5257<\/p>\n<\/td>\n<td>\n<p>5008<\/p>\n<\/td>\n<td>\n<p>6287<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Traditional Aggregate 
(TA)<\/p>\n<\/td>\n<td>\n<p>1141<\/p>\n<\/td>\n<td>\n<p>1357<\/p>\n<\/td>\n<td>\n<p>1391<\/p>\n<\/td>\n<td>\n<p>1200<\/p>\n<\/td>\n<td>\n<p>1247<\/p>\n<\/td>\n<td>\n<p>1529<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><b> \u00a0\u00a0 Cost<\/b><\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>\n<p><b> 333%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 285%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 269%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 338%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 302%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 311%<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td rowspan=\"2\">\n<p>Elapsed (ms)<\/p>\n<\/td>\n<td>\n<p>Window Aggregate (WA)<\/p>\n<\/td>\n<td>\n<p>1933<\/p>\n<\/td>\n<td>\n<p>1967<\/p>\n<\/td>\n<td>\n<p>1921<\/p>\n<\/td>\n<td>\n<p>1663<\/p>\n<\/td>\n<td>\n<p>1823<\/p>\n<\/td>\n<td>\n<p>2050<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Traditional Aggregate (TA)<\/p>\n<\/td>\n<td>\n<p>441<\/p>\n<\/td>\n<td>\n<p>440<\/p>\n<\/td>\n<td>\n<p>453<\/p>\n<\/td>\n<td>\n<p>416<\/p>\n<\/td>\n<td>\n<p>423<\/p>\n<\/td>\n<td>\n<p>648<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p><b> \u00a0\u00a0 Cost<\/b><\/p>\n<\/td>\n<td>&nbsp;<\/td>\n<td>\n<p><b> 338%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 347%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 324%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 300%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 331%<\/b><\/p>\n<\/td>\n<td>\n<p><b> 216%<\/b><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Cost in all cases is calculated as: 100*(WA-TA)\/TA<\/p>\n<p>I&#8217;m not sure where you, my valued reader, stand, but this seems a pretty steep price to pay for streamlining the query syntax!\u00a0 These timing results were obtained in SQL 2008 but, sadly, there was no perceptible improvement when running the equivalent test in SQL 2012.\u00a0 One might have hoped that Microsoft had noticed, or been informed of, this performance lag and done something to improve it.<\/p>\n<h2>Redemption in the Ranking Functions<\/h2>\n<p>When I first started using the window ranking functions in SQL 2005, I was very impressed.\u00a0 All of them solve some very 
sticky querying predicaments and they do so with excellent speed.\u00a0 In fact, I&#8217;d be hard-pressed to come up with any SQL 2000 alternatives that have even reasonable performance (without using temporary tables) to compare against, so I won&#8217;t even try.<\/p>\n<p>Kudos to Microsoft on this one!<\/p>\n<h2>SQL 2012: Enter the Window Frame for the Aggregate Functions<\/h2>\n<p>In past versions of SQL, a few problems posed some very difficult cases for producing a high-performing query.\u00a0 The most notable of these are the following:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.sqlservercentral.com\/articles\/T-SQL\/68467\/\">The Running Totals problem<\/a><\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/calculating-values-within-a-rolling-window-in-transact-sql\/\"> Calculating values within a Rolling Window<\/a> (e.g., getting a rolling 12 months total)<\/li>\n<\/ul>\n<p>Both of these problems are now quite easily solved by the SQL 2012 window frame approach.\u00a0 Rather than go through all of the details here, I&#8217;ve provided some useful links above where you can take a look at these problems yourself.\u00a0 I will summarize here the results you can expect.<\/p>\n<ol>\n<li>Both problems can be solved using a SQL 2012 window frame applied to the window aggregate SUM function.<\/li>\n<li>You can expect the window frame approach to be faster than any triangular or semi-triangular JOIN approach (these are explained in the two linked articles above).<\/li>\n<li>Neither problem can be solved faster with a window frame than using what is known as the <a href=\"https:\/\/www.simple-talk.com\/sql\/learn-sql-server\/robyn-pages-sql-server-cursor-workbench\/\"> Quirky Update<\/a> (QU).\u00a0 The article by SQL MVP Jeff Moden linked above describes in great detail all the rules needed to correctly implement the QU.\u00a0 My article on calculating values within a rolling window shows how the QU is much faster than the window 
frame approach for that particular problem.\u00a0 Microsoft Certified Master Wayne Sheffield showed how the QU is faster than a SQL 2012 window frame for the running totals problem in his blog <a href=\"http:\/\/blog.waynesheffield.com\/wayne\/archive\/2011\/08\/running-totals-in-denali-ctp3\/\">Running totals in &#8220;Denali&#8221; CTP3<\/a>.\u00a0 Even though that study was done in a pre-release of SQL 2012, it is doubtful that the performance characteristics have changed.<\/li>\n<\/ol>\n<p>Overall, the SQL 2012 window frame is a welcome addition to the SQL window functions.\u00a0 Its performance is better than that of most methods and, given that its behavior is fully documented (unlike the QU), it is a prime choice to use for these problems, unless you absolutely must squeeze out the highest possible performance and you&#8217;re willing to live with the &#8220;undocumented behavior&#8221; of the QU approach.<\/p>\n<h2>SQL 2012: And Then There are the Analytic Functions<\/h2>\n<p>The analytic functions in SQL 2012 offer unprecedented functionality that is extremely useful in statistical analyses.\u00a0 Just the fact that they exist will save many developers of scientific, statistical and engineering applications lots of headaches trying to replicate the same functionality within complex SQL code.\u00a0 Of course, I&#8217;m sure CLR versions exist, but there will be cases when they can&#8217;t be used due to development constraints.\u00a0 Overall I have to give kudos to Microsoft for these as well.<\/p>\n<p>But in the world of SQL, it always depends.\u00a0 Sometimes there are alternatives that may be just enough faster that you&#8217;d want to consider using them even though they may look more complicated.<\/p>\n<h3>PERCENTILE_CONT<\/h3>\n<p>A case in point is the PERCENTILE_CONT analytic function.\u00a0 Its most common usage is to calculate the median of a statistical sample.\u00a0 With the PARTITION BY clause, it can do so against partitioned sets.\u00a0 Let&#8217;s 
compare a couple of queries that calculate median for this case.\u00a0 We can use the same sample data that we created before.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Calculate Median in SQL 2012\r\nSELECT ID, N\r\n\u00a0\u00a0\u00a0 ,Median=PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY N) OVER (PARTITION BY ID)\r\nFROM #SampleTable;\r\n\r\n-- One Way to Calculate Median in SQL 2008\r\nWITH Counts AS\r\n(\r\n\u00a0\u00a0\u00a0 SELECT ID, b.c\r\n\u00a0\u00a0\u00a0 FROM\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT ID, c=COUNT(*)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM #SampleTable\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY ID\r\n\u00a0\u00a0\u00a0 ) a\r\n\u00a0\u00a0\u00a0 CROSS APPLY (VALUES((c+1)\/2),(CASE c%2 WHEN 0 THEN 1+c\/2 ELSE 0 END)) b(c)\r\n),\r\n\u00a0\u00a0\u00a0 CalculateMedian AS\r\n(\r\n\u00a0\u00a0\u00a0 SELECT a.ID, Median=CAST(AVG(0.+b.N) AS FLOAT)\r\n\u00a0\u00a0\u00a0 FROM\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT ID, N\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,rn=ROW_NUMBER() OVER (PARTITION BY ID ORDER BY N)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM #SampleTable a\r\n\u00a0\u00a0\u00a0 ) a \r\n\u00a0\u00a0\u00a0 CROSS APPLY\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT N\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM Counts b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE a.ID = b.ID AND a.rn = b.c\r\n\u00a0\u00a0\u00a0 ) b\r\n\u00a0\u00a0\u00a0 GROUP BY a.ID\r\n)\r\nSELECT a.ID, N, Median\r\nFROM #SampleTable a\r\nJOIN CalculateMedian b ON a.ID = b.ID;\r\n<\/pre>\n<p>While the second query may look extraordinarily complex, the results when we run it are much better than PERCENTILE_CONT.<\/p>\n<table class=\"MsoNormalTable\">\n<tbody>\n<tr>\n<td rowspan=\"2\" valign=\"bottom\">\n<p><b> \u00a0<\/b><\/p>\n<p><b> MEDIAN<\/b><\/p>\n<\/td>\n<td colspan=\"2\" 
valign=\"bottom\">\n<p><b> With INDEX<\/b><\/p>\n<\/td>\n<td colspan=\"2\" valign=\"bottom\">\n<p><b> Without INDEX<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p><b> CPU (ms)<\/b><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p><b> Elapsed (ms)<\/b><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p><b> CPU (ms)<\/b><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p><b> Elapsed (ms)<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>SQL 2012: PERCENTILE_CONT<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>6520<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>5842<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>13385<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>3786<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>SQL 2008: Complex Median Query<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>2340<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>2327<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>5477<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1897<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p><b> Cost<\/b><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>179%<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>151%<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>144%<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>100%<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And there are even (much) faster solutions available to <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/calculating-the-median-value-within-a-partitioned-set-using-t-sql\/\"> calculate the median<\/a> for cases where the suggested index is present.\u00a0 The test harness, provided in this article&#8217;s resources section, is described at the end.<\/p>\n<p>On the other hand, <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/hh231473.aspx\">PERCENTILE_CONT<\/a> does a lot more than just calculate the median.\u00a0 We&#8217;ll let you explore Microsoft&#8217;s description, linked from the function name, to find out more.<\/p>\n<h3>LAG and LEAD Applied to Finding Gaps in Sequence Numbers<\/h3>\n<p>Recently, Simple Talk published an article of mine entitled <a 
href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/the-sql-of-gaps-and-islands-in-sequences\/\">The SQL of Gaps and Islands in Sequences<\/a>.\u00a0 In that article I compared three &#8220;traditional&#8221; solutions for the gaps problem to one of my own design.\u00a0 We&#8217;ll now take the 1M-row test harness provided there and modify it (see the description of the resource files at the end) to consider the SQL 2012 methods, where LAG and LEAD can be applied to identify gaps.\u00a0 Often you see one or the other of these solutions suggested; here we&#8217;ll show both.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT ID, StartGap=SeqNo+1, EndGap=nxt-1\r\nFROM (\r\n\u00a0\u00a0\u00a0 SELECT ID, SeqNo\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,nxt=LEAD(SeqNo, 1) OVER (PARTITION BY ID ORDER BY SeqNo)\r\n\u00a0\u00a0\u00a0 FROM dbo.GapsIslands\r\n) a\r\nWHERE nxt - SeqNo &lt;&gt; 1;\r\n\r\nSELECT ID, StartGap=prv+1, EndGap=SeqNo-1 \r\nFROM (\r\n\u00a0\u00a0\u00a0 SELECT ID, SeqNo\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,prv=LAG(SeqNo, 1) OVER (PARTITION BY ID ORDER BY SeqNo)\r\n\u00a0\u00a0\u00a0 FROM dbo.GapsIslands\r\n) a\r\nWHERE SeqNo - prv &lt;&gt; 1;\r\n<\/pre>\n<p>Once again, these solutions are nice and concise, but let&#8217;s see how they perform in our test harness.<\/p>\n<table class=\"MsoNormalTable\">\n<tbody>\n<tr>\n<td valign=\"bottom\">\n<p><b> GAPS<\/b><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p><b> CPU (ms)<\/b><\/p>\n<\/td>\n<td valign=\"bottom\">\n<p><b> Elapsed (ms)<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>Gaps Solution #1 from SQL MVP Deep Dives<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>780<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>782<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>Gaps Solution #2 from SQL MVP Deep Dives<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>4665<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1451<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td 
valign=\"bottom\">\n<p>Gaps Solution #3 from SQL MVP Deep Dives<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1233<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1219<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>CROSS APPLY VALUES &#8211; Islands to Gaps<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>421<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>462<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>SQL 2012 LEAD<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1747<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1740<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"bottom\">\n<p>SQL 2012 LAG<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1357<\/p>\n<\/td>\n<td valign=\"bottom\">\n<p>1363<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In this case, all of the traditional solutions outperformed the SQL 2012 LEAD function, and all but one outperformed LAG, in elapsed time.\u00a0 Only one traditional solution consumed more CPU time. \u00a0My conclusion from this is that, while LAG (or LEAD) may improve the clarity of your code, opting for one of the well-documented, traditional approaches to gaps will solve the problem (particularly against very large row sets) more efficiently.<\/p>\n<p>Wayne Sheffield also blogged on some of the existing high performance approaches to gaps in <a href=\"http:\/\/blog.waynesheffield.com\/wayne\/archive\/2012\/04\/sql-server-2012-performance-test-gap-detection\/\">SQL 2012 Performance Test: Gap Detection<\/a>.\u00a0 In this blog, he compares the performance of LEAD only to a couple of competing high-performance methods that can be run in earlier versions of SQL.\u00a0 His test of LEAD also showed it to be much slower (cost about 250% according to my prior calculation method) than the other approaches that he tested.\u00a0 Using LAG would be expected to have approximately the same performance characteristic (although in my test harness it appears to be slightly better).<\/p>\n<h2>Conclusions<\/h2>\n<p>Just as new doesn&#8217;t always mean better, simpler sometimes does 
not translate into better either, particularly where performance is concerned.\u00a0 Ultimately you must be the judge of whether simpler query forms are more desirable from a maintenance perspective than having a better-performing solution.<\/p>\n<p>As far as the SQL window ranking functions are concerned, there is probably no better approach available in earlier versions of SQL.\u00a0 On the other hand, the SQL window aggregate functions can be much improved upon with respect to performance by relying on only a slightly less simple code pattern based on standard aggregate functions.<\/p>\n<p>As for the SQL 2012 enhancements to the window aggregates (the window frame), they&#8217;re probably a good alternative for problems such as running totals if you don&#8217;t like, or don&#8217;t fully understand, how the Quirky Update works.\u00a0 LEAD and LAG show promise, but there remain a few traditional methods that outperform them for problems like finding gaps in sequence numbers.<\/p>\n<p>All of the new SQL 2012 analytic window functions provide unprecedented flexibility.\u00a0 But using PERCENTILE_CONT for calculating the median can be improved upon, from a performance perspective, using more traditional, albeit more complicated, code patterns.<\/p>\n<p>In the end, using the latest query forms may offer a shorter path to getting your code working, because the queries tend to be simpler.\u00a0 This article is simply a suggestion that you consider looking into and testing other code patterns in cases where performance is really important to you.<\/p>\n<p>The resource files provided with this article are:<\/p>\n<ul>\n<li><b><i> Window Aggregate Functions Test Harness.sql<\/i><\/b> &#8211; This test harness is initially set up to create a 1,000,000 row test table and then display the timing results for window aggregates vs. 
corresponding traditional aggregations, while suppressing the actual query results by assigning to a local variable.\u00a0 Comments in the beginning will guide you if you want to just display results for the simple test data shown in the initial example.<\/li>\n<li><b><i> Median Test Harness.sql<\/i><\/b> &#8211; This test harness is initially set up to create a 1,000,000 row (approximate) test table and then display the timing results for the SQL 2012 median solution vs. an alternative, while suppressing the actual query results by assigning to a local variable.\u00a0 Comments in the beginning will guide you if you want to just display results for the simple test data shown in the initial example.<\/li>\n<li><b><i> Gaps Test Harness.sql<\/i><\/b> &#8211; This test harness is initially set up to create a 1,000,000 row (approximate) test table filled with gaps and then test four &#8220;traditional&#8221; gaps solutions vs. the SQL 2012 LEAD and LAG functions, while suppressing the actual query results by assigning to a local variable.\u00a0 Comments in the beginning will guide you if you want to just display results for the simple test data shown in the initial example.<\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Window Functions in SQL greatly simplify a whole range of financial and statistical aggregations on sets of data. Because there is less SQL on the page, it is easy to assume that the performance is better too: but is it? 
Dwain gets out the test harness to investigate.&hellip;<\/p>\n","protected":false},"author":221942,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4150,4183,4252],"coauthors":[6800],"class_list":["post-1750","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-sql","tag-t-sql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1750","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221942"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1750"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1750\/revisions"}],"predecessor-version":[{"id":84244,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1750\/revisions\/84244"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1750"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1750"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1750"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1750"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}