{"id":793,"date":"2010-02-04T00:00:00","date_gmt":"2010-02-04T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/writing-efficient-sql-set-based-speed-phreakery\/"},"modified":"2021-06-03T16:44:18","modified_gmt":"2021-06-03T16:44:18","slug":"writing-efficient-sql-set-based-speed-phreakery","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/performance-sql-server\/writing-efficient-sql-set-based-speed-phreakery\/","title":{"rendered":"Writing Efficient SQL: Set-Based Speed Phreakery"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">SQL is a declarative language, designed to work with sets of data. However, it does support procedural, &#8220;row-by-row&#8221; constructs in the form of explicit cursors, loops and so on. The use of such constructs often offers the most intuitive route to the desired SQL solution, especially for those with backgrounds as application developers. Unfortunately, the performance of these solutions is often sub-optimal; sometimes <i>very<\/i> sub-optimal.<\/p>\n<p>Some of the techniques used to achieve fast SQL code, and avoid row-by-row processing can, at first, seem somewhat obscure. This, in turn, raises questions over the maintainability of the code. However, in fact, there are really only a handful of these techniques that need to be dissected and understood in order to open a path towards fast, efficient SQL code. The intent of this article is to investigate a very common &#8220;running total&#8221; reporting problem, offer some fairly typical &#8220;procedural&#8221; SQL solutions, and then present much faster &#8220;set-based&#8221; solutions and dissect the techniques that they use.<\/p>\n<p>The examples in this article are based on the classic &#8220;running total&#8221; problem, which formed the basis for Phil Factor&#8217;s first <a href=\"http:\/\/ask.sqlservercentral.com\/questions\/92\/the-subscription-list-sql-problem\">SQL Speed Phreak Competition<\/a>. This challenge is not some far-fetched scenario that you will never encounter in reality, but a common reporting task, similar to a report that your boss may ask you to produce at any time.<\/p>\n<p>In my experience, I&#8217;ve mostly found that the ease of producing the solution is inversely proportional to its performance and scalability. In other words, there is an easy solution and a fast solution, as well as many solutions in between. One may argue that for a report that runs once a month, clarity and maintainability are as important as speed, and there is some truth is this. However, bear in mind that while you can get away with a simple solution on a table with a few thousand rows, it won&#8217;t scale. As the number of rows grows so the performance will degrade, sometimes exponentially.<\/p>\n<p>Furthermore, if you don&#8217;t know what&#8217;s possible in terms of performance, then you have no basis to judge the effectiveness of your solution. Once you&#8217;ve found the fastest solution possible then, if necessary, you can &#8220;back out&#8221; to a solution that is somewhat slower but more maintainable, in full knowledge of the size of the compromise you&#8217;ve made.<\/p>\n<h2>The Subscription List Challenge<\/h2>\n<p>The Subscription List challenge, as presented in Phil Factor&#8217;s Speed Phreak competition, consisted of one table containing a list of subscribers, with the dates that the subscribers joined, and the dates that they cancelled their subscription. The task was to produce a report, by month, listing:<\/p>\n<ul>\n<li>The number of new subscriptions<\/li>\n<li>The number of subscribers who have left<\/li>\n<li>A running total of current subscribers<\/li>\n<\/ul>\n<p>Just about anything you wished to do was allowed in this challenge. You could use any version of SQL Server, change the indexes on the original table, use temp tables, or even use cursors. The person whose T-SQL solution returned the correct results in the fastest time was the winner.<\/p>\n<p>In a nice twist, Phil Factor reported the standings daily and allowed multiple submissions. This allowed the entrants to fine-tune their solutions, and to help out some of the other contestants. The original solutions were developed against a sample dataset of 10,000 rows. However, the top entries performed so well that they were also tested against a 1 million row dataset. Likewise, all of the solutions in this were developed against the 10K-row dataset, but tested against both.<\/p>\n<p>The eventual winner was <b>Peso<\/b>, the SQL Server MVP Peter Larsson, whose solution we will present and dissect, and which returned the required report, running against a 1 million row table, in just over 300 milliseconds. Eight other entrants ran very close, with solutions that ran in under 600 milliseconds.<\/p>\n<p>Speed was all important in this contest; this was a super-charged drag race and there was little point turning up to the starting line in a mini-van. However, for lesser SQL mortals, you could be forgiven for regarding Peso&#8217;s winning solution with an equal mix of amazement and confusion. Hopefully, after reading the rest of this article, the solution will be clearer and the techniques accessible to you in solving your own T-SQL problems.<\/p>\n<h2>Setting Up<\/h2>\n<p>All you need to do to get started is to:<\/p>\n<ol>\n<li>Create a new database in your test or development SQL Server instance.<\/li>\n<li>Download the sample data from <a href=\"http:\/\/ask.sqlservercentral.com\/questions\/92\/the-subscription-list-sql-problem\">http:\/\/ask.sqlservercentral.com\/questions\/92\/the-subscription-list-sql-problem<\/a>.<\/li>\n<li>Play the script found in the download to create and populate the table in your sample database.<\/li>\n<\/ol>\n<p>Go ahead and take a look at the data by running <span class=\"STCodeinTextChar\">SELECT<\/span> <span class=\"STCodeinTextChar\">*<\/span> <span class=\"STCodeinTextChar\">FROM<\/span> <span class=\"STCodeinTextChar\">Registrations<\/span>. Figure 1 shows a sample of the data that we will be using to create the report.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK1.JPG\" alt=\"933-KK1.JPG\" \/><\/p>\n<p class=\"caption\">Figure 1: The data found in the Registrations table<\/p>\n<p>Notice that many registrations do not have a <span class=\"STCodeinTextChar\">DateLeft<\/span> value. These subscriptions are ongoing and are active past the time period required for the report.<\/p>\n<h2>Validation<\/h2>\n<p>In order to prove that our solution works, we need to know what the correct answer should look like. In this case, of course, we could simply run Peso&#8217;s winning script. However, in practice, we won&#8217;t have that script and so we&#8217;ll need to do our own investigation of the data, to make sure we fully understand the report&#8217;s requirements.<\/p>\n<p>The report must provide a list of the months, from the beginning of the data to the month of the contest (September, 2009), along with the count of new subscribers and cancellations for each month. After September 2009, there are no new registrations, only cancellations. The report must also supply a running total of the number of active subscriptions. The simple calculation for the number of active subscriptions in a given month is:<\/p>\n<p>Number of subscriptions from previous month<br \/>\n+ new subscribers in current month<br \/>\n&#8211; cancellations in current month<\/p>\n<p>To get started, run these two queries from Listing 1:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- new registrations per month  \r\nSELECT\u00a0 count(*) AS newRegistrations ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 year(datejoined) AS Year ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 month(datejoined) AS Month\r\nFROM\u00a0\u00a0\u00a0 REGISTRATIONS\r\nGROUP BY year(datejoined) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 month(datejoined)\r\nORDER BY Year ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Month ;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n-- Unsubscribes per month\r\nSELECT\u00a0 count(*) AS Cancellations ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 year(dateleft) AS Year ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 month(dateleft) AS Month\r\nFROM\u00a0\u00a0\u00a0 REGISTRATIONS\r\nGROUP BY year(dateleft) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 month(dateleft)\r\nORDER BY Year ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Month ;\r\n<\/pre>\n<p class=\"caption\">Listing 1: Another look at the data<\/p>\n<p>Figure 2 shows the partial results. The first query shows the new registrations for each month, and the second query shows the cancellations. Notice that there are no cancellations until June 2004.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK2.JPG\" alt=\"933-KK2.JPG\" \/><\/p>\n<p class=\"caption\">Figure 2: The counts of new registrations and cancellations.<\/p>\n<p>Let&#8217;s manually calculate the first few results. The first month, January, 2004, has 167 new subscriptions with no cancellations; therefore, the running total is 167. There are no cancellations until June 2004, so we can just add the new subscriptions for each month through May. In June, we have to subtract one cancellation. Table 1 shows the results for the first few months of 2004, manually calculated, which provide a good representation of the report.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK3.JPG\" alt=\"933-KK3.JPG\" \/><\/p>\n<p class=\"caption\">Table 1: The report for 2004<\/p>\n<p>Now we have a means to validate the results returned by our report, we&#8217;re ready to start building it. Remember that, just like Peso, we are unlikely to come up with the best solution first; more likely we&#8217;ll create a workable solution first and then tweak it to improve performance. Of course, after each tweak of your own solutions, be sure to implement regression testing to make sure that the improved solution still returns the correct results.<\/p>\n<h2>Row by Agonizing Row<\/h2>\n<p>The reason that so much SQL code found out in the wilds performs poorly is that SQL is nobody&#8217;s &#8220;first language&#8221; (with the possible exception of Joe Celko). Developers tend to master the basic SQL syntax quickly, but then start firing out SQL data retrieval code in a manner strongly influenced by the philosophy and practices of their primary, procedural programming language. When speaking at my local VB.NET group, I found that some of the developers in the audience were shocked that there is another, better way to perform updates than one row at a time.<\/p>\n<p>At its most brutal, the row-by-row solution might look something like the one shown in Listing 2. There is no aggregation going on at all. What we have here is basically an explicit cursor that runs through every single row in the dataset, one at a time, performing the necessary calculations and keeping track of the running totals in a temporary table.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--Variables\r\nDECLARE @the_month DATETIME\r\nDECLARE @last_month DATETIME\r\nDECLARE @date_joined DATETIME\r\nDECLARE @date_left DATETIME \r\nDECLARE @outside_month DATETIME\r\n\u00a0\r\n--Table to hold results\r\nCREATE TABLE #subscriptions\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 theMonth DATETIME ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions INT\r\n\u00a0\u00a0\u00a0 )\r\n\u00a0\r\n--Set variables for processing\r\nSELECT\u00a0 @the_month = min(DateJoined) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @last_month = max(DateJoined)\r\nFROM\u00a0\u00a0\u00a0 Registrations\r\n\u00a0\r\nSELECT @the_month = cast(year(@the_month)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(@the_month) AS VARCHAR) + '-1' ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @last_month = cast(year(@last_month)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0+ cast(month(@last_month) AS VARCHAR) + '-1'\r\n\u00a0\r\nSET @outside_month = dateadd(M, 1, @last_month)\r\n\u00a0\r\n--Insert a row for each month in the report\r\nWHILE @the_month &lt;= @last_month \r\n\u00a0\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INSERT\u00a0 INTO #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 VALUES\u00a0 ( @the_month ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET @the_month = dateadd(M, 1, @the_month)\r\n\u00a0\u00a0\u00a0 END\r\n\u00a0\r\n--Cursor to look at each row\r\nDECLARE regs CURSOR FAST_FORWARD FOR\r\nSELECT dateJoined, coalesce(dateLeft,@outside_month)\r\nFROM Registrations\r\n\u00a0\r\nOPEN regs\r\nFETCH NEXT FROM regs INTO @date_joined, @date_left\r\nWHILE @@FETCH_STATUS = 0 \r\n\u00a0\u00a0\u00a0 BEGIN\r\n\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 @date_joined = cast(year(@date_joined)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(@date_joined)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS VARCHAR) + '-1' ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @date_left = cast(year(@date_left)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(@date_left)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS VARCHAR) + '-1'\r\n\u00a0\r\n\u00a0\u00a0 --Update every row that the subscription is valid\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 UPDATE\u00a0 #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET\u00a0\u00a0\u00a0\u00a0 PeopleJoined = PeopleJoined + 1 ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions = Subscriptions + 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0 theMonth &gt;= @date_joined\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AND theMonth &lt; @date_left\r\n\u00a0\r\n\u00a0\u00a0 --Process calculations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 UPDATE\u00a0 #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET\u00a0\u00a0\u00a0\u00a0 PeopleLeft = PeopleLeft + 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0 theMonth = @date_left\u00a0\u00a0 \r\n\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FETCH NEXT FROM regs INTO @date_joined, @date_left\r\n\u00a0\u00a0\u00a0 END\r\nCLOSE regs\r\nDEALLOCATE regs\r\n\u00a0\r\n--The report!\r\nSELECT\u00a0 *\r\nFROM\u00a0\u00a0\u00a0 #subscriptions\r\nDROP TABLE #subscriptions\r\n<\/pre>\n<p class=\"caption\">Listing 2: The RBAR solution<\/p>\n<p>The term row-by-agonizing row (RBAR), first coined by Jeff Moden on SQLServerCentral.com, is apt here. This solution, while providing the correct results, runs in 7 seconds on 10,000 rows, on the hardware used to test the solutions in the contest. When applied to the 1 million row dataset, it took 13 minutes!<\/p>\n<p>Imagine a box of marbles sitting in a box on a table. If you needed to move all the marbles to another box on the other side of the room without picking up either box, how would you do it? Would you pick up each marble, one at a time and march it across the room to the second box? Unless you were getting paid by the minute, you would probably scoop them all up and carry them all in one trip. You would avoid the marble-by-marble method, if you wanted to be efficient.<\/p>\n<h2>Faster, but Still Iterative<\/h2>\n<p>A kinder, though still iterative solution might take the following approach:<\/p>\n<ul>\n<li>Create a temporary table <span class=\"STCodeinTextChar\">(#subscriptions<\/span>), load it with the <span class=\"STCodeinTextChar\">DateJoined<\/span> data from the <span class=\"STCodeinTextChar\">Registrations<\/span> table, and then aggregate on this column, by month and year, to get a count of the number of people who joined in each month<\/li>\n<li>Create a CTE (<span class=\"STCodeinTextChar\">cancellations<\/span>), load it with the <span class=\"STCodeinTextChar\">DateLeft<\/span> data from the <span class=\"STCodeinTextChar\">Registrations<\/span> table, and then aggregate on this column by month and year to get a count of the number of people who left in each month, in each year. In lieu of a CTE, you could also use another temp table or a derived table.<\/li>\n<li>Update the <span class=\"STCodeinTextChar\">PeopleLeft<\/span> column in <span class=\"STCodeinTextChar\">#subscriptions<\/span> with the number of people who left in each month, from the CTE<\/li>\n<li>Use a cursor to loop through each month in the temporary table, calculating the required running total of subscribers<\/li>\n<\/ul>\n<p>This solution is shown in Listing 3.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--variables\r\nDECLARE @total INT\r\nDECLARE @people_joined INT\r\nDECLARE @people_left INT \r\nDECLARE @the_month DATETIME \r\n\u00a0\r\n--create a table to hold the results\r\nCREATE TABLE #subscriptions\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 theMonth DATETIME ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions INT\r\n\u00a0\u00a0\u00a0 )\r\n\u00a0\r\n--insert a row for each month\r\n--in the data\r\nINSERT\u00a0 INTO #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 cast(year(DateJoined) AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(DateJoined) AS VARCHAR) + '-1' ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 count(*) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY cast(year(DateJoined) AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(DateJoined) AS VARCHAR) + '-1'\r\n\u00a0\r\n--update for cancellations\r\n;\r\nWITH\u00a0\u00a0\u00a0 CANCELLATIONS\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT\u00a0\u00a0 count(*) CANC_COUNT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 cast(year(DateLeft) AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(DateLeft) AS VARCHAR) + '-1' AS Dateleft\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0\u00a0 DATELEFT IS NOT NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY cast(year(DateLeft) AS VARCHAR) + '-'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + cast(month(DateLeft) AS VARCHAR) + '-1'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 UPDATE\u00a0 S\r\n\u00a0\u00a0\u00a0 SET\u00a0\u00a0\u00a0\u00a0 PeopleLeft = CANC_COUNT\r\n\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 #subscriptions S\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INNER JOIN CANCELLATIONS C \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ON S.theMonth = Dateleft\r\n\u00a0\r\nSET @total = 0\r\n--set up a cursor to update the total subscriptions\r\n--for each month\r\nDECLARE SUBSCRIPTIONS CURSOR FOR\r\nSELECT THEMONTH, PEOPLEJOINED, PEOPLELEFT\r\nFROM #SUBSCRIPTIONS ORDER BY THEMONTH\r\n\u00a0\r\nOPEN SUBSCRIPTIONS\r\nFETCH NEXT FROM SUBSCRIPTIONS INTO @the_month, @people_joined, \r\n\u00a0\u00a0 @people_left\r\nWHILE @@FETCH_STATUS = 0 \r\n\u00a0\u00a0\u00a0 BEGIN\r\n\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET @total = @total + @people_joined - \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @people_left \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 UPDATE\u00a0 #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET\u00a0\u00a0\u00a0\u00a0 Subscriptions = @total\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0 theMonth = @the_month \r\n\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FETCH NEXT FROM SUBSCRIPTIONS \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INTO @the_month, @people_joined, @people_left\r\n\u00a0\u00a0\u00a0 END\r\nCLOSE SUBSCRIPTIONS\r\nDEALLOCATE SUBSCRIPTIONS\r\n\u00a0\r\n--the report\r\nSELECT\u00a0 *\r\nFROM\u00a0\u00a0\u00a0 #subscriptions\r\nORDER BY THEMONTH\r\n\u00a0\r\nDROP TABLE #subscriptions\r\n<\/pre>\n<p class=\"caption\">Listing 3: The Easy, Iterative Solution<\/p>\n<p>So, instead of looping through every row in the data, we aggregate the data to the month level, and then loop through each possible month in the report performing the calculations. We have a lot of date calculations going on in our <span class=\"STCodeinTextChar\">INSERT<\/span> and <span class=\"STCodeinTextChar\">UPDATE<\/span> statements. These date calculations are very expensive because multiple functions must operate on each date.<\/p>\n<p>When tested, this solution took 360 milliseconds on 10,000 rows and three seconds on 1 million rows. While still not performing as well as the winning solution, it is a big improvement over the original row-by-row solution.<\/p>\n<p>So, this solution, while still flawed, is at least one that will work adequately on a small report table. The other advantage is that the logic behind it is crystal clear, and it will be an easily maintainable solution. BUT&#8230;it doesn&#8217;t scale. As the results indicate, as more and more rows are added to the <span class=\"STCodeinTextChar\">registrations<\/span> table, the report will take longer and longer to run.<\/p>\n<p>We could spend some time tweaking to improve performance, but let&#8217;s move on to an even better way.<\/p>\n<h2>The Fastest Solution: DATEDIFF, UNPIVOT, and a &#8220;Quirky Update&#8221;<\/h2>\n<p>The winning script in the competition, script &#8220;4e&#8221; submitted by Peter Larsson (Peso), is reproduced in Listing 4. As noted earlier, it performs so well that, against 10K rows it is almost un-measurably fast, and even when tested against a dataset of one million rows, it took just milliseconds to run. In other words, Peso&#8217;s solution is scalable!<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">\/***************************************************** \r\n\u00a0\u00a0\u00a0 Peso 4e - 20091017 \r\n*******************************************************\/\r\n--A\r\nCREATE TABLE #Stage\r\n\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0 theMonth SMALLINT NOT NULL ,\r\n\u00a0\u00a0\u00a0\u00a0 PeopleJoined INT NOT NULL ,\r\n\u00a0\u00a0\u00a0\u00a0 PeopleLeft INT NOT NULL ,\r\n\u00a0\u00a0\u00a0\u00a0 Subscribers INT NOT NULL\r\n\u00a0\u00a0 )\r\n--B\r\nINSERT\u00a0\u00a0 #Stage\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscribers\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n--C\r\nSELECT\u00a0\u00a0 u.theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sum(case WHEN u.theCol = 'DateJoined'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN u.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END) AS PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sum(case WHEN u.theCol = 'DateLeft' \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN u.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END) AS PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 AS Subscribers\r\n--D\r\n\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 (\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n--E\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT datediff(MONTH, 0, DateJoined) AS DateJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datediff(MONTH, 0, DateLeft) AS DateLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 count(*) AS Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0 dbo.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY datediff(MONTH, 0, DateJoined) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0datediff(MONTH, 0, DateLeft)\r\n--F\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0) AS d\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 UNPIVOT ( theMonth\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FOR theCol IN ( d.DateJoined, d.DateLeft )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ) AS u\r\n--G\r\n\u00a0\u00a0 GROUP BY u.theMonth\r\n--H\r\n\u00a0\u00a0 HAVING\u00a0\u00a0 sum(case WHEN u.theCol = 'DateJoined'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN u.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END) &gt; 0\r\n--I\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\nDECLARE @Subscribers INT = 0 ;\r\n--J\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n;\r\nWITH\u00a0 Yak ( theMonth, PeopleJoined, PeopleLeft,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscribers )\r\n\u00a0\u00a0AS (\u00a0\u00a0 \r\n--K\r\n\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0SELECT TOP 2147483647\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dateadd(MONTH, theMonth, 0) AS theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0Subscribers\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0FROM\u00a0\u00a0\u00a0 #Stage\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY theMonth\r\n--L\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0 --M\r\nUPDATE\u00a0\u00a0 Yak\r\nSET\u00a0\u00a0\u00a0\u00a0\u00a0 @Subscribers = Subscribers = @Subscribers + PeopleJoined - PeopleLeft\r\n--N\r\nOUTPUT\u00a0\u00a0 inserted.theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inserted.PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inserted.PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 inserted.Subscribers\r\n--O\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\nDROP TABLE #stage\r\n--P\r\n<\/pre>\n<p class=\"caption\">Listing 4 Peso 4e<\/p>\n<p>As you can see, we still have the temporary table and the CTE, but what goes on inside each is quite different. Also, this solution has no need for the dreaded cursor, although it does use a few slightly &#8220;unconventional&#8221; techniques to arrive at the answer. Let&#8217;s break this bad boy apart and see if we can learn something.<\/p>\n<p>The overall strategy behind this solution is to pre-aggregate the data down to the rows required in the report while minimizing the date calculations. By doing so, the query engine makes just one pass through the initial data and works with as little data as possible when calculating the running total.<\/p>\n<p>There are three major sections to the code:<\/p>\n<ul>\n<li>Aggregation <span class=\"STBold\">Process<\/span>&#8211; the goal here is to reduce the working dataset, stored in a temp table called <span class=\"STCodeinTextChar\">#Stage,<\/span> to the minimum possible number of rows, so that date calculations and the subsequent running total calculation are inexpensive.<\/li>\n<li>Running Total calculation &#8211; here, we run an <span class=\"STCodeinTextChar\">UPDATE<\/span> statement on a CTE to calculate the running total<\/li>\n<li>Producing the report &#8211; \u00a0making clever use of the <span class=\"STCodeinTextChar\">OUTPUT<\/span> clause<\/li>\n<\/ul>\n<h2>The Aggregation Process<\/h2>\n<p>The essential goal of this step is the same in Peso&#8217;s solution (Listing 4) as it was in the solution seen in Listing 3. We need to build a temp table that contains a row for each month and a count of the new subscriptions and cancellations, upon which to base the running total calculation. However, the way that each solution populates the temp table is very different.<\/p>\n<p>Take a look at the table definition for <span class=\"STCodeinTextChar\">#Stage<\/span> table (lines A to B), and you will notice something unusual right away. Whereas in the previous solutions we used a conventional <span class=\"STCodeinTextChar\">DATETIME<\/span> type for the <span class=\"STCodeinTextChar\">theMonth<\/span> column, Peso uses an integer &#8211; more about this later. Go ahead and run the table creation statement (A to B).<\/p>\n<p>The <span class=\"STCodeinTextChar\">SELECT<\/span> statement used to populate the table is an aggregate query based on the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> operator. Run the <span class=\"STCodeinTextChar\">SELECT<\/span> statement without the INSERT (C to I) to see what will actually populate <span class=\"STCodeinTextChar\">#Stage<\/span>. Figure 3 shows the partial results. Instead of a date, <span class=\"STCodeinTextChar\">theMonth<\/span> contains a number from 1248 to 1316 along with the <span class=\"STCodeinTextChar\">PeopleJoined<\/span> and <span class=\"STCodeinTextChar\">PeopleLeft<\/span> column totals. The <span class=\"STCodeinTextChar\">Subscribers<\/span> column contains zeros at this point. This is the data that will be used in the second part of the solution. Except for having integers in the month, it looks very similar to the spreadsheet we created at the beginning of the article to validate the results.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK4.JPG\" alt=\"933-KK4.JPG\" \/><\/p>\n<p class=\"caption\">Figure 3: The data used to populate #Stage<\/p>\n<p>Let&#8217;s now break this <span class=\"STCodeinTextChar\">SELECT<\/span> down, step-by-step to see how it works.<\/p>\n<h3>The Pre-aggregation Step<\/h3>\n<p>In Listing 3, in order to arrive at a temp table that contained a row for each month and a count of the new subscriptions and cancellations, we had to make two full passes through the <span class=\"STCodeinTextChar\">Registrations<\/span> table, once aggregating on the <span class=\"STCodeinTextChar\">DateJoined<\/span> column, and once on the <span class=\"STCodeinTextChar\">DateLeft<\/span>, performing some expensive date calculations as we went.<\/p>\n<p>Using a &#8220;pre-aggregation&#8221; involving the <span class=\"STCodeinTextChar\">DATEDIFF<\/span> function, followed by a clever trick using the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> operator, to normalize the data, Peso managed to take just one pass through the table and so minimize the impact of date manipulations.<\/p>\n<p>In order to understand the final result set shown in Figure 3, we need to first\u00a0 look at the query that populate the derived table, on which the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> operator works. Run lines E to F of Listing 3 will return the contents of this derived table, as shown in Figure 4.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT\u00a0\u00a0 datediff(MONTH, 0, DateJoined) AS DateJoined ,  \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datediff(MONTH, 0, DateLeft) AS DateLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 count(*) AS Registrations\r\nFROM\u00a0\u00a0\u00a0\u00a0 dbo.Registrations\r\nGROUP BY datediff(MONTH, 0, DateJoined) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datediff(MONTH, 0, DateLeft)\r\n<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK5.JPG\" alt=\"933-KK5.JPG\" \/><\/p>\n<p class=\"caption\">Figure 4: The heart of the unpivoted data<\/p>\n<p>It may be hard to recognize the fact at first glance, but this is, in many respects, the <span class=\"STBold\">critical moment<\/span> in the script. In it, we make a single pass through the 10K (or 1 million) row base table, and &#8220;summarize&#8221; everything we need from it, in order to solve the reporting problem, into only 586 (or 1580) rows. All subsequent operations are performed on this limited working data set and so are relatively inexpensive.<\/p>\n<p>If you look at the original data, you will see that a date can fall anywhere within a month. In order to efficiently eliminate the &#8220;day,&#8221; this query uses the <span class=\"STCodeinTextChar\">DATEDIFF<\/span> function to convert all dates to the number of months that have passed since the date represented by 0 (1900-01-01). So, for example, any date within January 2004 becomes 1248 (104*12). Instead of grouping by month and year, or converting all dates to the first of the month, this solution just converted all the dates to an integer. Eventually, as we perform the running total calculation in the second part of the solution, these integer values are converted back to dates, using <span class=\"STCodeinTextChar\">DATEADD<\/span>.<\/p>\n<p>The query counts the number of rows grouped by the <span class=\"STCodeinTextChar\">DateJoined<\/span> and <span class=\"STCodeinTextChar\">DateLeft<\/span> calculated values. Therefore, all subscribers who joined and left at the same time are grouped together, or &#8220;connected&#8221;, as part of the pre-aggregation.<\/p>\n<p>You may be wondering why the solution leaves the dates as integers in the <span class=\"STCodeinTextChar\">INSERT<\/span> statement and converts them back to dates at the end of the process. It would be simple to just apply a <span class=\"STCodeinTextChar\">DATEADD<\/span> function to add the months back in the first statement:<\/p>\n<p class=\"MsoNormal\"><span class=\"STCodeinTextChar\">DATEADD(M,DATEDIFF(M,0,DateJoined),0)<\/span><\/p>\n<p>This would convert all the dates to the first of the month at one time, but Peso chose to convert them back to dates later, during the running totals calculation. By doing so, the <span class=\"STCodeinTextChar\">DATEADD<\/span> function to convert back to date was performed on just 69 values instead of up to 20,000 values (the 10,000 rows times two dates for each row minus the NULLs)<\/p>\n<h3>The UNPIVOT Trick<\/h3>\n<p>The next step is to &#8220;normalize&#8221;, or unpivot, the pre-aggregated data. This is the step that means we can avoid any further reads of the base table, because it gives us all the data we need calculate the monthly number of subscribers.<\/p>\n<p>Let&#8217;s look at the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> part of this statement (F to G), by running the query in Listing 5, which temporarily removes the <span class=\"STCodeinTextChar\">GROUP<\/span> <span class=\"STCodeinTextChar\">BY<\/span> and <span class=\"STCodeinTextChar\">HAVING<\/span> clauses and replaces the <span class=\"STCodeinTextChar\">SELECT<\/span> list found in the original solution.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT\u00a0\u00a0 *\r\nFROM\u00a0\u00a0\u00a0\u00a0 ( SELECT datediff(MONTH, 0, DateJoined) AS DateJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datediff(MONTH, 0, DateLeft) AS DateLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 count(*) AS Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0 dbo.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY datediff(MONTH, 0, DateJoined) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datediff(MONTH, 0, DateLeft)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ) AS d UNPIVOT\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( theMonth FOR theCol IN ( d.DateJoined, d.DateLeft ) ) AS u\r\n<\/pre>\n<p class=\"caption\">Listing 5: The un-aggregated <span class=\"STCodeinTextChar\">UNPIVOT<\/span> query<\/p>\n<p>Figure 5 shows the data after the pre-aggregation together with the unpivoted results:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK6.JPG\" alt=\"933-KK6.JPG\" \/><\/p>\n<p class=\"caption\">Figure 5: Before and after unpivoting<\/p>\n<p>As you can see, the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> operator, introduced with SQL Server 2005, takes column headings and turns them into data. This is the opposite of the <span class=\"STCodeinTextChar\">PIVOT<\/span> function that turns data into column headings.<\/p>\n<p class=\"note\"><b>NOTE:<\/b><br \/>\nPre SQL Server 2005, we could have used a derived table with UNION ALL to mimic the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> functionality. However, that would have led to two passes on the source table, and not just one.<\/p>\n<p>In this case, the <span class=\"STCodeinTextChar\">DateJoined<\/span> and <span class=\"STCodeinTextChar\">DateLeft<\/span> column names now become values in the <span class=\"STCodeinTextChar\">theCol<\/span> column. The integer values stored in <span class=\"STCodeinTextChar\">DateJoined<\/span> and <span class=\"STCodeinTextChar\">DateLeft<\/span> become individual entries in a new column, <span class=\"STCodeinTextChar\">theMonth<\/span>.\u00a0 Now, each row contains a number representing the month, a number of registrations, and a value showing whether the data represents a new subscription or a cancellation.<\/p>\n<p>Now we have a list of registrations by month and action (see Figure 6). If you take a closer look at the data after the <span class=\"STCodeinTextChar\">UNPIVOT<\/span> is applied, you will see that we have multiple rows for each <span class=\"STCodeinTextChar\">theMonth<\/span> and <span class=\"STCodeinTextChar\">theCol<\/span> combination. For example, Figure 6 shows two rows for month 1306 and <span class=\"STCodeinTextChar\">DateLeft<\/span>. The duplicates exist because the initial aggregation grouped the data on the combination of <span class=\"STCodeinTextChar\">DateJoined<\/span> and <span class=\"STCodeinTextChar\">DateLeft<\/span>. Since we have split those values apart, there are now duplicates and further aggregation must be done.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK7.JPG\" alt=\"933-KK7.JPG\" \/><\/p>\n<p class=\"caption\">Figure 6: The UNPIVOT results<\/p>\n<h2>Repivot and Aggregate<\/h2>\n<p>Moving out a bit more, let&#8217;s look at how we re-pivot the data and perform the final aggregation. The <span class=\"STCodeinTextChar\">SELECT<\/span> list (C to D) creates a sum of <span class=\"STCodeinTextChar\">PeopleJoined<\/span> and a sum of <span class=\"STCodeinTextChar\">PeopleLeft<\/span> grouped by <span class=\"STCodeinTextChar\">theMonth <\/span>(G to H). The code to focus on here is shown in Listing 6.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT\u00a0\u00a0 u.theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sum(case WHEN u.theCol = 'DateJoined'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN u.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END) AS PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sum(case WHEN u.theCol = 'DateLeft' \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN u.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END) AS PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 AS Subscribers\r\n--D\r\nFROM\u00a0\u00a0\u00a0\u00a0 ( --------UNPIVOTED RESULTS-------- ) AS u\r\n--E \r\n--G\r\nGROUP BY u.theMonth\r\n\u00a0\u00a0 HAVING\u00a0 sum(case WHEN u.theCol = 'DateJoined'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN u.Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END) &gt; 0\r\n<\/pre>\n<p class=\"caption\">Listing 6: Using CASE to pivot<\/p>\n<p>The <span class=\"STCodeinTextChar\">CASE<\/span> function in the <span class=\"STCodeinTextChar\">SELECT<\/span> list determines which rows to count for <span class=\"STCodeinTextChar\">PeopleJoined<\/span> and which to count for <span class=\"STCodeinTextChar\">PeopleLeft<\/span>. If <span class=\"STCodeinTextChar\">theCol<\/span> equals <span class=\"STCodeinTextChar\">DateJoined<\/span>, then the row belongs in <span class=\"STCodeinTextChar\">PeopleJoined<\/span>. If <span class=\"STCodeinTextChar\">theCol<\/span> equals <span class=\"STCodeinTextChar\">DateLeft<\/span>, then the row belongs to <span class=\"STCodeinTextChar\">PeopleLeft<\/span>. For those of you familiar with the <span class=\"STCodeinTextChar\">PIVOT<\/span> function, you probably know that using <span class=\"STCodeinTextChar\">CASE<\/span> is another way to pivot data. In this case, the query &#8220;pivots&#8221; the <span class=\"STCodeinTextChar\">theCol<\/span> column to create a total for <span class=\"STCodeinTextChar\">PeopleJoined<\/span> and <span class=\"STCodeinTextChar\">PeopleLeft<\/span> for each month. Peso could have chosen to use <span class=\"STCodeinTextChar\">PIVOT<\/span> instead, but he found that the <span class=\"STCodeinTextChar\">CASE<\/span> method was slightly faster. Figure 7 shows how the data looks after pivoting with <span class=\"STCodeinTextChar\">CASE<\/span>.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK8.JPG\" alt=\"933-KK8.JPG\" \/><\/p>\n<p class=\"caption\">Figure 7: After aggregating<\/p>\n<p>Finally, the <span class=\"STCodeinTextChar\">HAVING<\/span> clause (H to I) filters out any rows with no new subscriptions. In the 10,000 rows, all months after September 2009 have cancellations, but no new subscriptions, and future months are not part of the report. Since <span class=\"STCodeinTextChar\">HAVING<\/span> filters rows out after aggregates are applied, the <span class=\"STCodeinTextChar\">CASE<\/span> function in the <span class=\"STCodeinTextChar\">WHERE<\/span> clause returns the total number of new subscriptions after grouping on <span class=\"STCodeinTextChar\">theMonth<\/span>. If a month contains only cancellations, as in the months after September 2009, the <span class=\"STCodeinTextChar\">CASE<\/span> function returns zero, and the <span class=\"STCodeinTextChar\">HAVING<\/span> clause filters those rows out.<\/p>\n<p>We now have 69 rows which is also the number of rows (months) that will appear in the report. For each month (still represented by the integer result of <span class=\"STCodeinTextChar\">DATEDIFF<\/span>), we know the number of subscribers and the number of cancellations. The only thing missing is the number of subscriptions, which is still set at zero. Peso has managed to get to this point with only one pass through the data by his masterful pre-aggregation, followed by use of <span class=\"STCodeinTextChar\">UNPIVOT<\/span> and <span class=\"STCodeinTextChar\">CASE<\/span>.<\/p>\n<p class=\"MsoNormal\">Go ahead and run lines B to I to populate <span class=\"STCodeinTextChar\">#Stage<\/span>.<\/p>\n<h2>Calculating the Running Total<\/h2>\n<p>The next part of the solution calculates the required running total of subscribers via a &#8220;quirky&#8221; update command operating on a common table expression (CTE) (lines J to O).<\/p>\n<p>Again, let&#8217;s walk through it step-by-step.<\/p>\n<h3>Use a Variable to help you Count: no Cursors!<\/h3>\n<p>So far, the solution has manipulated our data to get it ready for calculating the running total. Instead of using a while loop or a cursor to calculate this total, a variable <span class=\"STCodeinTextChar\">@Subscribers<\/span> is defined and set to zero (I to J). Initializing a variable on the same line that the variable was defined is new to SQL Server 2008. If you are using SQL Server 2005, you will have to change this line:<\/p>\n<pre>DECLARE @Subscribers int = 0 \r\n<\/pre>\n<p>Into these two lines<\/p>\n<pre>DECLARE @Subscribers INT\r\nSET @Subscribers = 0\r\n<\/pre>\n<p>Even though we are not using any kind of loop, behind the scenes SQL Server updates one row at a time. Luckily, the pre-aggregation used before getting to this point has transformed the 10,000 original rows into 69, and the implicit cursor that SQL Server uses is faster than an explicit cursor.<\/p>\n<h3>The Ordered CTE<\/h3>\n<p>CTEs were introduced with SQL Server 2005. At a minimum, you can use CTEs to replace temp tables and views or to separate out a part of the logic of your query. CTEs also have some special uses, for example, for performing recursive queries.<\/p>\n<p>Inside the CTE, we convert back to &#8220;normal&#8221; dates, and we do this by loading the data from our staging table into a CTE, using the <span class=\"STCodeinTextChar\">DATEADD<\/span> function as we go. Since <span class=\"STCodeinTextChar\">DATEADD<\/span> is used just once for this small number of rows, it is not that expensive.<\/p>\n<p>If you run just the query inside the CTE definition (K to L), you will see that now <span class=\"STCodeinTextChar\">theMonth<\/span> has been converted back into a date and that the data is sorted. Only the running total of subscribers is missing. Figure 8 shows the partial results.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT TOP (2147483647)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 dateadd(MONTH, theMonth, 0) AS theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscribers\r\nFROM\u00a0\u00a0\u00a0\u00a0 #Stage\r\nORDER BY theMonth\r\n<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK9.JPG\" alt=\"933-KK9.JPG\" \/><\/p>\n<p class=\"caption\">Figure 8: The CTE results<\/p>\n<p>Normally, you cannot order data inside a CTE. One way around this rule is to use the <span class=\"STCodeinTextChar\">TOP<\/span> keyword, which allows you to retrieve a number or percentage of rows. In order to control which rows a query returns when using <span class=\"STCodeinTextChar\">TOP<\/span>, you can use <span class=\"STCodeinTextChar\">ORDER BY<\/span> to sort the data. In this case, Peso chose to use the largest possible integer, 2147483647, to make sure that all the rows will be returned.<\/p>\n<h3>The Quirky Update<\/h3>\n<p>You can use a CTE in one statement to select, update, delete, or insert data. This is the first time that I have seen the <span class=\"STCodeinTextChar\">UPDATE<\/span> statement update the data in the CTE itself (M to N) and not just another table joined to the CTE, though there is nothing stopping you from doing so as long as no rules for updating data are violated.\u00a0 The statement actually updates the <span class=\"STCodeinTextChar\">#Stage<\/span> table, which is the basis for the CTE. The <span class=\"STCodeinTextChar\">UPDATE<\/span> statement includes an <span class=\"STCodeinTextChar\">OUTPUT<\/span> clause (N to O) which we will discuss shortly.<\/p>\n<p>The <span class=\"STCodeinTextChar\">SET<\/span> clause of the <span class=\"STCodeinTextChar\">UPDATE<\/span> statement has an interesting formula:<\/p>\n<pre>SET @Subscribers = Subscribers = @Subscribers + PeopleJoined - PeopleLeft\r\n<\/pre>\n<p>For each row of our data, the <span class=\"STCodeinTextChar\">Subscribers<\/span> column is set to the current value of the variable <span class=\"STCodeinTextChar\">@Subscribers<\/span> and then increased or decreased based on what happened that month. The value stored in <span class=\"STCodeinTextChar\">Subscribers<\/span>, once it is calculated, is saved in the variable. The variable is then available for the next row of data.<\/p>\n<p>This style of update is unofficially referred to as a &#8220;quirky update&#8221;. It refers to the ability to update variables and columns in a table simultaneously and in an ordered fashion. Here, the ordering is guaranteed by the <span class=\"STCodeinTextChar\">ORDER<\/span> <span class=\"STCodeinTextChar\">BY <\/span>in our CTE. If this technique were applied to a permanent table the ordering would be established by the table&#8217;s clustered index; the update would be done according to the clustered index, but would fail of the table was partitioned, or if parallelism occurred. In any event, it is very important that the calculation is done in the correct order, and that order is guaranteed in this example.<\/p>\n<p>Since the <span class=\"STCodeinTextChar\">#Stage table is a heap we won&#8217;t have problems with the ordering being affected by a clustered index. We won&#8217;t get parallelism either because the data is now only 69 records, and holds much less data than a page. Finally, partitioning is not possible on temporary tables.<\/span><\/p>\n<h2>Producing the Report: A clever use for OUTPUT<\/h2>\n<p>The <span class=\"STCodeinTextChar\">UPDATE<\/span> statement populates the <span class=\"STCodeinTextChar\">Subscribers<\/span> column of the <span class=\"STCodeinTextChar\">#Stage<\/span> table. After the update, if we just select the current contents of <span class=\"STCodeinTextChar\">#Stage<\/span>, we have the report.\u00a0 Instead of doing that, Peso chose to use an <span class=\"STCodeinTextChar\">OUTPUT<\/span> clause, another relatively new T-SQL feature, introduced with SQL Server 2005. The <span class=\"STCodeinTextChar\">OUTPUT<\/span> clause can be used along with any data manipulation statement. It provides <span class=\"STCodeinTextChar\">deleted<\/span> and <span class=\"STCodeinTextChar\">inserted<\/span> tables, just like triggers do. The <span class=\"STCodeinTextChar\">deleted<\/span> table returns the data before the transaction. It can be data that is deleted or the data before it is updated. The <span class=\"STCodeinTextChar\">inserted<\/span> table returns the data after the transaction. It can be data that is inserted or data after the update. In this case, the <span class=\"STCodeinTextChar\">inserted<\/span> table returns the data how it looks after the update. By using this technique, we avoided accessing the table again to display the data.<\/p>\n<p class=\"MsoNormal\">Run the last part of the script (I to P) to see that the report works. Figure 9 shows the partial results.<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/933-KK10.JPG\" alt=\"933-KK10.JPG\" \/><\/p>\n<p class=\"caption\">Figure 9: The report<\/p>\n<h2>One Small Hole<\/h2>\n<p>It was stated as part of the competition rules that it could be assumed that all months had at least one new subscriber.<\/p>\n<p>However, it&#8217;s worth noting that if there was a particular month that had no new subscriptions, then that month would not be included in the results. In fact, if a month in the middle of the data had no new subscriptions but had some cancellations, the report will not count the cancellations, and the report will be incorrect. To demonstrate this, change the <span class=\"STCodeinTextChar\">DateJoined<\/span> values to 2004-07-01 in the rows where the <span class=\"STCodeinTextChar\">DateJoined<\/span> values are within June, 2004 in the <span class=\"STCodeinTextChar\">Registrations<\/span> table. Then run the script again. You will see that the report no longer includes the cancellation and all the subsequent values are incorrect.<\/p>\n<p class=\"MsoNormal\">Fortunately, this problem is easily overcome, via something like an outer join to a sequence or numbers table (see, for example, <a href=\"http:\/\/weblogs.sqlteam.com\/peterl\/archive\/2009\/11\/03\/Superfast-sequence-generators.aspx\">http:\/\/weblogs.sqlteam.com\/peterl\/archive\/2009\/11\/03\/Superfast-sequence-generators.aspx<\/a>)<\/p>\n<h1>Not Quite as Fast, but a Bit More Conventional<\/h1>\n<p>Not all queries and scripts require that you take the time to tune them to be fastest they can possibly be. The performance of a query that runs once a month does not matter as much as a query that runs 1000 times a minute. Of course, as you learn more, you will automatically incorporate better performing techniques into your code.<\/p>\n<p>In the real world, there is a balance to be struck between maintainability and outright speed. However, once you&#8217;ve started to understand the considerations that are involved in getting the code to run as fast as possible, you can then decide to &#8220;back out&#8221; to what might be a slightly slower but more maintainable solution. Alternatively, you might decide that the time required to get the code as fast as possible is more time than you have to spend.<\/p>\n<p>So, for example, if we had established Peter&#8217;s solution (Listing 4) as the fastest, but felt that we were willing to sacrifice some of the speed in return for ease of understanding and maintainability, then we might opt for a compromise solution, such as that shown in Listing 7. It is based on the &#8220;faster but still iterative&#8221; solution from Listing 3, but borrows Peso&#8217;s <span class=\"STCodeinTextChar\">DATEDIFF<\/span> technique to reduce the number of required date calculations.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">--variables  \r\nDECLARE @total INT\r\nDECLARE @people_joined INT\r\nDECLARE @people_left INT \r\nDECLARE @the_month INT\r\n--create a table to hold the results\r\nCREATE TABLE #subscriptions\r\n\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0 theMonth INT ,\r\n\u00a0\u00a0\u00a0\u00a0 PeopleJoined INT ,\r\n\u00a0\u00a0\u00a0\u00a0 PeopleLeft INT ,\r\n\u00a0\u00a0\u00a0\u00a0 Subscriptions INT\r\n\u00a0\u00a0 )\r\n--insert a row for each month\r\n--in the data\r\nINSERT\u00a0\u00a0 INTO #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( theMonth ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0\u00a0 datediff(M, 0, DateJoined) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 count(*) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY datediff(M, 0, DateJoined)\r\n\u00a0\r\n--update for cancellations\r\n;\r\nWITH\u00a0 CANCELLATIONS\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT\u00a0 count(*) CANC_COUNT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 datediff(M, 0, DateLeft) AS\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0DateLeft\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 Registrations\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0 DATELEFT IS NOT NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 GROUP BY datediff(M, 0, DateLeft)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0 UPDATE\u00a0\u00a0 S\r\n\u00a0\u00a0 SET\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft = CANC_COUNT\r\n\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 #subscriptions S\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INNER JOIN CANCELLATIONS C ON S.theMonth = Dateleft\r\nSET @total = 0\r\n--set up a cursor to update the total subscriptions\r\n--for each month\r\nDECLARE SUBSCRIPTIONS CURSOR FOR\r\nSELECT THEMONTH, PEOPLEJOINED, PEOPLELEFT\r\nFROM #SUBSCRIPTIONS ORDER BY THEMONTH\r\nOPEN SUBSCRIPTIONS\r\nFETCH NEXT FROM SUBSCRIPTIONS INTO @the_month, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0 @people_joined, @people_left\r\nWHILE @@FETCH_STATUS = 0 \r\n\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 SET @total = @total + @people_joined - @people_left \r\n\u00a0\u00a0\u00a0\u00a0\u00a0 UPDATE\u00a0\u00a0 #subscriptions\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 SET\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions = @total\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0\u00a0 theMonth = @the_month \r\n\u00a0\u00a0\u00a0\u00a0\u00a0 FETCH NEXT FROM SUBSCRIPTIONS \r\n\u00a0\u00a0\u00a0\u00a0\u00a0 INTO @the_month, @people_joined, @people_left\r\n\u00a0\u00a0 END\r\nCLOSE SUBSCRIPTIONS\r\nDEALLOCATE SUBSCRIPTIONS\r\n--the report\r\nSELECT\u00a0\u00a0 dateadd(M, theMonth, 0) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleJoined ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 PeopleLeft ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Subscriptions\r\nFROM\u00a0\u00a0\u00a0\u00a0 #subscriptions\r\nORDER BY THEMONTH\r\nDROP TABLE #subscriptions\r\n<\/pre>\n<p class=\"caption\">Listing 7: The compromise solution<\/p>\n<p>When this &#8220;compromise&#8221; solution was tested, it ran in 74 ms against 10,000 rows and in 847 ms against 1 million rows. Still quite a bit slower than Peso&#8217;s solution, but not bad! Even though it uses a cursor for the running the total calculations, the population of the temp table has been improved, and the performance is acceptable.<\/p>\n<p>The important point to remember is that, really, it is the aggregation process that is the most critical. Once you are working with a small data set, the exact manner in which the running total calculation is performed becomes less of an issue, although it will obviously still have some impact.<\/p>\n<p>Table 2 shows how the four solutions stack up:<\/p>\n<table class=\"MsoTableGrid\">\n<tbody>\n<tr>\n<td>\n<p class=\"MsoNormal\"><b>Solution<\/b><\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\"><b>10,000 Rows<\/b><\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\"><b>1 Million Rows<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"MsoNormal\">Row-by-row (Listing 2)<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">7 seconds<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">13 minutes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"MsoNormal\">Faster but still iterative (Listing 3)<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">360 milliseconds<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">3 seconds<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"MsoNormal\">Peso\u00a0 (Listing 4)<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">0 milliseconds (too fast to measure)<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">300 milliseconds<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"MsoNormal\">Not quite as fast, but conventional (Listing 7)<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">74 milliseconds<\/p>\n<\/td>\n<td>\n<p class=\"MsoNormal\">840 milliseconds<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Summing Up<\/h2>\n<p>The real proof of performance to Peso&#8217;s brilliant solution is that, even when Phil Factor tested it against one million rows, the report ran in milliseconds. Here are some of the important points:<\/p>\n<ul>\n<li>Avoid row-by-row processing, unless on just a handful of rows<\/li>\n<li>Pass through the data as few times as possible, preferably just once<\/li>\n<li>Use <span class=\"STCodeinTextChar\">DATEDIFF<\/span> to convert the date to an integer if the day of the month is not needed<\/li>\n<li>Minimize calculations whenever possible by pre-aggregating to a small number of rows first<\/li>\n<li>Use <span class=\"STCodeinTextChar\">UNPIVOT<\/span> and <span class=\"STCodeinTextChar\">CASE<\/span> to realign the columns<\/li>\n<li>Use a variable, instead of a cursor, to calculate running totals<\/li>\n<li>The <span class=\"STCodeinTextChar\">OUTPUT<\/span> clause can be used to display data during an update instead of running an additional <span class=\"STCodeinTextChar\">SELECT<\/span> statement<\/li>\n<\/ul>\n<p>At the very least, I hope you have learned that there can be many ways to come up with the correct answer, but not all solutions are created equal.<\/p>\n<p>The fastest solution presented in this article has reached perfection in terms of performance and scalability. Is it always worth attempting to achieve perfection? Even the SQL Server query optimizer comes up with a plan that is &#8220;good enough&#8221; and not necessarily perfect. But some techniques perform so poorly on large datasets that learning how to avoid them will pay tremendous dividends.<\/p>\n<\/div>\n<h2>\u00a0<\/h2>\n<h2>\u00a0<\/h2>\n<div>\u00a0<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Phil Factor&#8217;s SQL Speed Phreak challenge is an event where coders battle to produce the fastest code to solve a common reporting problem on large data sets. It isn&#8217;t that easy on the spectators, since the programmers don&#8217;t score extra points for commenting their code. Mercifully, Kathi is on hand to explain some of the TSQL coding secrets that go to producing blistering performance. &hellip;<\/p>\n","protected":false},"author":110218,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143529],"tags":[5113,4206,4357,5112,4150,5111,4190],"coauthors":[11292],"class_list":["post-793","post","type-post","status-publish","format-standard","hentry","category-performance-sql-server","tag-kathi-kellenberger","tag-performance","tag-phil-factor","tag-speed","tag-sql","tag-sql-performance","tag-tsql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/110218"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=793"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/793\/revisions"}],"predecessor-version":[{"id":75848,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/793\/revisions\/75848"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=793"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}