Back in 2009 and 2010, Phil Factor hosted the SQL Speed Phreak Competition, in which programmers were challenged to construct the fastest solution to a series of common programming challenges. Unfailingly, the best solutions used set-based T-SQL techniques to achieve lightning-fast execution times, even when the data load scaled to millions of rows. Although fast, the solutions were often complex and hard to understand, and occasionally used slightly unorthodox techniques.
The Simple-Talk editors called in “Aunt Kathi” to help. My job was to explain as simply as I could the mechanics of these solutions, and so make them more accessible to developers who faced similar business problems, and needed a faster and more scalable solution than more conventional “row by row” processing techniques could offer. In each case, I also tried to offer an alternative solution that “learned” from the winning solution, but was simpler and therefore easier to maintain, but inevitably compromised a little on speed.
Skip forward a few years, and SQL Server 2012 has now introduced many powerful new T-SQL window functions and enhancements. Could they offer a simpler alternative to the original solutions that would match or perhaps even beat them on speed?
My previous article revisited the running total problem, where I proved that a solution that used the OVER clause and a ‘sliding window’ of data, with some aggregate functions, performed identically to the original solution, which relied on the unorthodox “query update” technique.
This article revisits the ‘FIFO (First In, First Out) Stock Inventory Problem’, a somewhat-simplified, but still realistic inventory accounting problem, where stock items are either purchased, sold, or returned, and where the oldest items in the queue of stock items (those that were the first to be purchased or returned) must be the first items sold. The requirement is to calculate the remaining count, and value, of each item after all stock transactions are performed.
The original winning solution from Dave Ballantyne is brilliant, but a brain twister for the uninitiated. It uses one statement comprised of several common table expressions (CTEs) and includes a couple of CROSS APPLY
type joins. Was there a simpler but equally-speedy window function-based solution? Let’s find out.
Understanding the FIFO Stock Inventory Problem
In this puzzle, stock items are purchased ( IN), returned ( RET), and sold ( OUT). Each type of stock item is identified by an ArticleID
. Each movement of stock in or out of the warehouse, due to a purchase, sale or return of a given type of stock, results in a row being added to the Stock
table, uniquely identified by a value in the StockID
identity column, and describing how many items of that type of stock were added or removed, the price for purchases, the date of the stock transaction, and so on.
The objective is to calculate the current value of the stock, by ArticleID
, after all transactions are complete. Prices are available only when items are purchased. If an item is returned, the most recent purchase price is used. Items are sold on a “first in, first out” basis, and the price to use is the price paid when the item was added to the inventory.
The following table shows an extract of data from the Stock
table, all relating to a single type of stock, and hopefully illustrating how the rules work.
ArticleID | TranDate | TranCode | Items | Price | Current Count | Current Value | What Happened? |
1001 | 2016-01-01 | IN | 10 | $100 | 10 | $1000 | Added 10 at $100 |
1001 | 2016-01-02 | IN | 30 | $101 | 40 | $4030 | Added 30 at $101 |
1001 | 2016-01-03 | OUT | 25 | — | 15 | $1515 | Added 10 at $100 and 15 at $101 |
1001 | 2016-01-04 | RET | 10 | — | 25 | $2525 | Added 10 at $101 |
1001 | 2016-01-05 | IN | 9 | $99 | 34 | $3416 | Added 9 at $99 |
1001 | 2016-01-06 | OUT | 22 | — | 12 | $1194 | Sold 22 at $101 |
The key to solving this puzzle is understanding that what is left after the transactions are complete is all that matters. You do not need to step through the transactions or even calculate the price at which items were sold.
I like to compare this puzzle to the common habit of cleaning out the spare change from one’s pockets at the end of each day. Instead of a change jar, imagine that you had a change queue on the dresser. Each night, you add the day’s change to the end of the queue. The next morning, you take some coins from the front of the queue if you think you would need them that day. If you want to know the current value of the coins, you only have to look at the existing coins in the queue to calculate the value. You do not need to know about the coins that have been previously removed. You only need to look at each remaining coin, figure out the value of each one, and add them up.
The FIFO Stock Inventory Problem is quite a bit more complex than coins on a dresser, but you still must concentrate on how many of each ArticleID
are left and the price when each item was added to stock. Here are the steps to solving the puzzle:
- Calculate a final count of the items by
ArticleID
. This is easily done with an aggregate query. At the end of the transactions, there are 12 items left ofArticleID
1001 in the example above. - Use a reverse running total of purchases and returns to figure out which transactions are responsible for the remaining items. A reverse running total is used because the purchases and returns at the end are responsible for the remaining stock.
- Find the date at which the reverse running total equals or exceeds the amount remaining at the end of all transactions. On 2016-01-04 the reverse running total, 19, exceeded the number remaining, 12.
- Sum the items from all transactions after that date, in this case the transaction on 2016-01-06, which was 9 items.
- On the date calculated in step 3, figure out how many of the items added to stock should be counted. This is the difference between the items remaining (12) and the previous running total (9). One way to calculate the previous reverse running total is by subtracting the items purchased or returned from the current reverse running total. Then the formula becomes Items Remaining – (Reverse Running Total – Items Purchased or Returned). For this example, that would be 12 – (19 – 10) which resolves to 3.
- Multiply remaining items by the price. For returned items, the price must be calculated. In the example above, the ten items returned are priced at $101 because that was the last purchase price.
ArticleID | TrandDate | TranCode | Items | Price | Reverse Running Total | Source of Remaining Items |
1001 | 2016-01-01 | IN | 10 | $100 | 59 | |
1001 | 2016-01-02 | IN | 30 | $101 | 49 | |
1001 | 2016-01-04 | RET | 10 | $101 | 19 | 3 |
1001 | 2016-01-05 | IN | 9 | $99 | 9 | 9 |
Dave Ballantyne’s Original Winning Solution
Listing 1 shows Dave Ballantyne’s original solution. As part of his solution, he added several indexes, which was not against the rules. Also, evidently, creating the indexes didn’t count against the time to run the solution. On my laptop with 8 GB RAM, 4 procs and an SSD drive, the query took 1 second to run.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 |
CREATENONCLUSTERED INDEX IX_Dave_General ON dbo.Stock ( ArticleIDASC, TranDateDESC, TranCodeASC ) INCLUDE ( Items, Price); GO CREATENONCLUSTERED INDEX IX_Dave_Items ON dbo.Stock ( ArticleID ASC, TranDate ASC ) INCLUDE (Items) WHERE (TranCode IN('IN','RET')); CREATENONCLUSTERED INDEX IX_Dave_Price ON dbo.Stock ( ArticleID ASC, TranDate ASC ) INCLUDE ( Price) WHERE (TranCode='IN'); GO WITH cteStockSum AS (SELECT ArticleID , SUM(CASEWHEN TranCode ='OUT' THEN 0 - Items ELSE Items END)AS TotalStock FROM dbo.Stock GROUP BY ArticleID ), cteReverseInSum AS (SELECT s.ArticleID, s.TranDate, ( SELECT SUM(i.Items) FROM dbo.StockAS i WITH( INDEX( IX_Dave_Items) ) WHERE i.ArticleID= s.ArticleID AND i.TranCodeIN ( 'IN','RET' ) AND i.TranDate>= s.TranDate ) AS RollingStock, s.Items AS ThisStock FROM dbo.StockAS s WHERE s.TranCodeIN ( 'IN','RET' ) ), /* Using the rolling balance above find the first stock movement in that meets (or exceeds) our required stock level */ /* and calculate how much stock is required from the earliest stock in */ cteWithLastTranDate AS (SELECT w.ArticleID, w.TotalStock, LastPartialStock.TranDate, LastPartialStock.StockToUse, LastPartialStock.RunningTotal, w.TotalStock- LastPartialStock.RunningTotal + LastPartialStock.StockToUseAS UseThisStock FROM cteStockSum AS w CROSS APPLY ( SELECTTOP ( 1) z.TranDate, z.ThisStockAS StockToUse , z.RollingStockAS RunningTotal FROM cteReverseInSum AS z WHERE z.ArticleID= w.ArticleID AND z.RollingStock>= w.TotalStock ORDER BY z.TranDateDESC ) AS LastPartialStock ) /* Sum up the cost of 100% of the stock movements in after the returned stockid and for that stockid we need 'UseThisStock' items' */ SELECT y.ArticleID, y.TotalStockAS CurrentItems , SUM(CASEWHEN e.TranDate= y.TranDateTHEN y.UseThisStock ELSE e.Items END * Price.Price)AS CurrentValue FROM cteWithLastTranDate AS y INNER JOIN dbo.StockAS e WITH( INDEX( IX_Dave_Items) ) ON e.ArticleID= y.ArticleID AND e.TranDate>= y.TranDate AND e.TranCodeIN ( 'IN','RET' ) CROSS APPLY ( /* Find the Price of the item in */SELECT TOP( 1 ) p.Price FROM dbo.StockAS p WITH (INDEX ( IX_Dave_Price ) ) WHERE p.ArticleID= e.ArticleID AND p.TranDate<= e.TranDate AND p.TranCode= 'IN' ORDER BY p.TranDate DESC ) AS Price GROUP BY y.ArticleID , y.TotalStock ORDER BY y.ArticleID; |
Listing 1: Dave Ballantyne’s winning code
You can find a complete description of this solution in my original article, but I’ll summarize the highlights. The solution starts with an aggregate query in cteStockSum
, calculating the total remaining stock ( TotalStock
) of each ArticleID
, after all transactions have completed. It then uses a correlated sub-query to find the reverse running total of articles purchased or returned, in cteReverseInSum
.
Having calculated the final count and the reverse running total, you can find the date where the number of items at the end of the queue meets or exceeds the TotalStock
amount. This is done in the cteLastTranDte
CTE, using CROSS APPLY
and TOP(
1)
. On that date, you may need all of the items or just some of the items.
By using CROSS APPLY
, all columns from the row can be returned. Dave takes advantage of this and figures out how many items are needed on that date. In the outer query, the value for each row is calculated by multiplying the items by the price. The query in the CROSS APPLY
returns the price for the latest purchase, which is then the price applied for any the returns in the remaining stock.
The Aunt Kathi Window Function Method
There is no one tool for the job when working with T-SQL, and using windowing functions is not an exception to this rule. I systematically tried replacing each of Dave’s calculations with the equivalent calculation using window function, but occasionally the performance suffered badly as a result.
Finally, I settled on the code shown in Listing 2, which is sort of a hybrid approach. On my laptop, the query took 2 seconds to run. I was never able to match the 1-second execution time of Dave’s original code.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
WITH ItemEndTotal AS (SELECT ArticleID , SUM(CASEWHEN TranCode ='OUT' THEN-Items ELSE Items END)AS FinalCount FROM Stock GROUP BY ArticleID ), ReverseRunningTotal AS (SELECT StockID , ArticleID , TranCode , TranDate , Items , Price , SUM(Items)OVER (PARTITION BY ArticleID ORDER BY TranDate ROWS BETWEEN CURRENTROW ANDUNBOUNDED FOLLOWING ) ReverseTotal FROM Stock WHERE TranCode IN( 'IN','RET' ) ), FindDate AS (SELECT DISTINCT T.ArticleID, FinalCount , LAST_VALUE(TranDate)OVER (PARTITION BY P.ArticleID ORDER BY TranDate ROWS BETWEEN CURRENTROW ANDUNBOUNDED FOLLOWING )AS TheDate FROM ItemEndTotal AS T JOIN ReverseRunningTotal AS P ON T.ArticleID= P.ArticleID AND P.ReverseTotal>= T.FinalCount ) SELECT RRT.ArticleID, FinalCount , SUM(CASEWHEN TheDate = TranDate THEN FinalCount- ( ReverseTotal - Items ) ELSE Items END * PurchasePrice)AS Value FROM ReverseRunningTotal RRT JOIN FindDate ON RRT.ArticleID= FindDate.ArticleID CROSS APPLY (SELECT TOP( 1 ) Price AS PurchasePrice FROM ReverseRunningTotal AS R WHERE RRT.ArticleID= R.ArticleID AND TranCode= 'IN' AND R.TranDate<= RRT.TranDate ORDER BY TranDate DESC ) AS P WHERE RRT.TranDate>= TheDate GROUP BY RRT.ArticleID , FinalCount ORDER BY RRT.ArticleID; |
Listing 2: The Aunt Kathi FIFO using window functions
I started with an aggregate query in a CTE, ItemEndTotal
, which provides a final count of each ArticleID
. I could have eliminated the ItemEndTotal
CTE and instead used a windowing aggregate function in the second CTE, as follows:
1 |
SUM(CASE WHEN TranCode= 'OUT'THEN -ItemsELSE Items END)OVER(PARTITIONBY P.ArticleID) |
This, unfortunately, caused the performance to suffer so I abandoned that idea.
In the ReverseRunningTotal
CTE, I used an accumulating window aggregate to calculate the reverse running total. If you use this technique, be sure to specify the frame ( ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
for the best performance. When running this step in isolation, the windowing function method out performs Dave’s method.
To find the date at which the reverse running total equals or exceeds the amount remaining at the end of all transactions, I used the LAST_ROW
function. The LAST_ROW
function allows you to access any column in the last row of a partition. I could have used the LAST_ROW
function multiple times to grab all of the other columns needed from that row and then calculated how many stock items would be needed on the date. However, nesting the LAST_ROW
expressions in a calculation made this part of the query extremely complex. Instead, I just performed the calculation in the outer query which ended up being much simpler. However, I believe this step is where my process took extra time over Dave’s solution. I needed to return only one row for each ArticleID
, so I had to use DISTINCT
. Using GROUP BY
would not have worked in this situation because aggregation happens before windowing functions. Nevertheless, the performance was still acceptable, so I left this windowing function in place.
The outer query also uses a CROSS APPLY
to find the prices for the returned items, which I just copied from Dave’s code. There is a way to use windowing functions to find the prices; Listing 3 shows a sample of the alternative code.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
WITH GetMaxID AS (SELECT ArticleID , TranDate , Price , TranCode , MAX(CASEWHEN TranCode ='IN' THEN StockID END)OVER (PARTITION BY ArticleID ORDER BY TranDate ROWS BETWEEN UNBOUNDEDPRECEDING ANDCURRENT ROW) AS MaxID FROM Stock WHERE TranCode IN( 'RET','IN' ) ) SELECT ArticleID , TranDate , Price , TranCode , MAX(Price)OVER (PARTITION BY ArticleID, MaxID) AS ThePrice FROM GetMaxID; |
Listing 3: An alternative way to find the price for returned items in the remaining stock
This is a technique I learned from the great Itzik Ben-Gan. In the first level, I am using MAX
as an accumulating aggregate. I find the maximum StockID
for purchases only up to the current row. This means that the MaxID
for each return row will be the StockID
of the previous purchase row. In the outer query, by partitioning by MaxID
, the maximum price returned is the price I need. The technique is elegant but, in my tests, it slowed the query down quite a bit. The data had to be filtered first by the TranCode
and then sorted by ArticleID
and TranDate
, so there is no way to create a good index for the query. Sorting is the bottleneck, and using this technique ends up adding several seconds to the overall time.
Conclusion
I still remember staring at Dave’s winning code for hours back in 2010 trying to figure out how his solution even worked. Eventually, I figured out that the stock items left, not the in and out transactions from the beginning, were what was important. In this article, I was able to apply windowing functions every step of the way. Unfortunately, I had to back out in a couple of places to get acceptable performance.
In my experience, windowing functions are great tools to help you solve complex business problems. Once you begin using them, you may find that you start coming up with set based solutions more quickly. Your queries will be easier to understand and maintain. Often, they will perform better than queries using traditional methods. Like anything else, you must test each solution to make sure that the performance is acceptable.
Load comments