T-SQL Window Function Speed Phreakery: The FIFO Stock Inventory Problem

Sometimes, in the quest for raw SQL performance, you are forced to sacrifice legibility and maintainability of your code, unless you then document your code lavishly. Phil Factor's SQL Speed Phreak challenge produced some memorable code, but can SQL features introduced since then help to produce code that performs as well and is also easy to understand? Aunty Kathi investigates.

Back in 2009 and 2010, Phil Factor hosted the SQL Speed Phreak Competition, in which programmers were challenged to construct the fastest solution to a series of common programming challenges. Unfailingly, the best solutions used set-based T-SQL techniques to achieve lightning-fast execution times, even when the data load scaled to millions of rows. Although fast, the solutions were often complex and hard to understand, and occasionally used slightly unorthodox techniques.

The Simple-Talk editors called in “Aunt Kathi” to help. My job was to explain as simply as I could the mechanics of these solutions, and so make them more accessible to developers who faced similar business problems, and needed a faster and more scalable solution than more conventional “row by row” processing techniques could offer. In each case, I also tried to offer an alternative solution that “learned” from the winning solution, but was simpler and therefore easier to maintain, but inevitably compromised a little on speed.

Skip forward a few years, and SQL Server 2012 has now introduced many powerful new T-SQL window functions and enhancements. Could they offer a simpler alternative to the original solutions that would match or perhaps even beat them on speed?

My previous article revisited the running total problem, where I proved that a solution that used the OVER clause and a ‘sliding window’ of data, with some aggregate functions, performed identically to the original solution, which relied on the unorthodox “query update” technique.

This article revisits the ‘FIFO (First In, First Out) Stock Inventory Problem’, a somewhat-simplified, but still realistic inventory accounting problem, where stock items are either purchased, sold, or returned, and where the oldest items in the queue of stock items (those that were the first to be purchased or returned) must be the first items sold. The requirement is to calculate the remaining count, and value, of each item after all stock transactions are performed.

The original winning solution from Dave Ballantyne is brilliant, but a brain twister for the uninitiated. It uses one statement comprised of several common table expressions (CTEs) and includes a couple of CROSS APPLY type joins. Was there a simpler but equally-speedy window function-based solution? Let’s find out.

Understanding the FIFO Stock Inventory Problem

In this puzzle, stock items are purchased ( IN), returned ( RET), and sold ( OUT). Each type of stock item is identified by an ArticleID. Each movement of stock in or out of the warehouse, due to a purchase, sale or return of a given type of stock, results in a row being added to the Stock table, uniquely identified by a value in the StockID identity column, and describing how many items of that type of stock were added or removed, the price for purchases, the date of the stock transaction, and so on.

The objective is to calculate the current value of the stock, by ArticleID, after all transactions are complete. Prices are available only when items are purchased. If an item is returned, the most recent purchase price is used. Items are sold on a “first in, first out” basis, and the price to use is the price paid when the item was added to the inventory.

The following table shows an extract of data from the Stock table, all relating to a single type of stock, and hopefully illustrating how the rules work.

ArticleID TranDate TranCode Items Price Current Count Current Value What Happened?
1001 2016-01-01 IN 10 $100 10 $1000 Added 10 at $100
1001 2016-01-02 IN 30 $101 40 $4030 Added 30 at $101
1001 2016-01-03 OUT 25 15 $1515 Added 10 at $100 and 15 at $101
1001 2016-01-04 RET 10 25 $2525 Added 10 at $101
1001 2016-01-05 IN 9 $99 34 $3416 Added 9 at $99
1001 2016-01-06 OUT 22 12 $1194 Sold 22 at $101

The key to solving this puzzle is understanding that what is left after the transactions are complete is all that matters. You do not need to step through the transactions or even calculate the price at which items were sold.

I like to compare this puzzle to the common habit of cleaning out the spare change from one’s pockets at the end of each day. Instead of a change jar, imagine that you had a change queue on the dresser. Each night, you add the day’s change to the end of the queue. The next morning, you take some coins from the front of the queue if you think you would need them that day. If you want to know the current value of the coins, you only have to look at the existing coins in the queue to calculate the value. You do not need to know about the coins that have been previously removed. You only need to look at each remaining coin, figure out the value of each one, and add them up.

The FIFO Stock Inventory Problem is quite a bit more complex than coins on a dresser, but you still must concentrate on how many of each ArticleID are left and the price when each item was added to stock. Here are the steps to solving the puzzle:

  1. Calculate a final count of the items by ArticleID. This is easily done with an aggregate query. At the end of the transactions, there are 12 items left of ArticleID 1001 in the example above.
  2. Use a reverse running total of purchases and returns to figure out which transactions are responsible for the remaining items. A reverse running total is used because the purchases and returns at the end are responsible for the remaining stock.
  3. Find the date at which the reverse running total equals or exceeds the amount remaining at the end of all transactions. On 2016-01-04 the reverse running total, 19, exceeded the number remaining, 12.
  4. Sum the items from all transactions after that date, in this case the transaction on 2016-01-06, which was 9 items.
  5. On the date calculated in step 3, figure out how many of the items added to stock should be counted. This is the difference between the items remaining (12) and the previous running total (9). One way to calculate the previous reverse running total is by subtracting the items purchased or returned from the current reverse running total. Then the formula becomes Items Remaining – (Reverse Running Total – Items Purchased or Returned). For this example, that would be 12 – (19 – 10) which resolves to 3.
  6. Multiply remaining items by the price. For returned items, the price must be calculated. In the example above, the ten items returned are priced at $101 because that was the last purchase price.
ArticleID TrandDate TranCode Items Price Reverse Running Total Source of Remaining Items
1001 2016-01-01 IN 10 $100 59  
1001 2016-01-02 IN 30 $101 49  
1001 2016-01-04 RET 10 $101 19 3
1001 2016-01-05 IN 9 $99 9 9

Dave Ballantyne’s Original Winning Solution

Listing 1 shows Dave Ballantyne’s original solution. As part of his solution, he added several indexes, which was not against the rules. Also, evidently, creating the indexes didn’t count against the time to run the solution. On my laptop with 8 GB RAM, 4 procs and an SSD drive, the query took 1 second to run.

Listing 1: Dave Ballantyne’s winning code

You can find a complete description of this solution in my original article, but I’ll summarize the highlights. The solution starts with an aggregate query in cteStockSum, calculating the total remaining stock ( TotalStock) of each ArticleID, after all transactions have completed. It then uses a correlated sub-query to find the reverse running total of articles purchased or returned, in cteReverseInSum.

Having calculated the final count and the reverse running total, you can find the date where the number of items at the end of the queue meets or exceeds the TotalStock amount. This is done in the cteLastTranDte CTE, using CROSS APPLY and TOP( 1). On that date, you may need all of the items or just some of the items.

By using CROSS APPLY, all columns from the row can be returned. Dave takes advantage of this and figures out how many items are needed on that date. In the outer query, the value for each row is calculated by multiplying the items by the price. The query in the CROSS APPLY returns the price for the latest purchase, which is then the price applied for any the returns in the remaining stock.

The Aunt Kathi Window Function Method

There is no one tool for the job when working with T-SQL, and using windowing functions is not an exception to this rule. I systematically tried replacing each of Dave’s calculations with the equivalent calculation using window function, but occasionally the performance suffered badly as a result.

Finally, I settled on the code shown in Listing 2, which is sort of a hybrid approach. On my laptop, the query took 2 seconds to run. I was never able to match the 1-second execution time of Dave’s original code.

Listing 2: The Aunt Kathi FIFO using window functions

I started with an aggregate query in a CTE, ItemEndTotal, which provides a final count of each ArticleID. I could have eliminated the ItemEndTotal CTE and instead used a windowing aggregate function in the second CTE, as follows:

This, unfortunately, caused the performance to suffer so I abandoned that idea.

In the ReverseRunningTotal CTE, I used an accumulating window aggregate to calculate the reverse running total. If you use this technique, be sure to specify the frame ( ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) for the best performance. When running this step in isolation, the windowing function method out performs Dave’s method.

To find the date at which the reverse running total equals or exceeds the amount remaining at the end of all transactions, I used the LAST_ROW function. The LAST_ROW function allows you to access any column in the last row of a partition. I could have used the LAST_ROW function multiple times to grab all of the other columns needed from that row and then calculated how many stock items would be needed on the date. However, nesting the LAST_ROW expressions in a calculation made this part of the query extremely complex. Instead, I just performed the calculation in the outer query which ended up being much simpler. However, I believe this step is where my process took extra time over Dave’s solution. I needed to return only one row for each ArticleID, so I had to use DISTINCT. Using GROUP BY would not have worked in this situation because aggregation happens before windowing functions. Nevertheless, the performance was still acceptable, so I left this windowing function in place.

The outer query also uses a CROSS APPLY to find the prices for the returned items, which I just copied from Dave’s code. There is a way to use windowing functions to find the prices; Listing 3 shows a sample of the alternative code.

Listing 3: An alternative way to find the price for returned items in the remaining stock

This is a technique I learned from the great Itzik Ben-Gan. In the first level, I am using MAX as an accumulating aggregate. I find the maximum StockID for purchases only up to the current row. This means that the MaxID for each return row will be the StockID of the previous purchase row. In the outer query, by partitioning by MaxID, the maximum price returned is the price I need. The technique is elegant but, in my tests, it slowed the query down quite a bit. The data had to be filtered first by the TranCode and then sorted by ArticleID and TranDate, so there is no way to create a good index for the query. Sorting is the bottleneck, and using this technique ends up adding several seconds to the overall time.

Conclusion

I still remember staring at Dave’s winning code for hours back in 2010 trying to figure out how his solution even worked. Eventually, I figured out that the stock items left, not the in and out transactions from the beginning, were what was important. In this article, I was able to apply windowing functions every step of the way. Unfortunately, I had to back out in a couple of places to get acceptable performance.

In my experience, windowing functions are great tools to help you solve complex business problems. Once you begin using them, you may find that you start coming up with set based solutions more quickly. Your queries will be easier to understand and maintain. Often, they will perform better than queries using traditional methods. Like anything else, you must test each solution to make sure that the performance is acceptable.