{"id":1719,"date":"2013-10-31T00:00:00","date_gmt":"2013-10-31T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/window-functions-in-sql\/"},"modified":"2021-09-29T16:21:41","modified_gmt":"2021-09-29T16:21:41","slug":"window-functions-in-sql","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/window-functions-in-sql\/","title":{"rendered":"Window Functions in SQL"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">Windowing functions were added to the ANSI\/ISO Standard SQL:2003 and then extended in ANSI\/ISO Standard SQL:2008. Microsoft was late to this game. DB2, Oracle, Sybase, PostgreSQL and other products have had full implementations for years. SQL Server did not catch up until SQL 2012. Better late than never.<\/p>\n<p>Here is a skeleton table we can use for the rest of this article. It has the personnel, their salary amounts and the department to which they are assigned.<\/p>\n<pre>\tCREATE TABLE Personnel_Assignments\r\n\t(emp_name VARCHAR(10) NOT NULL, \r\n\t\u00a0dept_name VARCHAR(15) NOT NULL\r\n\t\u00a0PRIMARY KEY (emp_name, dept_name), \r\n\t\u00a0salary_amt DECIMAL (8,2) NOT NULL\r\n\t\u00a0CHECK (salary_amt &gt; 0.00));\r\n\t<\/pre>\n<p>Load it with dummy data.<\/p>\n<pre>\tINSERT INTO Personnel_Assignments\r\n\tVALUES\r\n\t('Aaron', 'acct', 3000.00), \r\n\t('Abaddon', 'acct', 3000.00), \r\n\t('Abbott', 'acct', 3000.00), \r\n\t('Abel', 'acct', 3500.00), \r\n\t('Absalom', 'acct', 5500.00), \r\n\t\r\n\t('Shannen', 'ship', 1000.00), \r\n\t('Shannon', 'ship',2000.00), \r\n\t('Shaquille', 'ship', 3000.00), \r\n\t('Sheamus', 'ship', 4000.00), \r\n\t('Shelah', 'ship', 3000.00), \r\n\t('Shelby', 'ship', 4500.00), \r\n\t('Sheldon', ship', 5500.00), \r\n\t\r\n\t('Hamilton', 'HR',2300.00), \r\n\t('Hamish', 'HR', 1000.00), \r\n\t('Hamlet', 'HR', 1200.00), \r\n\t('Hammond', 'HR', 800.00), \r\n\t('Hamuel', 'HR', 700.00), \r\n\t('Hanael', 'HR', 600.00), \r\n\t('Hanan', 'HR', 1000.00);\r\n\t<\/pre>\n<p>Just as people annoyingly say &#8220;Sequel&#8221; instead of &#8220;S-Q-L&#8221;, the window functions usually get called &#8220;over functions&#8221; because the set of rows to which these functions apply are defined using the syntax:<\/p>\n<pre>\t&lt;window function&gt; OVER\r\n\t\u00a0([PARTITION BY &lt;expression list&gt;]\r\n\t\u00a0[ORDER BY &lt;expression [ASC|DESC] list&gt;]\r\n\t\u00a0[ROWS|RANGE &lt;window frame&gt;])\r\n\r\n<\/pre>\n<p>Partition is a set theory concept which we have see in the <code>GROUP BY<\/code> clause of the <code>SELECT..FROM<\/code>.. statement. Think of it as a local version of grouping.<\/p>\n<p>But we start losing naive set theory after that. The<code> ORDER BY<\/code> is the ordering concept we have seen in cursors done within a partition. I like to think it came from von Neuman&#8217;s definition of ordinal numbers using nested sets. I was a math major and it shows. Some functions are easier to define with an ordering than with von Neuman&#8217;s ordinals.<\/p>\n<p>The<code> [ROW|RANGE]<\/code> subclause is trickier. It is a local cursor. Yes, Celko said cursor. We start with the current row where the window function is invoked. This is where the imaginary read head is resting.<\/p>\n<p>This subclause defines a frame extent of ordered rows that precede or follow the current row. Let&#8217;s do this step by step:<\/p>\n<pre>\t&lt;ROW or RANGE clause&gt; ::=\r\n\t{ROWS | RANGE} &lt;window frame extent&gt;\r\n\t<\/pre>\n<p>There is a subtle difference between <code>ROWS<\/code> and RANGE. <code>ROWS<\/code> uses a count of the rows and it is the most common usage.<code> RANGE<\/code> use a count of distinct values. Think of a <code>SELECT DISTINCT<\/code> operation. None of this makes sense without an <code>ORDER BY<\/code> subclause.<\/p>\n<p>The next parts of the sub clause specify the rows preceding and the rows following the current row. If no following value is given, the default is the current row.<\/p>\n<pre>\t&lt;window frame extent&gt; ::= \r\n\t{&lt;window frame preceding&gt;\r\n\t\u00a0| &lt;window frame between&gt;}\r\n\t\r\n\t&lt;window frame between&gt; ::= \r\n\t\u00a0BETWEEN &lt;window frame preceding&gt; AND &lt;window frame following&gt;\r\n\r\n<\/pre>\n<p>The keyword <code>BETWEEN<\/code> is used here with the expected meaning; the<code> &lt;window frame preceding&gt; r<\/code>ows precede the <code>&lt;window frame following&gt; <\/code>rows.<\/p>\n<pre>\t&lt;window frame bound&gt; ::= \r\n\t{&lt;window frame preceding&gt;\r\n\t\u00a0| &lt;window frame following&gt;}\r\n\t\r\n\t&lt;window frame preceding&gt; ::= \r\n\t{UNBOUNDED PRECEDING\r\n\t\u00a0| &lt;unsigned_value_specification&gt; PRECEDING\r\n\t\u00a0| current_row}\r\n\r\n<\/pre>\n<p>The keyword <code>UNBOUNDED PRECEDING<\/code> is a shorthand for the start of the partition.<code> current_row<\/code> explains itself. You can also provide a constant count of rows with the <code>PRECEDING<\/code> option.<\/p>\n<pre>\t&lt;window frame following&gt; ::= \r\n\t{UNBOUNDED FOLLOWING\r\n\t\u00a0| &lt;unsigned_integer&gt; FOLLOWING\r\n\t\u00a0| current_row}\r\n\r\n<\/pre>\n<p>The keyword<code> UNBOUNDED FOLLOWING<\/code> is a shorthand for the end of the partition. <code>current_row <\/code>explains itself. You can also provide a constant count of rows with the<code> FOLLOWING<\/code> option.<\/p>\n<p>The most common usages are for running totals:<\/p>\n<pre>\tOVER (PARTITION BY .. ORDER BY ..\r\n\t\u00a0ROWS BETWEEN UNBOUNDED PRECEDING\r\n\t\u00a0AND CURRENT_ROW) \r\n\t<\/pre>\n<p>and for the entire partition as a unit:<\/p>\n<pre>\tOVER (PARTITION BY .. ORDER BY ..\r\n\t\u00a0ROWS BETWEEN UNBOUNDED PRECEDING\r\n\t\u00a0AND UNBOUNDED FOLLOWING) \r\n\t<\/pre>\n<h2>Aggregate Functions<\/h2>\n<p>The <code>&lt;window function&gt;<\/code> can be a common aggregate function, which include <code>SUM(), AVG(), MIN(), MAX()<\/code> and\u00a0<code>COUNT()<\/code>. By default the <code>PARTITION BY <\/code>subclause is always implied. When it is not explicitly given, the whole result set in the<code> FROM<\/code> clause of the query is used as the partition. For those common aggregate function, the<code> ORDER BY<\/code> subclause makes no sense\u00a0when you\u2019re dealing with the whole window . The computations are commutative; remember that word from High School algebra? The results do not depend on ordering.\u00a0However, if I\u2019m looking at a window defined by a ROW or RANGE clause, then the ORDER BY clause can be very important.\u00a0<\/p>\n<p>For example, let&#8217;s take a log of bids in an auction. Each bidder gets to make one bid daily. His bid has to be more than zero dollars.<\/p>\n<pre class=\"lang:tsql decode:true \">CREATE TABLE Auction_log\r\n(bid_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,\u00a0 bidder_nbr CHAR(15) NOT NULL,\u00a0 PRIMARY KEY (bid_date, bidder_nbr),\u00a0 bid_amount DECIMAL (8,2) NOT NULL\r\n\u00a0\u00a0 CHECK (bid_amount &gt; 0.00)\r\n);<\/pre>\n<p>I want to query that shows me what the highest bid amount was on a particular day. The window is created with the order by clause from the start of the log up to whatever the current row is.<\/p>\n<pre class=\"lang:tsql decode:true \">SELECT bid_date,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 MAX(bid_amount)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER (ORDER BY bid_date\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROWS BETWEEN UNBOUNDED PRECEDING\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AND CURRENT_ROW) AS high_bid_amount\u00a0 FROM Auction_Log;<\/pre>\n<p>You have to &#8220;nest&#8221; basic aggregate functions with CTEs or derived tables to preserve information. For example, the largest average department salary is done with this query:<\/p>\n<pre>\tWITH X \r\n\tAS\r\n\t(SELECT dept_name, \r\n\t\u00a0AVG(salary_amt) AS dept_salary_avg\r\n\t\u00a0FROM Personnel_Assignments\r\n\t\u00a0GROUP BY dept_name)\r\n\t\r\n\t\u00a0SELECT MAX(dept_salary_avg) FROM X;\r\n\t\r\n\t<\/pre>\n<p>Did you notice I lost the <code>dept_name<\/code> with the basic <code>GROUP BY<\/code>? But I can write:<\/p>\n<pre>\tWITH X1 -- get departmental averages \r\n\tAS\r\n\t(SELECT dept_name, \r\n\t\u00a0AVG(salary_amt) \r\n\t\u00a0AS dept_salary_avg\r\n\t\u00a0FROM Personnel_Assignments\r\n\t\u00a0GROUP BY dept_name), \r\n\t\r\n\tX2 -- use a windowed max() to keep department name\r\n\tAS\r\n\t(SELECT X1.dept_name, X1.dept_salary_avg, \r\n\t\u00a0MAX(X1.dept_salary_avg) \r\n\t\u00a0OVER () AS dept_salary_avg_max\r\n\t\u00a0FROM X1)\r\n\t\r\n\tSELECT X2.dept_name, X2.dept_salary_avg\r\n\t\u00a0FROM X2\r\n\t\u00a0WHERE X2.dept_salary_avg_max = X2.dept_salary_avg;\r\n\r\n<\/pre>\n<h2>Ranking Functions<\/h2>\n<p>The next &lt;window function&gt;s are the ranking functions;<code> ROW_NUMBER(), RANK(), DENSE_RANK() and  NTILE(n)<\/code>. By now, you have probably used <code>ROW_NUMBER() <\/code>or<code> DENSE_RANK()<\/code> to get a column to use for picking subsets from partitions of a result set. The classic example is to find the top (n) salaries by department.<\/p>\n<p>You cannot nest aggregate function inside each other. A function returns a scalar value; that would mean the innermost function would pass a scalar to the containing aggregate, not a set.<\/p>\n<p>Here is an example of a<code> DENSE_RANK() <\/code>that is passed to a containing query that lets you pick the top (n) salaries in each department.<\/p>\n<pre>\tWITH X \r\n\tAS\r\n\t(SELECT emp_name, dept_name, salary_amt, \r\n\t\u00a0DENSE_RANK() \r\n\t\u00a0OVER (PARTITION BY dept_name\r\n\t\u00a0ORDER BY salary_amt DESC) AS salary_rank\r\n\t\u00a0FROM Personnel_Assignments)\r\n\t\r\n\tSELECT X.emp_name, X.dept_name, X.salary_amt\r\n\t\u00a0FROM X\r\n\t\u00a0WHERE X.salary_rank &lt;= @n;\r\n\r\n<\/pre>\n<h2>Analytic Functions<\/h2>\n<p>The third group of <code>&lt;window function&gt;<\/code>s are the Analytic Functions. These are somewhat like the aggregate functions, but they do not make sense without the <code>ORDER BY<\/code>. But rather than doing a computation like the aggregate functions, or an ordinal integer like the ranking functions, these return a value from the partition.<\/p>\n<p>The two simplest such functions are the <code>FIRST_VALUE<\/code> and <code>LAST_VALUE<\/code> analytic functions. The <code>FIRST_VALUE<\/code> function returns the first value from a sorted partition, and the<code> LAST_VALUE <\/code>function returns the last value. In the following example, the <code>SELECT<\/code> statement uses both of these functions to calculate<code> salary_amt<\/code> values in each sales group:<\/p>\n<pre>\tSELECT emp_name, dept_name, salary_amt, \r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FIRST_VALUE(salary_amt) \r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER(PARTITION BY dept_name\r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY salary_amt DESC) \r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS top_salary, \r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LAST_VALUE(salary_amt) \r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER(PARTITION BY dept_name\r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY salary_amt DESC) \r\n\t\u00a0\u00a0\u00a0\u00a0\u00a0 AS bottom_salary \r\n\t\u00a0FROM Personnel_Assignments;\r\n\t<\/pre>\n<p>The results are:<\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><strong>emp_name<\/strong><\/p>\n<\/td>\n<td valign=\"top\">\n<p><strong>dept_name<\/strong><\/p>\n<\/td>\n<td valign=\"top\">\n<p><strong>salary_amt<\/strong><\/p>\n<\/td>\n<td valign=\"top\">\n<p><strong>top_salary<\/strong><\/p>\n<\/td>\n<td valign=\"top\">\n<p><strong>bottom_salary<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Absalom\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Abel\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3500<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Aaron\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Abaddon<\/p>\n<\/td>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Abbott<\/p>\n<\/td>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hamilton<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hamlet<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1200<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1200<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hamish<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hanan\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hammond<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>800<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>800<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hamuel<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>700<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>700<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Hanael<\/p>\n<\/td>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>600<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>600<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Sheldon<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Shelby<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4500<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Sheamus<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Shelah<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Shaquille<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Shannon<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Shannen<\/p>\n<\/td>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As expected, the <code>PARTITION BY<\/code> subclause gives us departmental sets, then the <code>ORDER BY <\/code>subclause puts the <code>salary_amt<\/code> values in descending order. Look at the output. There are problems. The result set is grouped by the <code>dept_name <\/code>column, with the <code>salary_amt<\/code> values in each group sorted. Look at the output from the sample data. The <code>top_salary<\/code> in each department is a constant and it is the highest salary for that that department. But the bottom_salary is a mess; just seeing multiple values tells you something is wrong.<\/p>\n<p>The <code>FIRST_VALUE<\/code> and<code> LAST_VALUE <\/code>functions support the<code> [ROWS|RANGE] <\/code>subclause and there is a default that makes no sense. Look at the accounting department where the top salary is $5500.00 and bottom salary is $3000.00. The department partition is being scanned in the order given by the<code> ORDER BY <\/code>clause. This fine for the <code>FIRST_VALUE<\/code> function with a <code>DESC<\/code> order, but it makes the<code> LAST_VALUE() <\/code>behave as &#8220;last value so far&#8221; function. This can be handled with the <code>[ROWS|RANGE] <\/code>subclause applied to the entire partition.<\/p>\n<pre>SELECT emp_name, dept_name, salary_amt, \r\n\u00a0\u00a0\u00a0\u00a0 FIRST_VALUE(salary_amt) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER(PARTITION BY dept_name\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY salary_amt DESC\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROWS BETWEEN UNBOUNDED PRECEDING\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AND UNBOUNDED FOLLOWING) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS top_salary, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LAST_VALUE(salary_amt) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER(PARTITION BY dept_name\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY salary_amt DESC \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROWS BETWEEN UNBOUNDED PRECEDING\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AND UNBOUNDED FOLLOWING) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS bottom_salary \r\n\u00a0 FROM Personnel_Assignments;\r\n<\/pre>\n<p>The first <code>ROWS<\/code> subclause is redundant, but it documents the query.<\/p>\n<p>The remaining two analytic functions are<code> LAG() <\/code>and <code>LEAD()<\/code>. These functions extend the<code> [ROWS|RANGE]<\/code> concept of a <code>CURRENT_ROW<\/code> to let us assume the rows are in a sorted order and we can move the imaginary cursor from the current row to a preceding row, <code>LAG()<\/code>, or from the current row to a following row, <code>LEAD().<\/code><\/p>\n<p>The<code> LEAD(&lt;column&gt;, [&lt;step size&gt;], [&lt;default&gt;]) <\/code>function retrieves a value from a row that is <code>&lt;step  size&gt;<\/code> rows following the current one. The<code> LAG(&lt;column&gt;, [&lt;step size&gt;], [&lt;default&gt;])<\/code> function retrieves a value from a row that is <code>&lt;step size&gt;<\/code> rows preceding the current one.<\/p>\n<p>Obviously, the first row in a partition has no preceding row and the last row in a partition has no following row. As expected in SQL, these missing values are returned as<code> NULL<\/code>s. Well, not quite. If you specify a <code>&lt;default&gt;<\/code>, then it is used.<\/p>\n<pre>SELECT dept_name, salary_amt, emp_name, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LAG(salary_amt, 1, 0.00) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER(PARTITION BY dept_name\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ORDER BY salary_amt DESC)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS preceding_salary, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LEAD(salary_amt, 1, 0.00) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER(PARTITION BY dept_name\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY salary_amt DESC)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS following_salary\r\n\u00a0 FROM Personnel_Assignments;\r\n<\/pre>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>dept_name<\/p>\n<\/td>\n<td valign=\"top\">\n<p>salary_amt<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00a0emp_name<\/p>\n<\/td>\n<td valign=\"top\">\n<p>preceding_salary<\/p>\n<\/td>\n<td valign=\"top\">\n<p>following_salary<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Absalom\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3500<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Abel\u00a0\u00a0\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Aaron\u00a0\u00a0\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Abaddon\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>acct<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Abbott\u00a0\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hamilton<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1200<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1200<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hamlet\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2300<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hamish\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1200<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hanan\u00a0\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>800<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>800<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hammond\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>700<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>700<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hamuel\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>800<\/p>\n<\/td>\n<td valign=\"top\">\n<p>600<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>HR<\/p>\n<\/td>\n<td valign=\"top\">\n<p>600<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Hanael\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>700<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Sheldon\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4500<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Shelby\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Sheamus\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4500<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Shelah\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Shaquille<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Shannon\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>ship<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Shannen\u00a0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>2000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>While the parameters can be expressions, the usual values are a step size of one and perhaps zero for the default, if <code>NULL<\/code> would make computations or display problems.<\/p>\n<p>As the results show, each row displays <code>salary_amt<\/code> values from the previous and following rows within each partition, unless it is a first or last row, in which case <code>NULL<\/code> is returned.<\/p>\n<h2>Charles Babbage and the Difference Engine<\/h2>\n<p>If you do not know who Charles Babbage is, It is time to Google! What you probably do not know is the underlying math of his &#8220;difference engine&#8221; which is based on difference equations not differential equations. If you like to get the math in painful details, look at<\/p>\n<ol>\n<li>&#8220;Schaum&#8217;s Outline of Calculus of Finite Differences and Difference Equations&#8221; by Murray Spiegel; ISBN: 978-0070602182.<\/li>\n<li>&#8220;Introduction to Difference Equations&#8221; by Samuel Goldberg; ISBN: 978-0486650845.<\/li>\n<li>&#8220;Finite Difference Equations&#8221; by H. Levy; ISBN: 978-0486672601.<\/li>\n<\/ol>\n<p>This is an older, pre-calculus method to interpolate or tabulate functions by using a small set of polynomial coefficients. Both logarithmic and trigonometric functions can be approximated by polynomials, so this was a handy way to generate mathematical tables by teams of people doing simple operations. It took a long time, and those people were not error free. Before you laugh at this technology, it was used for scientific computation well into the 1950&#8217;s. We did not have cheap digital computers and analog computers were rare and expensive to configure because you had to hardwire them (but they were very fast once they were &#8220;programmed&#8221;). The\u00a0 improvements were mimeographed work sheets and the use of Smith-Corona Marchant mechanical calculators (they had multiplication and a government contract).<\/p>\n<p>\u00a0Babbage&#8217;s goal was to automate the process with a programmable digital computer in the 1870&#8217;s! Talk about being ahead of your time.<\/p>\n<p>The math is simple. Given a series of values of a polynomial, {x1, x2, x3, x4, .. }, we can subtract consecutive values and get what is called the first difference or first delta. This is written as \u00cexn = (xn+1 &#8211; xn). As expected, the second delta is \u00ce2xn = \u00ce(\u00cexn) and so forth.<\/p>\n<p>This is easier to see with an example of a simple polynomial, p(x) = 2x\u00c2\u00b2 &#8211; 3x + 2.<\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>x<\/p>\n<\/td>\n<td valign=\"top\">\n<p>p(x)<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00cep(x)<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00ce\u00c2\u00b2p(x)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>0<\/p>\n<\/td>\n<td>\n<p>2<\/p>\n<\/td>\n<td rowspan=\"5\" valign=\"top\">\n<p>-1<\/p>\n<p>&#8212;&#8212;<\/p>\n<p>3<\/p>\n<p>&#8212;&#8212;<\/p>\n<p>7<\/p>\n<p>&#8212;&#8212;<\/p>\n<p>11<\/p>\n<\/td>\n<td rowspan=\"5\">\n<p>4<\/p>\n<div>\u00a0<\/div>\n<p>4<\/p>\n<div>\u00a0<\/div>\n<p>4<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>1<\/p>\n<\/td>\n<td>\n<p>1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>2<\/p>\n<\/td>\n<td>\n<p>4<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>3<\/p>\n<\/td>\n<td>\n<p>11<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>4<\/p>\n<\/td>\n<td>\n<p>22<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>To calculate p(5) use the pattern from the bottom cells to get (4 +11 + 22) = 37. You can verify this with algebra: (2*5*5) &#8211; (3*5) + 2 = 37. The advantage is that we did not use anything but addition and subtraction. Those operations are a lot easier to implement in mechanical devices than multiplication and division.<\/p>\n<p>Those of you who still remember freshman calculus will see that this is the basis for derivatives of polynomials. We just need to take limits instead of deltas and we have calculus! Obviously, I have used extremely small data set.<\/p>\n<p>We can do deltas in SQL now. Consider this generic table for trend data. Since we already worked out a textbook example, let&#8217;s use it.<\/p>\n<pre>CREATE TABLE Data\r\n(time_point INTEGER NOT NULL PRIMARY KEY, \r\n\u00a0DECIMAL(5,2) INTEGER NOT NULL);\r\nINSERT INTO Data\r\nVALUES (0, 2.0), (1, 1.0), (2, 4.0), (3, 11.0), (4, 22.0); \r\n<\/pre>\n<p>This could be reading from a meter, stock prices over time, or whatever. Using SQL to translate the mathematics to SQL is pretty straightforward.<\/p>\n<pre>WITH Delta_1 (time_point, value, d1)\r\nAS\r\n(SELECT time_point, value,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (value - LAG(value, 1) OVER (ORDER BY time_point))\r\n\u00a0 FROM Data),\r\nDelta_2 -- this is a pattern!\r\nAS\r\n(SELECT time_point, d1,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (d1 - LAG(d1, 1) OVER (ORDER BY time_point)) AS delta_1\r\n\u00a0 FROM Delta_1)\r\nSELECT * FROM Delta_2;\r\n<\/pre>\n<p>As expected, we get the same results. I can get<code> delta_n<\/code> by repeating the second CTE pattern.<\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>time_point<\/p>\n<\/td>\n<td valign=\"top\">\n<p>delta_1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>delta_2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>-1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>3<\/p>\n<\/td>\n<td valign=\"top\">\n<p>7<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>4<\/p>\n<\/td>\n<td valign=\"top\">\n<p>11<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In the real world, the data is not this clean. But the principle still holds. Look at what happens when we &#8220;dirt up&#8221; the data.<\/p>\n<pre>INSERT INTO Data\r\nVALUES\r\n(0, 2.00),\r\n(1, 1.10),\r\n(2, 3.75),\r\n(3, 11.20),\r\n(4, 22.90);\r\n<\/pre>\n<p>The results are in this table:<\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>time_point<\/p>\n<\/td>\n<td valign=\"top\">\n<p>delta_1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>delta_2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\u00a0<\/td>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1.1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3.75<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3.55<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>3<\/p>\n<\/td>\n<td valign=\"top\">\n<p>11.2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4.8<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>4<\/p>\n<\/td>\n<td valign=\"top\">\n<p>22.9<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4.25<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If you have graphing calculator, you can show the original polynomial and overlay these slightly off data points.<\/p>\n<p>The third delta gives you -1.80 instead of zero, but that is measure of the goodness of fit of this data to a second degree polynomial. If I had rounded the values, this would be smaller. In fact, I can find the average of the second deltas and subtract it from each delta, ((3.55 &#8211; 4.20) + (4.80 &#8211; 4.20) + (4.25 &#8211; 4.20))\/3.0 is virtually zero. This is a better measurement of the fit.<\/p>\n<p>There has been no advanced math used. This is also fast enough to use\u00a0 in real time to monitor a process. Bet you did not think SQL could be used for time series this way.<\/p>\n<h2>Filling in Gaps<\/h2>\n<p>This is not the only use for <code>LAG()<\/code> and <code>LEAD()<\/code>. Imagine you have a test that is automatically conducts every x-minutes, but sometimes we do not get a reading, so it shows up as a <code>NULL<\/code>.<\/p>\n<pre>\tCREATE TABLE Tests \r\n(test_id INTEGER NOT NULL, \r\n\u00a0test_timestamp DATETIME2(0) NOT NULL,\r\n\u00a0PRIMARY KEY (test_id, test_timestamp),\r\n\u00a0test_score INTEGER);\r\nINSERT INTO Tests\r\nVALUES \r\n(2, '2012-01-01 15:15:00', 15),\r\n(2, '2012-01-01 15:25:00', NULL),\r\n(2, '2012-01-01 15:55:00', NULL),\r\n(2, '2012-01-01 16:10:00', 12),\r\n(2, '2012-01-01 16:30:00', 12),\r\n(2, '2012-01-01 16:45:00', 9),\r\n(2, '2012-01-01 16:55:00', NULL),\r\n(2, '2012-01-01 17:12:00', 10);\r\n<\/pre>\n<p>Our business requirement is to calculate the delta on a newer row (based on the<code> test_timestamp)<\/code> from a value in the preceding row, except when the preceding row is <code>NULL<\/code> and then I need to look back until I can find a value.<\/p>\n<pre>CREATE VIEW Deltas\r\n(test_id, test_timestamp, test_score, delta_test_score)\r\nAS\r\n(SELECT test_id, test_timestamp, test_score,\r\n\u00a0\u00a0\u00a0 (test_score -\r\n\u00a0\u00a0\u00a0\u00a0 LAG(test_score)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 OVER (PARTITION BY test_id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ORDER BY test_id, test_timestamp))\r\n\u00a0 FROM Tests\r\n\u00a0WHERE test_score IS NOT NULL\r\n\u00a0UNION \r\n\u00a0SELECT test_id, test_timestamp, test_score, 0 \r\n\u00a0 FROM Tests\r\n\u00a0WHERE test_score IS NULL);\r\n<\/pre>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>test_id<\/p>\n<\/td>\n<td valign=\"top\">\n<p>test_timestamp<\/p>\n<\/td>\n<td valign=\"top\">\n<p>test_score<\/p>\n<\/td>\n<td valign=\"top\">\n<p>test_delta<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 15:15:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>15<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 16:10:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>12<\/p>\n<\/td>\n<td valign=\"top\">\n<p>-3<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 16:30:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>12<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 16:45:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>9<\/p>\n<\/td>\n<td valign=\"top\">\n<p>-3<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 17:12:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>10<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 15:25:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 15:55:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2012-01-01 16:55:00&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>NULL<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Looking at Old Code<\/h2>\n<p>We have to maintain old code, so it is worth looking at how we did analytic functions without the current syntax. The short answer is self-joins from hell. When you find this pattern, it is worth the effort to replace it.<\/p>\n<p>The trick was to build a CTE that adds a row number to the original data. This number is then used in outer self-joins with a step size, usually one. If you wanted to get a default, you can add a <code>COALESCE()<\/code>\u00a0 to the appropriate columns. Here is a skeleton that will need a lot of work.<\/p>\n<pre>WITH Sequenced_Data \r\nAS \r\n(SELECT *, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROW_NUMBER() OVER (ORDER BY sequencing_value) AS rn \r\n\u00a0 FROM Data)\r\nSELECT Preceding_Rows.*, Current_Rows.*, Following_Rows.*\r\n\u00a0 FROM Sequenced_Data AS Current_Rows\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LEFT OUTER JOIN \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Sequenced_Data AS Preceding_Rows\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ON Preceding_Rows.rn = Current_Rows.rn - @step_size\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LEFT OUTER JOIN \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Sequenced_Data AS Following_Rows\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ON Following_Rows.rn = Current_Rows.rn + @step_size;\r\n<\/pre>\n<p>The <code>FIRST() <\/code>and <code>LAST()<\/code> are done in a similar fashion. This skeleton uses the<code> ASC\/DESC<\/code> option in the <code>ROW_NUMBER() t<\/code>o get a value of one in the first and last positions of the original data.<\/p>\n<pre>WITH Sequenced_Data \r\nAS \r\n(SELECT *, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROW_NUMBER() OVER (ORDER BY sequencing_value ASC) AS up,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROW_NUMBER() OVER (ORDER BY sequencing_value DESC) AS dn\r\n\u00a0\u00a0 FROM Data)\r\nSELECT * \r\n\u00a0 FROM Sequenced_Data \r\n\u00a0WHERE 1 IN (up, dn); \r\n<\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>SQL&#8217;s windowing functions are surprisingly versatile, and allow us to cut out all those self-joins and  explicit cursors. Joe Celko explains how they are used, and shows a few tricks such as calculating deltas in a time series, and filling in gaps.&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4150,4252],"coauthors":[6781],"class_list":["post-1719","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-sql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1719","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1719"}],"version-history":[{"count":8,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1719\/revisions"}],"predecessor-version":[{"id":71924,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1719\/revisions\/71924"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1719"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}