{"id":1764,"date":"2014-02-14T00:00:00","date_gmt":"2014-02-14T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/calculating-gaps-between-overlapping-time-intervals-in-sql\/"},"modified":"2021-09-29T16:21:39","modified_gmt":"2021-09-29T16:21:39","slug":"calculating-gaps-between-overlapping-time-intervals-in-sql","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/calculating-gaps-between-overlapping-time-intervals-in-sql\/","title":{"rendered":"Gaps in Overlapping Time Intervals in SQL Server: Packing Intervals and Finding Inactivity"},"content":{"rendered":"<div id=\"pretty\">\n<p>When building a web site that requires login access it is normal practice, and even a statutory requirement in some places, that you keep a record of user logins.\u00a0 This log can be invaluable for high-usage web sites in order to identify periods when the web site is infrequently used, so that a suitable time for maintenance such as operating system patching can be scheduled.<\/p>\n<p>Since you have no control over when authorized users log into your web sites, you&#8217;ll always have periods when more than one are logged in at the same time.\u00a0 Login and logout times will be staggered, and so the intervals when users are logged in will be overlapping.\u00a0 Today we&#8217;ll ultimately be showing you how to find the gaps of inactivity, but to do so we will also show you a neat way to identify those periods where there is activity.<\/p>\n<p>Although this may seem like a slightly contrived problem, there are plenty of times that you are faced with the sort of reporting where special SQL techniques are required. 
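<\/p>
<p>Before diving into the T-SQL, the approach is worth seeing in miniature.\u00a0 Treat each login as a +1 event and each logout as a -1 event, sweep the events in time order while keeping a running count of concurrent users: the stretches where the count is above zero are the packed &#8220;islands&#8221; of activity, and the gaps of inactivity are simply the spaces between consecutive islands.\u00a0 The short Python sketch below is illustrative only (the function and variable names are my own); the session data is the set of sample logins used throughout this article.<\/p>

```python
from datetime import datetime

def pack_intervals(sessions):
    """Merge overlapping (start, end) login sessions into disjoint
    "islands" of activity using a +1/-1 sweep over the endpoints."""
    events = []
    for start, end in sessions:
        events.append((start, 1))    # a login raises the concurrency count
        events.append((end, -1))     # a logout lowers it
    # Sort by time, with logins ahead of logouts at the same instant so
    # that touching intervals merge instead of leaving zero-length gaps.
    events.sort(key=lambda e: (e[0], -e[1]))
    islands, depth, island_start = [], 0, None
    for ts, delta in events:
        if depth == 0 and delta == 1:
            island_start = ts        # count leaves zero: an island opens
        depth += delta
        if depth == 0:
            islands.append((island_start, ts))  # count returns to zero: it closes
    return islands

def gaps_between(islands):
    """The inactivity gaps are the spaces between consecutive islands."""
    return [(islands[i][1], islands[i + 1][0]) for i in range(len(islands) - 1)]

def t(s):
    return datetime.strptime(s, "%Y-%m-%d %H:%M")

# The four login sessions from this article's sample data.
sessions = [
    (t("2013-09-01 15:33"), t("2013-09-01 18:20")),  # UserID 1
    (t("2013-09-01 16:15"), t("2013-09-01 17:00")),  # UserID 2
    (t("2013-09-01 17:10"), t("2013-09-01 19:10")),  # UserID 3
    (t("2013-09-02 11:05"), t("2013-09-02 11:45")),  # UserID 1 again
]
islands = pack_intervals(sessions)
inactivity = gaps_between(islands)
```

<p>The T-SQL solutions that follow implement this same packing-and-gaps logic, expressing the running count with ROW_NUMBER() arithmetic rather than an explicit loop.<\/p>
<p>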
For example:<\/p>\n<ul>\n<li>If you have a fleet of vehicles where events represent departure and return, you may wish to ask: during what periods is at least one of my vehicles out, and when are none of them in use?<\/li>\n<li>In a manufacturing plant with multiple assembly lines where events are line starts and line stops, you can find out during what periods at least one line is in use, or the periods when all lines are idle.<\/li>\n<li>By adding a bit of PARTITIONing to these queries, you can handle the case of a hotel that has multiple types of rooms.\u00a0 Then you can ask: during what times is at least one room of a given type occupied, and when are all rooms of that type unoccupied?<\/li>\n<\/ul>\n<p>New features in SQL 2012 make short work of some of these gaps problems, so we will need SQL 2012 to run many of the examples, specifically those that relate to data clean-up.\u00a0 These solutions are not limited to login\/logout intervals; any defined interval between specific events will do.<\/p>\n<p>The accompanying resource file: <b><i>Sample Queries (2012).sql<\/i><\/b> contains all of the sample queries shown in the following sections.\u00a0 All SQL resource files are appended with either (2008) or (2012) to indicate which version of SQL they can be run in.<\/p>\n<h2>Problem Setup and Explanation<\/h2>\n<p>First we&#8217;ll create a table with some basic set-up data that represents three user logins.\u00a0 The login periods will overlap and contain a gap to illustrate the essentials of what we are trying to do.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID('tempdb.dbo.#Events', 'U') IS NOT NULL\r\nDROP TABLE #Events;\r\n\u00a0\r\nCREATE TABLE #Events\r\n(\r\n\u00a0\u00a0\u00a0 ID BIGINT\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 IDENTITY PRIMARY KEY\r\n\u00a0\u00a0\u00a0 ,UserID 
\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INT\r\n\u00a0\u00a0\u00a0 ,EventTime \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0DATETIME\r\n\u00a0\u00a0\u00a0 ,[Event] \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0TINYINT -- 0=Login, 1=Logout\r\n);\r\n\u00a0\r\nINSERT INTO #Events\r\nSELECT 1, '2013-09-01 15:33', 0\r\nUNION ALL SELECT 2, '2013-09-01 16:15', 0\r\nUNION ALL SELECT 2, '2013-09-01 17:00', 1\r\nUNION ALL SELECT 3, '2013-09-01 17:10', 0\r\nUNION ALL SELECT 1, '2013-09-01 18:20', 1\r\nUNION ALL SELECT 3, '2013-09-01 19:10', 1\r\nUNION ALL SELECT 1, '2013-09-02 11:05', 0\r\nUNION ALL SELECT 1, '2013-09-02 11:45', 1;\r\n\u00a0\r\nSELECT * FROM #Events ORDER BY UserID, EventTime;\r\n<\/pre>\n<p>The raw sample data displayed by the final SELECT are as follows:<\/p>\n<pre>ID\u00a0\u00a0\u00a0 UserID\u00a0 EventTime\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Event\r\n1\u00a0\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 15:33:00.000\u00a0\u00a0\u00a0\u00a0 0\r\n5\u00a0\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 18:20:00.000\u00a0\u00a0\u00a0\u00a0 1\r\n7\u00a0\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-02 11:05:00.000\u00a0\u00a0\u00a0\u00a0 0\r\n8\u00a0\u00a0\u00a0\u00a0 1\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a02013-09-02 11:45:00.000\u00a0\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 16:15:00.000\u00a0\u00a0\u00a0\u00a0 0\r\n3\u00a0\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 17:00:00.000\u00a0\u00a0\u00a0\u00a0 1\r\n4\u00a0\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 
17:10:00.000\u00a0\u00a0\u00a0\u00a0 0\r\n6\u00a0\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 19:10:00.000\u00a0\u00a0\u00a0\u00a0 1\r\n<\/pre>\n<p>We can see that logins ([Event]=0) and logouts ([Event]=1) are evenly matched, meaning the logout time was recorded for each login.\u00a0 Let&#8217;s take a graphical look at these four login sessions.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1942-clip_image002.jpg\" alt=\"1942-clip_image002.jpg\" width=\"584\" height=\"143\" \/><\/p>\n<p>Each login ID is represented by a different color.\u00a0 You can see the overlaps among the group of logins on the afternoon of 01 Sep, as well as the isolated login by UserID=1 on 02 Sep.\u00a0 From the data, we ultimately seek to expose the gap between the logout of UserID=3 (blue) and the second login of UserID=1 (2013-09-01 19:10 to 2013-09-02 11:05).<\/p>\n<h2>Pairing of Different, Sequential Events<\/h2>\n<p>The following solutions assume that concurrent logins by a single account are not allowed.\u00a0 When your login\/logout events are equally paired, i.e., there are no missing logout endpoints, a solution such as the one I&#8217;ll be describing will group these events so that your query can place start\/end times of the events on a single row.<\/p>\n<p class=\"caption\">Sample Query #1: How to Pair up Logins with Logouts<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT UserID, LoginTime=MIN(EventTime), LogoutTime=MAX(EventTime)\r\nFROM \r\n(\r\n\u00a0\u00a0\u00a0 -- Construct a row number (rn) that is 0, 0, 1, 1, 2, 2, ... 
for each row\r\n\u00a0\u00a0\u00a0 SELECT UserID, EventTime\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,rn=(ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY EventTime)-1)\/2\r\n\u00a0\u00a0\u00a0 FROM #Events\r\n) a\r\n-- When we group by rn we have paired sessions\r\nGROUP BY UserID, rn \r\nORDER BY UserID, LoginTime;\r\n<\/pre>\n<p class=\"caption\">Results from Sample Query #1:<\/p>\n<pre>UserID\u00a0 LoginTime\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LogoutTime\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 15:33:00.000\u00a0\u00a0 2013-09-01 18:20:00.000\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-02 11:05:00.000\u00a0\u00a0 2013-09-02 11:45:00.000\r\n2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 16:15:00.000\u00a0\u00a0 2013-09-01 17:00:00.000\r\n3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 17:10:00.000\u00a0\u00a0 2013-09-01 19:10:00.000\r\n<\/pre>\n<div class=\"note\">\n<p class=\"note\">Note how, in the derived table, we&#8217;ve created a grouping column as <b>(ROW_NUMBER()-1)\/2<\/b>, which is a well-known trick for pairing consecutive rows.\u00a0 Here are the results in rn for the first four rows:<\/p>\n<pre>1.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (1-1)\/2 = 0\r\n2.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (2-1)\/2 = 0 (integer division truncates the result)\r\n3.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (3-1)\/2 = 1\r\n4.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (4-1)\/2 = 1 (integer division truncates the result)\r\n5.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ...\r\n<\/pre>\n<\/div>\n<p>Many times, usually due to incomplete implementation, web sites will not record or generate a logout event.\u00a0 This clutters the data with spurious, unpaired logins.\u00a0 Let&#8217;s add a couple of these to our data.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">INSERT INTO #Events\r\nSELECT 1, '2013-09-01 18:00', 0\r\nUNION ALL SELECT 2, '2013-09-01 18:01', 0;\r\n\u00a0\r\nSELECT * FROM #Events ORDER BY 
UserID, EventTime;\r\n<\/pre>\n<p>The sample data now appears as follows, where IDs 1 and 10 represent logins that are not paired with corresponding logout events.<\/p>\n<pre>ID\u00a0\u00a0 UserID\u00a0\u00a0 EventTime\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Event\r\n1\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 15:33:00.000\u00a0\u00a0\u00a0 0\r\n9\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 18:00:00.000\u00a0\u00a0\u00a0 0\r\n5\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 18:20:00.000\u00a0\u00a0\u00a0 1\r\n7\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-02 11:05:00.000\u00a0\u00a0\u00a0 0\r\n8\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-02 11:45:00.000\u00a0\u00a0\u00a0 1\r\n2\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 16:15:00.000\u00a0\u00a0\u00a0 0\r\n3\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 17:00:00.000\u00a0\u00a0\u00a0 1\r\n10\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 18:01:00.000\u00a0\u00a0\u00a0 0\r\n4\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 17:10:00.000\u00a0\u00a0\u00a0 0\r\n6\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 19:10:00.000\u00a0\u00a0\u00a0 1\r\n<\/pre>\n<p>Pairing logins with logouts in this case requires a different approach.\u00a0 Note that OUTER APPLY (rather than CROSS APPLY) is required to bring the final, unpaired login for UserID=2 into the result set.<\/p>\n<p class=\"caption\">Sample Query #2: Pairing Logins with no Corresponding Logout<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT UserID, LoginTime=a.EventTime, LogoutTime=b.EventTime\r\nFROM #Events a\r\n-- Use an OUTER APPLY to retain outer query rows when inner query returns NULL\r\nOUTER APPLY 
\r\n(\r\n\u00a0\u00a0\u00a0 -- find the first of any events for this user after the login. If a logout\r\n\u00a0\u00a0\u00a0 -- then use its EventTime. If a login return NULL\r\n\u00a0\u00a0\u00a0 -- For logout events (Event=1) pick up the EventTime as the time of logout\r\n\u00a0\u00a0\u00a0 SELECT TOP 1 EventTime=CASE WHEN [Event] = 1 THEN EventTime END\r\n\u00a0\u00a0\u00a0 FROM #Events b\r\n\u00a0\u00a0\u00a0 WHERE a.UserID = b.UserID AND b.EventTime &gt; a.EventTime\r\n\u00a0\u00a0\u00a0 ORDER BY b.EventTime \r\n) b\r\n-- Filter on only login events\r\n\u00a0WHERE [Event] = 0\r\nORDER BY UserID, LoginTime;\r\n<\/pre>\n<p>Since the execution plan of this query shows two Clustered Index Scans, compared to the execution plan of the first showing a single Clustered Index Scan, we&#8217;d expect the second query to be a bit slower than the first.\u00a0 So this data clean-up comes at a cost.\u00a0<\/p>\n<p class=\"caption\">Results from Sample Query #2:<\/p>\n<pre>UserID\u00a0 LoginTime\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LogoutTime\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 15:33:00.000\u00a0\u00a0\u00a0 NULL\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 18:00:00.000\u00a0\u00a0\u00a0 2013-09-01 18:20:00.000\r\n1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-02 11:05:00.000\u00a0\u00a0\u00a0 2013-09-02 11:45:00.000\r\n2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 16:15:00.000\u00a0\u00a0\u00a0 2013-09-01 17:00:00.000\r\n2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 18:01:00.000\u00a0\u00a0\u00a0 NULL\r\n3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2013-09-01 17:10:00.000\u00a0\u00a0\u00a0 2013-09-01 19:10:00.000\r\n<\/pre>\n<p>Unfortunately, further testing of this pairing method using a large row set identified that it is quite slow.\u00a0 To find a better alternative, we&#8217;ll turn to the new LEAD analytic function in SQL 2012.\u00a0 You&#8217;ll find that the following query returns 
the exact same row set as the prior, but ultimately performs much better (in fact it takes about 10% of the time to run).<\/p>\n<p class=\"caption\">Sample Query #3: Using LEAD to Improve Performance (where some logins have no logout)<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT UserID, LoginTime, LogoutTime\r\nFROM\r\n(\r\n\u00a0\u00a0\u00a0 SELECT ID, UserID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,LoginTime=CASE WHEN [Event] = 0 THEN EventTime END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,LogoutTime=CASE \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- If the next Event by EventTime is a logout \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN LEAD([Event], 1) OVER (PARTITION BY UserID ORDER BY EventTime) = 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- Then we pick up the corresponding EventTime\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN LEAD(EventTime, 1) OVER (PARTITION BY UserID ORDER BY EventTime)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- Otherwise logout time will be set to NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Event]\r\n\u00a0\u00a0\u00a0 FROM #Events\r\n) b\r\n-- Filter only on login events\r\nWHERE [Event] = 0\r\nORDER BY UserID, LoginTime;\r\n<\/pre>\n<p>It is quite easy to eliminate the rows with missing <b>LogoutTimes<\/b> in either of the above queries by adding a check for NULL <b>LogoutTime<\/b> to the WHERE clause.<\/p>\n<p>It would also be possible to implement other business rules for cases where no logout time is recorded for a login event, for example:<\/p>\n<ul>\n<li>Use the average duration of that user&#8217;s logins to predict an approximate LogoutTime.<\/li>\n<li>\u00a0Assume the user is still logged in, so substitute GETDATE() for the LogoutTime.<\/li>\n<\/ul>\n<p>We will leave these as 
exercises for the interested reader.<\/p>\n<h2>Packing the Overlapping Login Intervals<\/h2>\n<p>Sometime back, a good friend of mine pointed me to an article that describes a way to pack overlapping intervals.\u00a0 That article is called <a href=\"http:\/\/blogs.solidq.com\/en\/sqlserver\/packing-intervals\">Packing Intervals<\/a> and it is by none other than SQL MVP and all-around guru, Mr. Itzik Ben-Gan.\u00a0 When I first read that article, I was impressed by the elegantly simple explanation for how this highly imaginative query works, so I won&#8217;t try to inadequately reproduce it here.\u00a0 What I can tell you is that it took me all of about three milliseconds (coincidentally the resolution of the SQL <b> DATETIME<\/b> data type) to realize that this was something I needed to retain in my tool box.\u00a0 Let&#8217;s look at how we would apply this technique to our sample data, by first putting our Example Query #3 (eliminating the spurious logins) into a Common Table Expression and then applying the Packing Intervals technique.<\/p>\n<p class=\"caption\">Sample Query #4: Pack the Overlapping Intervals down to non-overlapping Events<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">WITH GroupedLogins AS\r\n(\r\n\u00a0\u00a0\u00a0 -- Essentially the same as Sample Query #3 except a different filter\r\n\u00a0\u00a0\u00a0 SELECT UserID, LoginTime, LogoutTime\r\n\u00a0\u00a0\u00a0 FROM\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT ID, UserID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,LoginTime=CASE WHEN [Event] = 0 THEN EventTime END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,LogoutTime=CASE \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- If the next Event by EventTime is a logout \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN LEAD([Event], 1) OVER (PARTITION BY UserID ORDER 
BY EventTime) = 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- Then we pick up the corresponding EventTime\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 THEN LEAD(EventTime, 1) OVER (PARTITION BY UserID ORDER BY EventTime)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- Otherwise logout time will be set to NULL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,[Event]\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM #Events\r\n\u00a0\u00a0\u00a0 ) a\r\n\u00a0\u00a0\u00a0 -- Filter only on login events that are not \"open\" (no logout)\r\n\u00a0\u00a0\u00a0 WHERE [Event] = 0 AND LogoutTime IS NOT NULL\r\n),\r\nC1 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Since the CTE above produces rows with a start and end date, we'll first unpivot\r\n\u00a0\u00a0\u00a0 -- those using CROSS APPLY VALUES and assign a type to each.\u00a0 The columns e and s will\r\n\u00a0\u00a0\u00a0 -- either be NULL (s NULL for an end date, e NULL for a start date) or row numbers\r\n\u00a0\u00a0\u00a0 -- sequenced by the time.\u00a0 UserID has been eliminated because we're looking for overlapping\r\n\u00a0\u00a0\u00a0 -- intervals across all users.\r\n\u00a0\u00a0\u00a0 SELECT ts, [Type]\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,e=CASE [Type] \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN 1 THEN NULL \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE ROW_NUMBER() OVER (PARTITION BY [Type] ORDER BY LogoutTime) END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,s=CASE [Type] \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN -1 THEN NULL \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE ROW_NUMBER() OVER (PARTITION BY [Type] ORDER BY 
LoginTime) END\r\n\u00a0\u00a0\u00a0 FROM GroupedLogins\r\n\u00a0\u00a0\u00a0 CROSS APPLY (VALUES (1, LoginTime), (-1, LogoutTime)) a([Type], ts)\r\n),\r\nC2 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Add a row number ordered as shown\r\n\u00a0\u00a0\u00a0 SELECT C1.*, se=ROW_NUMBER() OVER (ORDER BY ts, [Type] DESC)\r\n\u00a0\u00a0\u00a0 FROM C1\r\n),\r\nC3 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Create a grpnm that pairs the rows\r\n\u00a0\u00a0\u00a0 SELECT ts, grpnm=FLOOR((ROW_NUMBER() OVER (ORDER BY ts)-1) \/ 2 + 1)\r\n\u00a0\u00a0\u00a0 FROM C2\r\n\u00a0\u00a0\u00a0 -- This filter is the magic that eliminates the overlaps\r\n\u00a0\u00a0\u00a0 WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0\r\n),\r\nC4 AS\r\n(\r\n\u00a0\u00a0\u00a0 -- Grouping by grpnm restores the records to only non-overlapped intervals\r\n\u00a0\u00a0\u00a0 SELECT StartDate=MIN(ts), EndDate=MAX(ts)\r\n\u00a0\u00a0\u00a0 FROM C3\r\n\u00a0\u00a0\u00a0 GROUP BY grpnm\r\n)\r\nSELECT *\r\nFROM C4\r\nORDER BY StartDate;\r\n<\/pre>\n<p>It is instructive to see how each successive CTE (ignoring the first because we&#8217;ve already seen what it does) returns a row set that brings us closer to our target of non-overlapping intervals.\u00a0 Rather than being repetitive, I&#8217;ll refer you to the comments at each step (above).<\/p>\n<p class=\"caption\">Results from C1:<\/p>\n<pre>ts\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Type\u00a0\u00a0 e\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 s\r\n2013-09-01 17:00:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n2013-09-01 19:10:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n2013-09-01 18:20:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n2013-09-02 11:45:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\r\n2013-09-01 16:15:00.000\u00a0 
1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 1\r\n2013-09-01 17:10:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 2\r\n2013-09-01 18:00:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 3\r\n2013-09-02 11:05:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 4\r\n<\/pre>\n<p class=\"caption\">Results from C2:<\/p>\n<pre>ts\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Type\u00a0\u00a0 e\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 s\u00a0\u00a0\u00a0\u00a0\u00a0 se\r\n2013-09-01 16:15:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 1\r\n2013-09-01 17:00:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0 2\r\n2013-09-01 17:10:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0 3\r\n2013-09-01 18:00:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0 4\r\n2013-09-01 18:20:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0 5\r\n2013-09-01 19:10:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0 6\r\n2013-09-02 11:05:00.000\u00a0 1\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0\u00a0 4\u00a0\u00a0\u00a0\u00a0\u00a0 7\r\n2013-09-02 11:45:00.000\u00a0 -1\u00a0\u00a0\u00a0\u00a0 4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL\u00a0\u00a0 8\r\n<\/pre>\n<p class=\"caption\">Results from C3:<\/p>\n<pre>ts\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 grpnm\r\n2013-09-01 16:15:00.000\u00a0 1\r\n2013-09-01 17:00:00.000\u00a0 1\r\n2013-09-01 17:10:00.000\u00a0 2\r\n2013-09-01 19:10:00.000\u00a0 2\r\n2013-09-02 11:05:00.000\u00a0 3\r\n2013-09-02 11:45:00.000\u00a0 3\r\n<\/pre>\n<p>The 
results produced by this fascinating method are shown below.<\/p>\n<p class=\"caption\">Final Results from Sample Query #4:<\/p>\n<pre>StartDate\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EndDate\r\n2013-09-01 16:15:00.000\u00a0\u00a0 2013-09-01 17:00:00.000\r\n2013-09-01 17:10:00.000\u00a0\u00a0 2013-09-01 19:10:00.000\r\n2013-09-02 11:05:00.000\u00a0\u00a0 2013-09-02 11:45:00.000\r\n<\/pre>\n<p>If we return to look at Sample Query Results #2, we find that these represent the packed intervals across all of the <b>UserID<\/b>s.\u00a0 You can also think of these as &#8220;islands&#8221; of time where users are logged into our system.<\/p>\n<p>Before we move on to finding the gaps in these islands, let&#8217;s take a closer look at what&#8217;s happening in CTEs <b>GroupedLogins<\/b> and C1.\u00a0 The only reason we have <b>GroupedLogins<\/b> to begin with is to remove the spurious logins.\u00a0 Looking at the work being done by C1, it is ungrouping that which we just grouped.\u00a0 This leads us to believe that if the spurious logins were not present it might be possible to simplify the query, so let&#8217;s try to do this after first deleting the two rows we added to our original sample data.<\/p>\n<p class=\"caption\">Sample Query #5: Simplify the Packing Method by using our logins\/logouts directly<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DELETE FROM #Events WHERE ID IN (9,10);\r\n\u00a0\r\nWITH\u00a0 C1 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- It is no longer necessary to CROSS APPLY VALUES to UNPIVOT.\u00a0 The columns e and s will\r\n\u00a0\u00a0\u00a0 -- either be NULL (s NULL for an end date, e NULL for a start date) or row numbers\r\n\u00a0\u00a0\u00a0 -- sequenced by the time.\u00a0 UserID has been eliminated because we're looking for overlapping\r\n\u00a0\u00a0\u00a0 -- intervals across all users.\u00a0 Now we're using the native event type [Event] instead of\r\n\u00a0\u00a0\u00a0 -- creating one during 
the UNPIVOT.\r\n\u00a0\u00a0\u00a0 SELECT EventTime, [Event]\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,e=CASE [Event] \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN 0 THEN NULL \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE ROW_NUMBER() OVER (PARTITION BY [Event] ORDER BY EventTime) END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,s=CASE [Event] \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN 1 THEN NULL \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE ROW_NUMBER() OVER (PARTITION BY [Event] ORDER BY EventTime) END\r\n\u00a0\u00a0\u00a0 FROM #Events\r\n),\r\nC2 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Add a row number ordered as shown\r\n\u00a0\u00a0\u00a0 SELECT C1.*, se=ROW_NUMBER() OVER (ORDER BY EventTime, [Event] DESC)\r\n\u00a0\u00a0\u00a0 FROM C1\r\n),\r\nC3 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Create a grpnm that pairs the rows\r\n\u00a0\u00a0\u00a0 SELECT EventTime, grpnm=FLOOR((ROW_NUMBER() OVER (ORDER BY EventTime)-1) \/ 2 + 1)\r\n\u00a0\u00a0\u00a0 FROM C2\r\n\u00a0\u00a0\u00a0 -- This filter is the magic that eliminates the overlaps\r\n\u00a0\u00a0\u00a0 WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0\r\n),\r\nC4 AS\r\n(\r\n\u00a0\u00a0\u00a0 -- Grouping by grpnm restores the records to only non-overlapped intervals\r\n\u00a0\u00a0\u00a0 SELECT StartDate=MIN(EventTime), EndDate=MAX(EventTime)\r\n\u00a0\u00a0\u00a0 FROM C3\r\n\u00a0\u00a0\u00a0 GROUP BY grpnm\r\n)\r\nSELECT *\r\nFROM C4\r\nORDER BY StartDate;\r\n<\/pre>\n<p>Here we have used the native event type, instead of constructing one using CROSS APPLY VALUES as Mr. Ben-Gan did in his original C1 CTE.\u00a0 The results from this query are shown below.<\/p>\n<p class=\"caption\">Results from Sample Query #5:<\/p>\n<pre>StartDate\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EndDate\r\n2013-09-01 15:33:00.000\u00a0 2013-09-01 19:10:00.000\r\n2013-09-02 11:05:00.000\u00a0 2013-09-02 11:45:00.000\r\n<\/pre>\n<p>These correspond to the packed intervals in our original data, including the depiction in our diagram.\u00a0 Note that it is slightly different from the packed results we got from Sample Query #4, because the &#8220;spurious logins&#8221; that we removed resulted in slightly different overlapping intervals.<\/p>\n<h2>Converting Packed Intervals to the Corresponding Gaps<\/h2>\n<p>Once we have our packed intervals (&#8220;islands&#8221;) it is simple enough to add just a little bit of code to convert these into gaps.\u00a0 We draw upon a previous Simple Talk article called <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/the-sql-of-gaps-and-islands-in-sequences\/\">The SQL of Gaps and Islands in Sequence Numbers<\/a> and look for the <b>CROSS APPLY VALUES<\/b> method to convert islands to gaps.\u00a0 We will add this technique to Sample Query #5 to get to:<\/p>\n<p class=\"caption\">Sample Query #6: Convert Packed Intervals into Gaps Between them<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">WITH C1 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- The columns e and s will either be NULL (s NULL for an end date, e NULL for a start date) \r\n\u00a0\u00a0\u00a0 -- or row numbers sequenced by the time.\u00a0 UserID has been eliminated because we're looking \r\n\u00a0\u00a0\u00a0 -- for overlapping intervals across all users.\u00a0 Now we're using the native event type [Event] \r\n\u00a0\u00a0\u00a0 -- instead of creating one during the UNPIVOT.\r\n\u00a0\u00a0\u00a0 SELECT EventTime, [Event]\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,e=CASE [Event] 
\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN 0 THEN NULL \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE ROW_NUMBER() OVER (PARTITION BY [Event] ORDER BY EventTime) END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,s=CASE [Event] \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHEN 1 THEN NULL \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ELSE ROW_NUMBER() OVER (PARTITION BY [Event] ORDER BY EventTime) END\r\n\u00a0\u00a0\u00a0 FROM #Events\r\n),\r\nC2 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Add a row number ordered as shown\r\n\u00a0\u00a0\u00a0 SELECT C1.*, se=ROW_NUMBER() OVER (ORDER BY EventTime, [Event] DESC)\r\n\u00a0\u00a0\u00a0 FROM C1\r\n),\r\nC3 AS \r\n(\r\n\u00a0\u00a0\u00a0 -- Create a grpnm that pairs the rows\r\n\u00a0\u00a0\u00a0 SELECT EventTime, grpnm=FLOOR((ROW_NUMBER() OVER (ORDER BY EventTime)-1) \/ 2 + 1)\r\n\u00a0\u00a0\u00a0 FROM C2\r\n\u00a0\u00a0\u00a0 -- This filter is the magic that eliminates the overlaps\r\n\u00a0\u00a0\u00a0 WHERE COALESCE(s-(se-s)-1, (se-e)-e) = 0\r\n),\r\nC4 AS\r\n(\r\n\u00a0\u00a0\u00a0 -- Grouping by grpnm restores the records to only non-overlapped intervals\r\n\u00a0\u00a0\u00a0 SELECT StartDate=MIN(EventTime), EndDate=MAX(EventTime)\r\n\u00a0\u00a0\u00a0 FROM C3\r\n\u00a0\u00a0\u00a0 GROUP BY grpnm\r\n)\r\n-- CROSS APPLY VALUES to convert islands to gaps\r\nSELECT StartTime=MIN(EventTime), EndTime=MAX(EventTime)\r\nFROM\r\n(\r\n\u00a0\u00a0\u00a0 SELECT EventTime\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ,rn=ROW_NUMBER() OVER (ORDER BY EventTime)\/2\r\n\u00a0\u00a0\u00a0 FROM C4 a\r\n\u00a0\u00a0\u00a0 CROSS APPLY (VALUES (StartDate), (EndDate)) b(EventTime)\r\n\u00a0\u00a0\u00a0 ) a\r\nGROUP BY rn\r\nHAVING COUNT(*) = 2\r\nORDER BY StartTime;\r\n<\/pre>\n<p>Comparing the results of this query with the results from Sample Query #5, we can see by inspection that the row returned is the gap between the two islands.<\/p>\n<p class=\"caption\">Results from Sample Query #6:<\/p>\n<pre>StartTime\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 EndTime\r\n2013-09-01 19:10:00.000\u00a0\u00a0 2013-09-02 11:05:00.000\r\n<\/pre>\n<p>Obviously, if we needed to handle spurious logins, the little bit of code we added after C4 could be applied just as easily to Sample Query #4.<\/p>\n<h2>Building a Test Harness<\/h2>\n<p>Now that we have a way to calculate gaps between overlapping intervals, we want to make sure that the code we&#8217;ve built will scale to large row sets in a reasonable fashion.\u00a0 To do this, we must build a test harness with a realistic number of rows.<\/p>\n<p>We can simulate the duration of a login by starting the login at a particular time of day and then assuming that the user was logged in for a random period of time.\u00a0 In SQL, we can generate uniform random numbers like this:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT 10+ABS(CHECKSUM(NEWID()))%120<\/pre>\n<p>This will generate a uniformly-distributed random integer between 10 and 129, which could for example represent the minutes the user was logged in.\u00a0 But we&#8217;re going to have a little fun and say that our logins are not based on a uniform distribution; rather, the time logged in will behave more like a normal distribution where the mean is 240 seconds and the standard deviation is 60 seconds.\u00a0 To do this, we need a way to generate normally distributed random numbers.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE FUNCTION dbo.RN_NORMAL \r\n(\r\n\u00a0\u00a0\u00a0 @Mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOAT\r\n\u00a0\u00a0\u00a0 ,@StDev\u00a0\u00a0\u00a0\u00a0 FLOAT\r\n\u00a0\u00a0\u00a0 ,@URN1\u00a0\u00a0\u00a0\u00a0\u00a0 FLOAT\r\n\u00a0\u00a0\u00a0 ,@URN2\u00a0\u00a0\u00a0\u00a0\u00a0 FLOAT\r\n)\r\nRETURNS FLOAT WITH SCHEMABINDING \r\nAS\r\nBEGIN\r\n\u00a0\u00a0\u00a0 -- Based on the Box-Muller 
Transform\r\n\u00a0\u00a0\u00a0 RETURN (@StDev * SQRT(-2 * LOG(@URN1)) * COS(2 * ACOS(-1.) * @URN2)) + @Mean\r\nEND\r\n<\/pre>\n<p>The function presented above does this.\u00a0 <a href=\"http:\/\/www.sqlservercentral.com\/articles\/SQL+Uniform+Random+Numbers\/91103\/\">Generating Non-uniform Random Numbers with SQL<\/a> provides an empirical proof that it works, along with several other random number generators for other distributions.\u00a0 To use this function, you must pass it two uniform random numbers on the interval 0 to 1 (exclusive of zero, since LOG(0) is undefined).<\/p>\n<p>The test data (1,000,000 rows) can be generated based on 500 days and 1,000 UserIDs as follows:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @StartDate\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DATETIME\u00a0\u00a0\u00a0 = '2005-01-01'\r\n\u00a0\u00a0\u00a0 -- 500 days x 1000 users x 2 endpoints = 1,000,000 rows\r\n\u00a0\u00a0\u00a0 ,@UserIDs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INT\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 = 1000\r\n\u00a0\u00a0\u00a0 ,@Days\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INT\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 = 500\r\n\u00a0\u00a0\u00a0 ,@MeanLoginSeconds\u00a0\u00a0\u00a0\u00a0\u00a0 INT\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 = 240\r\n\u00a0\u00a0\u00a0 ,@StdDevLoginSeconds\u00a0\u00a0\u00a0 INT\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 = 60;\r\n\u00a0\r\nWITH Tally (n) AS\r\n(\r\n\u00a0\u00a0\u00a0 SELECT TOP\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT CASE WHEN @UserIDs &gt; @Days THEN @UserIDs ELSE @Days END\r\n\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))\r\n\u00a0\u00a0\u00a0 FROM sys.all_columns a CROSS JOIN sys.all_columns b\r\n)\r\nINSERT INTO #Events\r\nSELECT UserID=f.n, EventTime, 
[Event]\r\n-- Create 2 uniform random numbers\r\nFROM (SELECT RAND(CHECKSUM(NEWID()))) a(URN1)\r\nCROSS APPLY (SELECT RAND(CHECKSUM(NEWID()))) b(URN2)\r\n-- Create 1 normally-distributed random number from the 2 URNs\r\n-- using the Box-Muller transform function\r\nCROSS APPLY \r\n(\r\n\u00a0\u00a0\u00a0 SELECT dbo.RN_NORMAL(@MeanLoginSeconds, @StdDevLoginSeconds, URN1, URN2)\r\n) c(NRN) \r\n-- Use our Tally table to generate the required number of @Days then convert to a date\r\nCROSS APPLY (SELECT n FROM Tally WHERE n BETWEEN 1 AND @Days) d\r\nCROSS APPLY (SELECT DATEADD(day, d.n-1, @StartDate)) e([Date])\r\n-- Use our Tally table to generate the required number of @UserIDs and their logins\/logouts\r\nCROSS APPLY (SELECT n FROM Tally WHERE n BETWEEN 1 AND @UserIDs) f\r\nCROSS APPLY (SELECT DATEADD(second, ABS(CHECKSUM(NEWID()))%84000, e.[Date])) g(LoginStart)\r\nCROSS APPLY \r\n(\r\n\u00a0\u00a0\u00a0 VALUES (0, LoginStart),(1, DATEADD(second, NRN, LoginStart))\r\n) h([Event], EventTime);\r\n<\/pre>\n<p>Each UserID logs in once per day.\u00a0 The number of rows generated by the query can be increased in increments of 1,000,000 by changing @Days to 1000, 1500 and 2000.\u00a0 Using Sample Query #6 with the above table generates about 30,000 packed intervals and consequently nearly the same number of gaps (actually exactly one less).\u00a0 This can be demonstrated by running the SQL code stored in the resource file: <b><i>Check Test Harness (2008).sql<\/i><\/b>.<\/p>\n<p>Since each of our queries is built upon a series of steps, we&#8217;ll break each step down and get timings on each.\u00a0 Two additional test harness SQL files are provided:<\/p>\n<ul>\n<li><code>Test Harness 1 (2008).sql<\/code> &#8211; tests 2 queries which are like Sample Query #5 plus the additional code to convert those islands to gaps (Sample Query #6).<\/li>\n<li><code>Test Harness 2 (2012).sql<\/code> &#8211; tests 3 queries which are like Sample Query #3 (removing 
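The Box-Muller transform used by <code>dbo.RN_NORMAL<\/code> is easy to sanity-check outside of SQL. The following Python sketch is purely illustrative (it is not part of the article's test harness, and the name <code>rn_normal<\/code> is invented here); it mirrors the same formula and confirms empirically that the output lands near the requested mean and standard deviation:

```python
import math
import random

def rn_normal(mean, stdev, urn1, urn2):
    # Box-Muller transform: two uniform draws on (0, 1] map to one
    # normally distributed value, mirroring dbo.RN_NORMAL above.
    return stdev * math.sqrt(-2.0 * math.log(urn1)) * math.cos(2.0 * math.pi * urn2) + mean

# Empirical check with the harness parameters: mean 240 s, std dev 60 s.
rng = random.Random(2014)
samples = [rn_normal(240, 60, 1.0 - rng.random(), rng.random())  # 1 - r keeps urn1 in (0, 1]
           for _ in range(100_000)]
mean = sum(samples) / len(samples)
stdev = (sum((x - mean) ** 2 for x in samples) / len(samples)) ** 0.5
print(round(mean, 1), round(stdev, 1))  # both should land close to 240 and 60
```

The `1.0 - rng.random()` trick avoids passing zero to the logarithm, the same caveat that applies to the T-SQL function's first uniform input.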
the logins with missing logouts), Sample Query #4 (to calculate the packed intervals) and finally the additional code to convert those islands into gaps.<\/li>\n<\/ul>\n<p>The DELETE statement below appears in <b><i>Test Harness 2 (2012).sql<\/i><\/b> to remove a few logout events.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">WITH Logouts AS\r\n(\r\n\u00a0\u00a0\u00a0 SELECT *, rn=ROW_NUMBER() OVER (ORDER BY EventTime)\r\n\u00a0\u00a0\u00a0 FROM #Events\r\n\u00a0\u00a0\u00a0 WHERE [Event] = 1\r\n)\r\nDELETE FROM Logouts\r\nWHERE rn % 300 = 2;\r\n<\/pre>\n<p>This deletes about 1,667 of the 500,000 logout rows (about 0.17% of all 1,000,000 events).<\/p>\n<h2>Performance Results<\/h2>\n<p>Using <b><i>Test Harness 1 (2008).sql<\/i><\/b> (run in SQL 2008), we can demonstrate that the additional CPU and elapsed time incurred by adding the CROSS APPLY VALUES islands-to-gaps method to the interval packing is nearly insignificant.\u00a0 Together, the solutions scale quite linearly and perform quite reasonably when there are no missing logouts.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1942-clip_image004.gif\" alt=\"1942-clip_image004.gif\" width=\"624\" height=\"313\" \/><\/p>\n<p>The result below at 4M rows for elapsed time (Interval Gaps less than Pack Intervals) is most likely explained by an increasing amount of parallelism introduced by SQL Server into the query.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1942-clip_image006.gif\" alt=\"1942-clip_image006.gif\" width=\"624\" height=\"313\" \/><\/p>\n<p>We see a similar story when reviewing the results when there are missing logouts, using <b><i>Test Harness 2 (2012).sql<\/i><\/b> (run in SQL 2012).\u00a0 Removing the logins with missing logouts alone accounts for slightly more than 50% of the CPU time, but when we layer calculating the gaps on top of the packing 
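Before looking at the timing numbers, it may help to restate what the packing step computes, independent of any SQL dialect. This minimal Python sketch (a hypothetical illustration, not the article's code) packs overlapping intervals into islands with a simple +1\/-1 event sweep, which achieves essentially the same result as the C1-C3 row-numbering filter in the packing query:

```python
def pack_intervals(intervals):
    # Merge overlapping (start, end) intervals into non-overlapping
    # "islands": +1 at each start event, -1 at each end event.  An
    # island opens when the active count rises from 0 and closes when
    # the count falls back to 0.
    events = sorted((t, d) for s, e in intervals for t, d in ((s, 1), (e, -1)))
    islands, active, open_at = [], 0, None
    for t, d in events:
        if active == 0 and d == 1:   # count rises from 0: island opens
            open_at = t
        active += d
        if active == 0:              # count falls back to 0: island closes
            islands.append((open_at, t))
    return islands

# Two overlapping logins plus one isolated login -> two islands.
print(pack_intervals([(1, 5), (4, 9), (12, 14)]))  # [(1, 9), (12, 14)]
```

Times are plain numbers here for brevity; in practice they would be datetimes, and the same sweep applies unchanged.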
algorithm, that additional work is quite minimal.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1942-clip_image008.gif\" alt=\"1942-clip_image008.gif\" width=\"624\" height=\"249\" \/><\/p>\n<p>Elapsed times tell a similar story, except that the time for removing the open-ended logins does not comprise nearly as large a percentage of the overall time.\u00a0 Nonetheless, both interval packing and calculating the gaps between them scale quite linearly.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1942-clip_image010.gif\" alt=\"1942-clip_image010.gif\" width=\"624\" height=\"249\" \/><\/p>\n<p>Note that the test results for the cases of no vs. missing logouts were run on different machines, due to the need for SQL 2012 for the latter, so the CPU\/elapsed times recorded may not be comparable in terms of raw magnitude.\u00a0 Test machines:<\/p>\n<ul>\n<li>No Missing Logouts: Windows 7 Professional and SQL 2008 R2 (both 64 bit), Dell Inspiron laptop with Intel Core i5-2430M @ 2.4 GHz, 8GB RAM<\/li>\n<li>Missing Logouts: Windows 7 Enterprise and SQL 2012 (both 64 bit), Lenovo laptop with Intel Core i5-3230M @ 2.60 GHz, 12GB RAM<\/li>\n<\/ul>\n<h2>Conclusion<\/h2>\n<p>The important concepts we have demonstrated in this article are:<\/p>\n<ul>\n<li>Combining rows where login and logout events are paired.<\/li>\n<li>A fast solution using the SQL 2012 analytic function LEAD for combining rows where login and logout events are paired, but some of the logout events are missing.<\/li>\n<li>The excellent solution proposed by SQL MVP Itzik Ben-Gan for packing intervals, with modifications to support both of the above cases.<\/li>\n<li>Using the CROSS APPLY VALUES approach to convert islands (packed intervals) to gaps (the periods of no login activity).<\/li>\n<li>Finally, and very importantly, 
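The islands-to-gaps conversion that the harness layers on top of packing is, conceptually, just a pairing of consecutive endpoints. A small Python sketch of the same idea (illustrative only; the function name is invented here):

```python
def gaps(islands):
    # Each gap runs from one island's end to the next island's start,
    # so n packed islands always yield exactly n - 1 gaps -- the
    # "exactly one less" relationship noted for the generated test data.
    return [(end, next_start)
            for (_, end), (next_start, _) in zip(islands, islands[1:])]

print(gaps([(1, 9), (12, 14)]))  # [(9, 12)]
```

This is the set-based CROSS APPLY VALUES pairing expressed procedurally: unpivot each island to its two endpoints, then group the interior endpoints in consecutive pairs.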
that these solutions scale nearly linearly to multi-million row sets.<\/li>\n<\/ul>\n<h2>Further Reading<\/h2>\n<p>Dealing with event intervals is a fascinating and challenging problem in SQL.\u00a0 If you are interested in pursuing this further, I strongly recommend that you take a look at some of the links below.\u00a0 They provide even more advanced, high-performing techniques for dealing with time intervals.<\/p>\n<ul class=\"reference-list\">\n<li><a href=\"http:\/\/blogs.solidq.com\/en\/sqlserver\/packing-intervals\">Packing Intervals<\/a> by Itzik Ben-Gan<\/li>\n<li><a href=\"http:\/\/sqlmag.com\/t-sql\/sql-server-interval-queries\">Interval Queries in SQL Server<\/a> by Itzik Ben-Gan<\/li>\n<li><a href=\"http:\/\/sqlmag.com\/t-sql\/intervals-and-counts-part-1\">Intervals and Counts, Part 1<\/a> by Itzik Ben-Gan<\/li>\n<li><a href=\"http:\/\/sqlmag.com\/t-sql\/intervals-and-counts-part-2\">Intervals and Counts, Part 2<\/a> by Itzik Ben-Gan<\/li>\n<li><a href=\"http:\/\/sqlmag.com\/t-sql\/intervals-and-counts-part-3\">Intervals and Counts, Part 3<\/a> by Itzik Ben-Gan<\/li>\n<li><a href=\"http:\/\/blogs.solidq.com\/en\/sqlserver\/static-relational-interval-tree\">A Static Relational Interval Tree<\/a> by Laurent Martin<\/li>\n<li><a href=\"http:\/\/blogs.solidq.com\/en\/businessanalytics\/advanced-interval-queries-static-relational-interval-tree\">Advanced interval queries with the Static Relational Interval Tree<\/a> by Laurent Martin<\/li>\n<li><a href=\"http:\/\/blogs.solidq.com\/en\/businessanalytics\/using-static-relational-interval-tree-time-intervals\">Using the Static Relational Interval Tree with time intervals<\/a> by Laurent Martin<\/li>\n<\/ul>\n<p>Thank you for your attention; we hope you found the information in this article interesting and ultimately useful.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Finding gaps of inactivity between overlapping time intervals in SQL Server &#8211; packing overlapping login, event, 
or occupancy periods into continuous ranges, then computing the gaps between them. Covers the algorithm, a complete T-SQL implementation, and a performance test harness for production-scale datasets.&hellip;<\/p>\n","protected":false},"author":221942,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4150,4252],"coauthors":[6800],"class_list":["post-1764","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-sql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1764","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221942"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1764"}],"version-history":[{"count":10,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1764\/revisions"}],"predecessor-version":[{"id":75096,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1764\/revisions\/75096"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1764"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1764"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}"
,"templated":true}]}}