{"id":85951,"date":"2020-01-02T15:11:23","date_gmt":"2020-01-02T15:11:23","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=85951"},"modified":"2026-03-09T14:44:15","modified_gmt":"2026-03-09T14:44:15","slug":"introduction-to-gaps-and-islands-analysis","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/introduction-to-gaps-and-islands-analysis\/","title":{"rendered":"Gaps and Islands in SQL: Techniques &#038; Examples"},"content":{"rendered":"\n<p>Gaps and islands analysis is a SQL technique for finding consecutive sequences (islands) and missing values (gaps) in ordered data. An island is a contiguous run of values with no breaks &#8211; like dates January 1\u20135. A gap is the space between islands &#8211; like the missing January 6\u20138 before the next island starts on January 9. In T-SQL, you solve this by comparing each row\u2019s value against a generated sequence number: where the difference stays constant, you have an island; where it changes, you have a boundary. This guide explains the concept, walks through integer and date-based examples, and provides ready-to-use query patterns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h2>\n\n\n\n<p>One of the most significant challenges we face when analyzing data is pattern recognition. We seek to find ways in which our data deviates from the norm or conforms to a given norm. The goal is to identify tools that can be used to predict future behavior and make sense out of large volumes of data.<\/p>\n\n\n\n<p>Understanding boundaries and where a pattern begins or ends allows us to draw meaningful conclusions regarding our data. In terms of data, boundaries are more often seen as gaps or islands within any data set. Being able to efficiently locate gaps and islands enables us to use this data to gain meaningful insight into a system. We can identify winning and losing streaks, measure the strength of a system over time, find missing or duplicate data, and a variety of other interesting metrics.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-defining-gaps-and-islands\">Defining Gaps and Islands<\/h3>\n\n\n\n<p>Within a data set, an island of data is any ordered sequence where each row is in close proximity to the rows around it. For some data types and analysis, \u201cclose proximity\u201d will mean consecutive. Dates, integers, and letters of the alphabet can be ordered sequentially where two adjacent values will not be able to have additional values in between them.<\/p>\n\n\n\n<p>For example, there are no dates between October 23rd and October 24<sup>th<\/sup>. Similarly, there are no integers between 17 and 18 and no English letters between E and F. For these examples, an island of data could be defined as a sequence of consecutive values. A gap can be defined as a sequence of missing values.<\/p>\n\n\n\n<p>Without rounding or data modification, there are many data types for which \u201cconsecutive\u201d has no meaning. Decimals can be ordered, but more can always be placed in between. 12.4 comes before 12.5, but the value 12.45 can be inserted in between. 5:25 occurs after 5:24, but there are many more granular times in between.<\/p>\n\n\n\n<p>In theory, we can use the limits of precision on a data type to define what consecutive means, but this is typically impractical. More realistically, we will consider data points that are close together as sequential for the purposes of analysis. For example, two events that occur at 5:24 and 5:25 may be close enough in proximity that we may wish to highlight them as related. With knowledge of our data and how it works, we can easily make and test rules such as these.<\/p>\n\n\n\n<p>A gap is defined as an absence of data in between islands. Gaps require islands and islands require gaps. For example, if we were evaluating a set of letters of the alphabet and found E, M, and P missing, we could structure a set of islands of this data as follows:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>A-D<\/li>\n\n\n\n<li>F-L<\/li>\n\n\n\n<li>N-O<\/li>\n\n\n\n<li>Q-Z<\/li>\n<\/ol>\n<\/div>\n\n\n<p>The alphabet has been divided up into 4 islands of data. The gaps (all comprised of single letters) are as follows:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>E<\/li>\n\n\n\n<li>M<\/li>\n\n\n\n<li>P<\/li>\n<\/ol>\n<\/div>\n\n\n<p>A data set with N islands of data must have N-1 gaps in between them. We will typically ignore the edges of the data set and not consider them gaps, even if there are missing values. Testing for specific missing values is a different exercise and not a part of gaps\/islands analysis.<\/p>\n\n\n\n<p>A key component to gaps\/islands analysis is that the result set is dynamic. Mathematically, we are applying a clustering algorithm to our data, and for any data set, we may end up with a wide variety of results that range anywhere from:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>One giant island with no gaps.<\/li>\n\n\n\n<li>Lots of small islands surrounded by gaps.<\/li>\n\n\n\n<li>No islands, and one giant gap (an empty set).<\/li>\n<\/ul>\n<\/div>\n\n\n<p>If the list of gaps or islands is our result set, then its size and contents will vary organically based on the data we apply our analysis to. This provides flexibility and insight that may not be available via more rigid analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-grouping-integers\">Grouping Integers<\/h3>\n\n\n\n<p>The simplest example of what gaps and islands are is to create a list of sequential, nonrepeating integers:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"679\" height=\"50\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image.jpeg\" alt=\"\" class=\"wp-image-85952\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>When we consider this list, we note the numbers range from 1 to 15 and that there are no missing integers. These fifteen integers comprise one big island with zero gaps. Let\u2019s remove some of the numbers from the data set above:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"517\" height=\"59\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-1.jpeg\" alt=\"\" class=\"wp-image-85953\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>We have removed the numbers 3, 7, 8, and 13 from the list. If we were to perform a gaps\/islands analysis on this data, we would find the following islands:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>1-2<\/li>\n\n\n\n<li>4-6<\/li>\n\n\n\n<li>9-12<\/li>\n\n\n\n<li>14-15<\/li>\n<\/ol>\n<\/div>\n\n\n<p>In between those islands would be the following gaps:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>3<\/li>\n\n\n\n<li>7-8<\/li>\n\n\n\n<li>13<\/li>\n<\/ol>\n<\/div>\n\n\n<p>How do we solve this problem programmatically? We can model this simple data set in SQL Server like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE dbo.integers\n    (integer_id INT NOT NULL);\nINSERT INTO dbo.integers\n    (integer_id)\nVALUES\n    (1), (2), (4), (5), (6), (9), (10), (11), (12), (14), (15);<\/pre>\n\n\n\n<p>One way to identify gaps is to observe the overall row count and compare it to the expected row count, assuming no gaps existed. Consider the following query and results:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT\n    integer_id,\n    ROW_NUMBER() OVER (ORDER BY integer_id) AS rownum\nFROM dbo.integers;<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"152\" height=\"232\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-2.jpeg\" alt=\"\" class=\"wp-image-85954\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Note that the first column contains what should be an unbroken list of integers, assuming none are missing. The second column contains the row number, also ordered by the integer ID. The result is that we can compare the left column to the right and in any scenario in which the row number falls further behind our ID column, we know another value is missing. The details of what we do with the above numbers as as follows:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>When integer_id increases by more than rownum, that provides the start of an island.<\/li>\n\n\n\n<li>The prior row to the missing values is the end of the previous island.<\/li>\n\n\n\n<li>Add 1 to the end of the previous island to obtain the start of the gap.<\/li>\n\n\n\n<li>Subtract 1 from the start of the next island to obtain the end of the gap.<\/li>\n<\/ol>\n<\/div>\n\n\n<p>We can then use T-SQL to demonstrate how we can crunch the data above into a set of gaps or islands:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_ISLANDS AS (\n    SELECT\n        integer_id,\n\tinteger_id - ROW_NUMBER() \n            OVER (ORDER BY integer_id) AS island_quantity\n    FROM dbo.integers)\nSELECT\n    MIN(integer_id) AS island_start,\n    MAX(integer_id) AS island_end\nFROM CTE_ISLANDS\nGROUP BY island_quantity;<\/pre>\n\n\n\n<p>When we execute the above code, we get the following results:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"175\" height=\"120\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-3.jpeg\" alt=\"\" class=\"wp-image-85955\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>By subtracting the row number from the integer ID, we were able to determine the difference and knew that whenever that number increments, a new gap has been passed in the data set. The <code>ROW_NUMBER<\/code> window function provides a sequential numbering for our integer list, ranging from 1 to the number of integers in our list (11).<\/p>\n\n\n\n<p>We can calculate gaps using the <code>LEAD<\/code> window function:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_GAPS AS (\n    SELECT\n        integer_id,\n        LEAD(integer_id) \n            OVER (ORDER BY integer_id) as next_integer\n    FROM dbo.integers\n)\nSELECT\n    integer_id + 1 AS gap_start,\n    next_integer - 1 AS gap_end\nFROM CTE_GAPS\nWHERE next_integer &lt;&gt; integer_id + 1;<\/pre>\n\n\n\n<p>By comparing the next expected integer ID with the actual next integer ID, we can determine when a gap will start and end. If the current integer is 6 and the next is 9, we immediately know that 7 and 8 are skipped and that there is an upcoming gap comprised of those two numbers.<\/p>\n\n\n\n<p>There are many ways to calculate gaps and islands using a variety of window functions, subqueries, and CTE structures. Our goal in this article is to keep the T-SQL as simple as possible and avoid obfuscated or overly long code.<\/p>\n\n\n\n<p><strong>Read also:<\/strong> <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/temporary-tables-in-sql-server\/\" target=\"_blank\" rel=\"noreferrer noopener\">Temporary tables for storing intermediate gap analysis results<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-managing-duplicates\">Managing Duplicates<\/h3>\n\n\n\n<p>Consider the following alteration to our data set above:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO dbo.integers\n    (integer_id)\nVALUES\n    (2), (12), (12), (13);<\/pre>\n\n\n\n<p>If we run our islands query from earlier, we get the following results:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"175\" height=\"98\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-4.jpeg\" alt=\"\" class=\"wp-image-85956\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>These results are nonsense, and we can see that the discrepancies introduced by duplicate values need to be dealt with in order to get meaningful results. The <code>ROW_NUMBER<\/code> window function is useful, but it counts each row as a new value, even if they are the same. Ideally, we would see multiple rows for the same integer as one and the same, and be assigned a single row number, rather than multiple.<\/p>\n\n\n\n<p>The <code>DENSE_RANK<\/code> window function provides exactly what we are looking for. This function will rank each row in a data set based on the number of distinct values prior to the current row. As a result, duplicates will not impact the results. We can adjust our query from earlier by replacing <code>ROW_NUMBER<\/code> with <code>DENSE_RANK<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_ISLANDS AS (\n    SELECT\n        integer_id,\n        integer_id - \n        DENSE_RANK() OVER \n           (ORDER BY integer_id) AS island_quantity\n    FROM dbo.integers)\nSELECT\n    MIN(integer_id) AS island_start,\n    MAX(integer_id) AS island_end\nFROM CTE_ISLANDS\nGROUP BY island_quantity;<\/pre>\n\n\n\n<p>The results are as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"228\" height=\"122\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-5.jpeg\" alt=\"\" class=\"wp-image-85957\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>After inserting four additional values, the result was to repeat three existing values, as well as introduce a new integer that was previously missing. One less gap means one less island. If we were interested in how many values (duplicate or not) were within each range, we could add that using an added <code>COUNT<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_ISLANDS AS (\n    SELECT\n        integer_id,\n        integer_id - DENSE_RANK() \n            OVER (ORDER BY integer_id) AS island_quantity\n    FROM dbo.integers)\nSELECT\n    MIN(integer_id) AS island_start,\n    MAX(integer_id) AS island_end,\n    MAX(integer_id) - MIN(integer_id) + 1 AS island_length,\n    COUNT(*) AS row_count\nFROM CTE_ISLANDS\nGROUP BY island_quantity;<\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"321\" height=\"86\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-6.jpeg\" alt=\"\" class=\"wp-image-85958\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The <em>row_count<\/em> column tells us how many rows are included within each island, including duplicates, which allows us to better understand how prevalent duplicates are within our data set. <em>Island_length<\/em> was also added as a measure of how many consecutive and unique values exist within each island.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-finding-islands-within-decimal-data-sets\">Finding Islands Within Decimal Data Sets<\/h3>\n\n\n\n<p>The world is rarely composed of a nice, neat set of consecutive, nonzero, non-NULL integers. Most real data is large, complicated, and ever-changing. Islands can be found within any data set. The key to doing this is establishing rules as to how data relates and is significant. These rules establish two key parameters for analysis:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Proximity<\/li>\n\n\n\n<li>Boundaries<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Proximity allows us to correlate data to itself, resulting in groups of data. When data transitions from being correlated to uncorrelated, boundaries are created that represent the ends of each island and the ends of its corresponding gaps.<\/p>\n\n\n\n<p>Let\u2019s create a sample data set similar to earlier, but this time using decimals, rather than integers:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE dbo.decimals\n    (decimal_id DECIMAL(14,8) NOT NULL);\nINSERT INTO dbo.decimals\n    (decimal_id)\nVALUES\n    (-9999.9999), (-17.9597), (-17.9596), (-17.953), (-17.825), \n    (-5.6), (-5.2), (-4), (-2.68741), (-0.0000001), (0), \n    (1.00056), (2.6), (2.77777), (26.948), (27.1), (17000.17);<\/pre>\n\n\n\n<p>This data set has a wide variety of values that includes positive, negative, decimal, and whole numbers. To define islands over this data, we need to establish rules that guide adjacency and create correlated data points that comprise those islands. The simplest and most common rule to apply to data like this would be to consider any two rows where the values of <em>decimal_id<\/em> are within some arbitrary amount.<\/p>\n\n\n\n<p>For testing, let\u2019s define a rule that all rows where <em>decimal_id<\/em> is within 1 unit of each other are related and comprise an island. Our T-SQL is going to have to evolve to manage data that is no longer consecutive. From this point on, we will use multiple CTEs to accomplish each task needed to crunch this data:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_DECIMAL_DATA AS (\n    SELECT\n        decimal_id,\n        LAG(decimal_id) OVER \n            (ORDER BY decimal_id) AS previous_decimal_id,\n        LEAD(decimal_id) OVER \n            (ORDER BY decimal_id) AS next_decimal_id,\n        ROW_NUMBER() OVER \n            (ORDER BY decimals.decimal_id) AS island_location\n    FROM dbo.decimals),\nCTE_ISLAND_START AS (\n    SELECT\n        ROW_NUMBER() OVER (ORDER BY decimal_id) AS island_number,\n        decimal_id AS island_start_decimal_id,\n        island_location AS island_start_location\n    FROM CTE_DECIMAL_DATA\n    WHERE decimal_id - previous_decimal_id &gt; 1\n        OR CTE_DECIMAL_DATA.previous_decimal_id IS NULL),\nCTE_ISLAND_END AS (\n    SELECT\n        ROW_NUMBER() \n             OVER (ORDER BY decimal_id) AS island_number,\n        decimal_id AS island_end_decimal_id,\n        island_location AS island_end_location\n    FROM CTE_DECIMAL_DATA\n    WHERE next_decimal_id - decimal_id &gt; 1\n        OR CTE_DECIMAL_DATA.next_decimal_id IS NULL)\nSELECT\n    CTE_ISLAND_START.island_start_decimal_id,\n    CTE_ISLAND_END.island_end_decimal_id,\n    (SELECT COUNT(*) \n     FROM CTE_DECIMAL_DATA \n     WHERE CTE_DECIMAL_DATA.decimal_id BETWEEN \n        CTE_ISLAND_START.island_start_decimal_id \n        AND CTE_ISLAND_END.island_end_decimal_id) \n     AS island_row_count\nFROM CTE_ISLAND_START\nINNER JOIN CTE_ISLAND_END\nON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number;<\/pre>\n\n\n\n<p>This T-SQL can be broken into 4 steps:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>Collect the data to be processed. In this example, it is in <code>CTE_DECIMAL_DATA<\/code>.<\/li>\n\n\n\n<li>Find all island starting points based on lack of proximity to a previous row. In this example, it is in <code>CTE_ISLAND_START<\/code>.<\/li>\n\n\n\n<li>Find all island ending points based on lack of proximity to a next row. In this example, it is in <code>CTE_ISLAND_END<\/code>.<\/li>\n\n\n\n<li>Join island starting points to island ending points and report on the results.<\/li>\n<\/ol>\n<\/div>\n\n\n<p>Since the number of island starting points is guaranteed to match ending points, we know that we will get a clean join at the end of our T-SQL. The results of this query are as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"380\" height=\"211\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-7.jpeg\" alt=\"\" class=\"wp-image-85959\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Our set of 17 decimals has been crunched down into ten islands containing anywhere from one to four rows. An island can span more than 1 unit if each consecutive row happens to fall within one unit of the next. For example, consider the following number sequence:<\/p>\n\n\n\n<p>1\u20261.5\u20262.25\u20263\u20264\u20265\u20266<\/p>\n\n\n\n<p>Apply our algorithm to these numbers, and we will get a single island with all seven numbers in it as each decimal is within one unit of the next one. What if we add some duplicate values into our table:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO dbo.decimals\n    (decimal_id)\nVALUES\n    (-9999.9999), (-17.825), (0), (27.1);<\/pre>\n\n\n\n<p>When we run our analysis query, we get the following results:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"384\" height=\"236\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-8.jpeg\" alt=\"\" class=\"wp-image-85961\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Our new algorithm protects us against bad\/nonsensical data resulting from duplicate values. Either a decimal has a value within one unit of it, or it does not. The count at the end of the query will include any number of contained values, regardless of their frequency. While this T-SQL is longer than what we used previously, we will use it going forward in our analysis for several reasons:<\/p>\n\n\n<div class=\"block-core-list\">\n<ol class=\"wp-block-list\">\n<li>Each CTE is relatively simple and easy to understand and modify.<\/li>\n\n\n\n<li>We can copy and paste the syntax for other types of data with few modifications.<\/li>\n\n\n\n<li>The final join will always be the same for any data type.<\/li>\n\n\n\n<li>Adding metrics via subselects, CTEs, or other added queries is easy.<\/li>\n<\/ol>\n<\/div>\n\n\n<p>This is a scenario where longer and more explicit code is beneficial and will buy us scalability for future queries we write.<\/p>\n\n\n\n<p>The benefit of this sort of analysis, in general, is that it allows us to crunch together rows of data into groupings based on any metric. While we started with integers and expanded into decimals, we could very easily apply the latest query to any other data type that we dream up.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-finding-islands-in-date-time-data\">Finding Islands in Date\/Time Data<\/h3>\n\n\n\n<p>The most practical application of gaps\/islands analysis is to apply it to dates and times. We often want to know if events are related to each other, and one of the top ways they may be related is if they occur near each other in time. Consider some of these scenarios:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Marketing campaigns send our advertisements regularly. People view and may click on these ads. Were clicks and purchases based on the ads or not?<\/li>\n\n\n\n<li>A development team manages a variety of servers that support an application. If apps, services, or hardware fails in sequence, were those outages related?<\/li>\n\n\n\n<li>The CDC tracks search terms, looking for internet searches that involve medical symptoms, such as a runny nose, headache, or muscle pain. The CDC also tracks occurrences of infectious diseases. When are symptom searches and disease incidences close enough to be related?<\/li>\n\n\n\n<li>When is a stock investment team on a winning streak? How many consecutive trades resulted in more than a specified amount of gains?<\/li>\n\n\n\n<li>In sports, what qualifies as a winning streak? Can we measure streaks in other metrics, such as ice time in hockey, at-bats in baseball, or shots in basketball? What if we filter by metrics to look for patterns based on details, such as handedness, time of day, location, temperature, and more\u2026?<\/li>\n<\/ul>\n<\/div>\n\n\n<p>This is a rabbit hole that can take us down a significant analytics journey. Let\u2019s take our previous query and extend it to time data:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE dbo.datetimes\n    (datetime_data DATETIME2(3) NOT NULL);\nINSERT INTO dbo.datetimes\n    (datetime_data)\nVALUES\n    ('7\/17\/2019 00:05:00.157'), ('7\/17\/2019 00:07:25.000'), \n    ('7\/17\/2019 00:30:00.777'), ('7\/18\/2019 12:00:00.565'), \n    ('7\/19\/2019 12:00:00.980'), ('7\/20\/2019 12:00:00.098'),\n    ('7\/21\/2019 12:00:00.332'), ('7\/15\/2019 21:46:15.197'), \n    ('7\/15\/2019 21:42:00.000'), ('7\/1\/2019 14:12:06.674'),\t\n    ('7\/1\/2019 14:12:06.986'),  ('7\/3\/2019 01:11:02.001'), \n    ('7\/6\/2019 09:58:58.840'), ('7\/10\/2019 16:33:00.702'), \n    ('7\/12\/2019 23:19:00.411');<\/pre>\n\n\n\n<p>This table contains fifteen datetime values with precision up to 3 decimal places. Let\u2019s say we are looking at software exceptions and want to group together any events that occur close together as potentially related. We decide that any events that occur within 5 minutes of another event are related. Using this single rule, we can modify our T-SQL used for decimals to analyze this data as well:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_DATETIME_DATA AS (\n    SELECT\n        datetime_data,\n        LAG(datetime_data) \n            OVER (ORDER BY datetime_data) AS previous_datetime,\n        LEAD(datetime_data) \n            OVER (ORDER BY datetime_data) AS next_datetime,\n        ROW_NUMBER() OVER (ORDER BY datetimes.datetime_data) \n        AS island_location \n    FROM dbo.datetimes),\nCTE_ISLAND_START AS (\n    SELECT\n        ROW_NUMBER() OVER (ORDER BY datetime_data) AS island_number,\n        datetime_data AS island_start_datetime,\n        island_location AS island_start_location\n    FROM CTE_DATETIME_DATA\n    WHERE DATEDIFF(MINUTE, previous_datetime, datetime_data) &gt; 5\n        OR CTE_DATETIME_DATA.previous_datetime IS NULL),\nCTE_ISLAND_END AS (\n    SELECT\n        ROW_NUMBER() \n            OVER (ORDER BY datetime_data) AS island_number,\n        datetime_data AS island_end_datetime,\n        island_location AS island_end_location\n    FROM CTE_DATETIME_DATA\n    WHERE DATEDIFF(MINUTE, datetime_data, next_datetime) &gt; 5\n        OR CTE_DATETIME_DATA.next_datetime IS NULL)\nSELECT\n    CTE_ISLAND_START.island_start_datetime,\n    CTE_ISLAND_END.island_end_datetime,\n    (SELECT COUNT(*) \n     FROM CTE_DATETIME_DATA \n     WHERE CTE_DATETIME_DATA.datetime_data BETWEEN \n        CTE_ISLAND_START.island_start_datetime AND \n        CTE_ISLAND_END.island_end_datetime) \n    AS island_row_count\nFROM CTE_ISLAND_START\nINNER JOIN CTE_ISLAND_END\nON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number;<\/pre>\n\n\n\n<p>Our syntax is identical to earlier with the sole exception of our island definition. Here, an island boundary is defined by a row in which the previous or next datetime value is more than five minutes away from it. The results are as follows:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"407\" height=\"274\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-9.jpeg\" alt=\"\" class=\"wp-image-85962\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The results show pairs of related events, with the rest appearing farther apart than the allotted window of 5 minutes. We note in the results that some times were repeated often, for example, there were four days in a row with events at about 12:00.<\/p>\n\n\n\n<p>Islands analysis specifically is not built to capture these, though if we knew with certainty that these were singular events daily, we could adjust our query to look at islands as defined by events that are a day apart, but at similar times of the day (within a 5 minute rolling window):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH CTE_DATETIME_DATA AS (\n    SELECT\n    datetime_data,\n    LAG(datetime_data) \n        OVER (ORDER BY datetime_data) AS previous_datetime,\n    LEAD(datetime_data) \n        OVER (ORDER BY datetime_data) AS next_datetime,\n    ROW_NUMBER() OVER \n        (ORDER BY datetimes.datetime_data) AS island_location\n    FROM dbo.datetimes),\nCTE_ISLAND_START AS (\n    SELECT\n        ROW_NUMBER() \n            OVER (ORDER BY datetime_data) AS island_number,\n        datetime_data AS island_start_datetime,\n        island_location AS island_start_location\n    FROM CTE_DATETIME_DATA\n    WHERE DATEDIFF(MINUTE, previous_datetime, datetime_data) &gt; 5 \n        AND DATEDIFF(DAY, previous_datetime, datetime_data) &lt;&gt; 1\n        OR CTE_DATETIME_DATA.previous_datetime IS NULL),\nCTE_ISLAND_END AS (\n    SELECT\n        ROW_NUMBER() OVER (ORDER BY datetime_data) AS island_number,\n        datetime_data AS island_end_datetime,\n        island_location AS island_end_location\n    FROM CTE_DATETIME_DATA\n    WHERE DATEDIFF(MINUTE, datetime_data, next_datetime) &gt; 5 \n        AND DATEDIFF(DAY, datetime_data, next_datetime) &lt;&gt; 1\n        OR CTE_DATETIME_DATA.next_datetime IS NULL)\nSELECT\n    CTE_ISLAND_START.island_start_datetime,\n    CTE_ISLAND_END.island_end_datetime,\n    (SELECT COUNT(*) \n     FROM CTE_DATETIME_DATA \n     WHERE CTE_DATETIME_DATA.datetime_data \n         BETWEEN CTE_ISLAND_START.island_start_datetime AND \n         CTE_ISLAND_END.island_end_datetime) \n     AS island_row_count\nFROM CTE_ISLAND_START\nINNER JOIN CTE_ISLAND_END\nON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number;<\/pre>\n\n\n\n<p>This is a far more customized query that will group either by events that are within 5 minutes of each other or those that are a day apart and also within 5 minutes:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"410\" height=\"201\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2020\/01\/word-image-10.jpeg\" alt=\"\" class=\"wp-image-85963\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The results allow us to focus also on those scenarios where the same alerts occurred day after day at similar times. A query like this shows that we can easily tailor our analysis to unusual scenarios. An island of data is defined by the filters used to determine the starting points and endpoints. These filters can be simple or wildly complex, depending on how detailed our use-cases are.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-conclusion\">Conclusion<\/h3>\n\n\n\n<p>Gaps and islands analysis allows us to crunch data based on proximity rather than fixed groupings. This provides greater flexibility and the ability to generate result sets that otherwise could be exceptionally challenging to create.<\/p>\n\n\n\n<p>The basic concept of a data island can be extended from integers to other data types, such as strings, decimals, dates, and times. The T-SQL syntax used can be reused for each iteration of our work with only minimal changes to the details of how it works. This allows for maintainable code that can be reliably implemented and reused as needed.<\/p>\n\n\n\n<p>Ultimately, these analytics allow us to group data organically to identify patterns or related events. These insights can be used for monitoring, alerting, or further analytics and decision-making processes.<\/p>\n\n\n\n<p><a href=\"http:\/\/assets.red-gate.com\/simple-talk\/Gaps-and-Islands\/BaseballStats2019.bak\">Download the dataset for this article<\/a> (750MB).<\/p>\n\n\n\n<p><strong>Read also:<\/strong><br><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/using-a-subquery-in-a-select-statement\/\" target=\"_blank\" rel=\"noreferrer noopener\">Subqueries in SQL SELECT statements<\/a><br><a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/efficient-solutions-to-gaps-and-islands-challenges\/\" target=\"_blank\" rel=\"noreferrer noopener\">Efficient solutions to gaps and islands challenges<\/a><\/p>\n\n\n\n<section id=\"my-first-block-block_531f34d09a9f285cc3a4ca7b5532863c\" class=\"my-first-block alignwide\">\n    <div class=\"bg-brand-600 text-base-white py-5xl px-4xl rounded-sm bg-gradient-to-r from-brand-600 to-brand-500 red\">\n        <div class=\"gap-4xl items-start md:items-center flex flex-col md:flex-row justify-between\">\n            <div class=\"flex-1 col-span-10 lg:col-span-7\">\n                <h3 class=\"mt-0 font-display mb-2 text-display-sm\">Fast, reliable and consistent SQL Server development&#8230;<\/h3>\n                <div class=\"child:last-of-type:mb-0\">\n                                            &#8230;with SQL Toolbelt Essentials. 10 ingeniously simple tools for accelerating development, reducing risk, and standardizing workflows.                                    <\/div>\n            <\/div>\n                                            <a href=\"https:\/\/www.red-gate.com\/products\/sql-toolbelt-essentials\/\" class=\"btn btn--secondary btn--lg\" aria-label=\"Learn more &amp; try for free: Fast, reliable and consistent SQL Server development...\">Learn more &amp; try for free<\/a>\n                    <\/div>\n    <\/div>\n<\/section>\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: SQL gaps and islands analysis<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. What is the gaps and islands problem in SQL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>The gaps and islands problem is about identifying contiguous sequences (islands) and missing ranges (gaps) in ordered datasets. For example, given the numbers 1, 2, 3, 7, 8, 12 &#8211; the islands are {1\u20133}, {7\u20138}, {12} and the gaps are {4\u20136}, {9\u201311}. It\u2019s commonly used for finding missing dates, identifying consecutive streaks, detecting data quality issues, and analyzing time-series patterns.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. How do you find gaps in a sequence in SQL Server?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Use ROW_NUMBER() to generate a sequential counter, then subtract it from the actual value column. Where this difference changes, a new island begins. The gaps are the ranges between the end of one island and the start of the next. You can also use LEAD() and LAG() window functions to compare each row with its neighbors and identify where breaks occur.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. Can you solve gaps and islands with window functions?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Yes &#8211; window functions are the modern approach. Use ROW_NUMBER() OVER (ORDER BY value) to create a running sequence, then group by the difference between the actual value and the row number. LAG() and LEAD() offer alternative approaches for detecting boundaries. These methods are more efficient than self-joins or cursor-based approaches for large datasets.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Master gaps and islands analysis in SQL Server with practical T-SQL examples. Learn to identify consecutive sequences, find missing values, and solve classic data pattern problems.&hellip;<\/p>\n","protected":false},"author":329827,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[95509],"coauthors":[101655],"class_list":["post-85951","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-standardize"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/85951","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/329827"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=85951"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/85951\/revisions"}],"predecessor-version":[{"id":109082,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/85951\/revisions\/109082"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=85951"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=85951"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=85951"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=85951"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}