{"id":78223,"date":"2018-04-18T03:59:58","date_gmt":"2018-04-18T03:59:58","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=78223"},"modified":"2019-01-22T17:17:13","modified_gmt":"2019-01-22T17:17:13","slug":"finding-overlapping-ranges-data","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/other\/finding-overlapping-ranges-data\/","title":{"rendered":"Finding overlapping ranges of data"},"content":{"rendered":"<p>This week, I had a problem where I needed to find and eliminate from the results of my query, data with overlapping ranges. I have written about this topic before, in my database design book <a href=\"http:\/\/drsql.org\/publications\">book<\/a>, in regards to building a trigger to avoid overlapping ranges. But even though I have written on the topic there, I still use Google just like you to get quick help (I use books when I want to learn, or expand my knowledge on a topic in depth, blogs when I need a simple answer to a simple or complex question.) <code><\/code><\/p>\n<p>The problem at hand is most often associated with date based data, such as effective dates for a row in a temporal\/type 2 dimension table, or other cases like appointment times, etc. But the algorithm is the same with numbers and is a bit easier to read since we don&#8217;t have the same issues with roundoff and decimal places (the query is complex enough on its own to show in a blog post). From a progression of start and end values in each row, we are going to look at how to check to make sure that there are no two rows that are in conflict (no range should contain another ranges value at all).<\/p>\n<p>In the following sample set, I included a group value, to make the example more typical, as usually you need to do this will tens of thousands or rows, looking for overlaps in subsets of that data. For example, consider a <span style=\"float: none;background-color: transparent;color: #333333;cursor: text;font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif;font-size: 16px;font-style: normal;font-variant: normal;font-weight: 400;letter-spacing: normal;text-align: left;text-decoration: none;text-indent: 0px\">doctor&#8217;s office appointment system. You have <\/span>offices, doctors, and assistants to assign to be in a location for particular times. You do not want to allow (or at least want to know about), where any of these items to have overlapping work scheduled. (The problem that prompted me to write this blog was a similar problem and used the same algorithm with ~50 million rows, taking just minutes on a very moderately spec&#8217;d SQL Server. I will hit a bit on performance a the end of the blog, after establishing the algorithm).<\/p>\n<p>For the example, we will use the following data set:<\/p>\n<p><code>GroupValue \u00a0 Start\u00a0 End<\/code><code><br \/>\n------------ ------ ------ <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 1 \u00a0 \u00a0\u00a0 2 <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 3 \u00a0\u00a0\u00a0\u00a0 4 <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 5 \u00a0\u00a0\u00a0\u00a0 6 <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 8 \u00a0\u00a0\u00a0\u00a0 10 <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 9 \u00a0\u00a0\u00a0\u00a0 15 <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 11 \u00a0 \u00a0 14 <br \/>\nX \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 16 \u00a0 \u00a0 16 <br \/>\nY \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 9 \u00a0\u00a0\u00a0\u00a0 20 <br \/>\nY \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 22 \u00a0 \u00a0 24<\/code><\/p>\n<p>There are two ways to implement a range of data, either with an endpoint that is in the range (inclusive), or one that is out of it (exclusive). Take the first two rows, they mean that the first range is from 1-2, inclusively, and 3-4.<\/p>\n<p><code><span style=\"color: #222222;font-family: Consolas\">GroupValue \u00a0  Start\u00a0 End<br \/>\n<\/span>------------ ------ ------ <\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 1 \u00a0 \u00a0\u00a0 2 <\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 3 \u00a0\u00a0\u00a0\u00a0 4<\/code><\/p>\n<p>For integers this is simple. But if you were doing numeric(10,3), the endpoint would need to be 2.999 and 4.999. Dates are more weird, particularly if you are still needing to deal with the datatime datatype, where the fractional time range goes from .997 to .000.\u00a0<\/p>\n<p>This type of implementation is what will naturally occur if you are working with &#8220;real&#8221; data. Like if you want to know the range of values for a given group, you might say:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \"><code>SELECT MIN(value) AS min_value, MAX(value) AS max_value<\/code><\/pre>\n<p>How you define the rest of the query is important to the problem you are solving, but the point here is that the max_value will end up inclusive by its nature, and adding a value to it to make it just outside of the range often makes the data more confusing to work with as you won&#8217;t be able to intuitively join to the row(s) that match the max_value anymore.<\/p>\n<p><span style=\"float: none;background-color: transparent;color: #333333;cursor: text;font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif;font-size: 16px;font-style: normal;font-variant: normal;font-weight: 400;letter-spacing: normal;text-align: left;text-decoration: none;text-indent: 0px\">The other way to go is with an end value that is the smallest value that is out of the range. For our example, 1 to 2 would be represented as 1 and 3:<\/span><\/p>\n<p><code><span style=\"color: #222222;font-family: Consolas\">GroupValue \u00a0  Start\u00a0 End<br \/>\n<\/span>------------ ------ ------ <\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 1 \u00a0 \u00a0\u00a0 3 <\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 3 \u00a0 \u00a0\u00a0 5<\/code><\/p>\n<p>So in queries, instead of using BETWEEN, you would use &gt;= Start and &lt; End. This is a LOT easier to implement when you are coding a system to add a new row and expire the previous since you don&#8217;t have to deal with any precision of the datatype used, Just copy the expiring row&#8217;s end value to be the start of the new row.\u00a0<\/p>\n<p>To demonstrate, I will start with the following table, and data, which is the data in the original table, with an identity primary key added, and unique constraint on the GroupValue, StartValue, and EndValue to avoid one very common overlapping scenario.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">USE tempdb;\r\nGO\r\n\r\nIF NOT EXISTS ( SELECT *\r\n                FROM sys.schemas\r\n                WHERE name = 'Demo')\r\n EXEC('CREATE SCHEMA Demo;'); --Doing this in TempDb \r\nGO\r\n\r\nDROP TABLE IF EXISTS Demo.GroupValueRange;\r\nGO\r\n\r\nCREATE TABLE Demo.GroupValueRange\r\n(\r\n     GroupValueRangeId int IDENTITY CONSTRAINT PKGroupValueRange PRIMARY KEY,\r\n     GroupValue char(1) NOT NULL,\r\n     StartValue int,\r\n     EndValue int,\r\n CONSTRAINT AKGroupValueRange UNIQUE\r\n (\r\n     GroupValue,\r\n     StartValue,\r\n     EndValue),\r\n CONSTRAINT CHKGroupValueRange CHECK(StartValue &lt;= EndValue)\r\n);\r\n\r\nINSERT INTO Demo.GroupValueRange(GroupValue, StartValue, EndValue)\r\nVALUES('X', 1, 2),\r\n      ('X', 3, 4),\r\n      ('X', 5, 6),\r\n      ('X', 8, 10),\r\n      ('X', 9, 15),\r\n      ('X', 11, 14),\r\n      ('X', 16, 16),\r\n      ('Y', 9, 20),\r\n      ('Y', 22, 24);<\/pre>\n<p>The algorithm for finding\u00a0overlapping ranges is a matter of looking to see if one of two conditions exists in your data. We will define the conditions here in terms of our sample set:<\/p>\n<p>1. Is a StartValue in one row in a Group is located between the StartValue and EndValue of another, in the same Group<br \/>\n 2. Same as 1, but see if the EndValue of one row is between the StartValue and EndValue<\/p>\n<p>If one range completely engulfs another; such as a row with values 10,20 would be engulfed by 1,100; then both of these criteria will be met. The query will look like this, with a fairly messy looking CASE expression to allow you to see what the difference is between the rows. Note that I don&#8217;t use BETWEEN in the query, as you could, because when you need to convert to the non-inclusive type of implementation, the only real difference is in the magnitude comparison operators and so you can simply change a few characters and it works (which we will do a bit later on).\u00a0 Where the two conditions are implemented are tagged with \/* # *\/.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">SELECT GroupValueRange.*,\r\n       '-------&gt;' AS Conflict,\r\n       ToCompare.*,\r\n       CASE WHEN ToCompare.StartValue &gt;= GroupValueRange.StartValue\r\n             AND ToCompare.StartValue &lt;= GroupValueRange.EndValue\r\n             AND ToCompare.EndValue &gt;= GroupValueRange.StartValue\r\n             AND ToCompare.EndValue &lt;= GroupValueRange.EndValue\r\n       THEN 'Engulfed'\r\n       WHEN ToCompare.StartValue &gt;= GroupValueRange.StartValue\r\n            AND ToCompare.StartValue &lt;= GroupValueRange.EndValue\r\n       THEN 'Start Between'\r\n       WHEN ToCompare.EndValue &gt;= GroupValueRange.StartValue\r\n            AND ToCompare.EndValue &lt;= GroupValueRange.EndValue\r\n       THEN 'End Between'\r\n       ELSE 'Something Wrong'\r\n       END AS Issue\r\nFROM Demo.GroupValueRange\r\n        JOIN Demo.GroupValueRange AS ToCompare\r\n            ON ToCompare.GroupValue = GroupValueRange.GroupValue --every row overlaps itself\r\n               AND ToCompare.GroupValueRangeId &lt;&gt; GroupValueRange.GroupValueRangeId\r\n               AND ( ( ToCompare.StartValue &gt;= GroupValueRange.StartValue \/*1*\/\r\n                       AND ToCompare.StartValue &lt;= GroupValueRange.EndValue)\r\n                      OR ( ToCompare.EndValue &gt;= GroupValueRange.StartValue \/*2*\/\r\n                           AND ToCompare.EndValue &lt;= GroupValueRange.EndValue));<\/pre>\n<p>For our data, it returns the following:<\/p>\n<p><code>GroupValueRangeId GroupValue StartValue\u00a0 EndValue \u00a0\u00a0 Conflict<\/code><br \/>\n <code>----------------- ---------- ----------- ----------- --------<\/code><br \/>\n <code>4 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 8 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 10 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 -------&gt; ...<\/code><br \/>\n <code>5 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 9 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 15 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 -------&gt; ...<\/code><br \/>\n <code>5 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 9 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 15 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 -------&gt; ...<\/code><\/p>\n<p><code>GroupValueRangeId GroupValue StartValue\u00a0 EndValue Issue<\/code><br \/>\n <code>----------------- ---------- ----------- -------- ---------------<\/code><br \/>\n <code>5<span style=\"color: #222222;font-family: Consolas\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span> X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  9<span style=\"color: #222222;font-family: Consolas\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span> 15 \u00a0 \u00a0 \u00a0 Start Between<\/code><br \/>\n <code>4<span style=\"color: #222222;font-family: Consolas\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span> X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  8<span style=\"color: #222222;font-family: Consolas\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span> 10 \u00a0 \u00a0 \u00a0 End Between<\/code><br \/>\n <code>6<span style=\"color: #222222;font-family: Consolas\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span> X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  11<span style=\"color: #222222;font-family: Consolas\"> \u00a0 \u00a0 \u00a0 \u00a0\u00a0 <\/span>14 \u00a0 \u00a0 \u00a0 Engulfed<\/code><\/p>\n<p>You can see in the set that when only one condition is met, the rows will show up twice, since we joined the table to itself. When both conditions are met, the row will only show up once, because the start and end values will be between the other copy of the data&#8217;s start and end value, but not vice versa.<\/p>\n<p>Looking at the rows in output, you can see for the first row, that for X,9,15, the 9 is in the range of X,8,10, so it is returned. Naturally, the opposite is true, in that the endpoint of X,8,10 is between the start and end of X,9,15. This query is good enough if your goal is to determine if there is an error in your data. For example if you are trying to write a trigger to prevent overlaps, the query can simply be put in an IF EXISTS (the query) THROW construct. If the query returns data, there is an issue. But if you want a list of all of the rows that have issues, the duplication of data can be confusing. So we can change the query to return one more row by looking for the inverse of the engulfed criteria:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">SELECT GroupValueRange.*,\r\n       CASE WHEN ToCompare.StartValue &gt;= GroupValueRange.StartValue\r\n             AND ToCompare.StartValue &lt;= GroupValueRange.EndValue\r\n             AND ToCompare.EndValue &gt;= GroupValueRange.StartValue\r\n             AND ToCompare.EndValue &lt;= GroupValueRange.EndValue\r\n            THEN 'Engulfs'\r\n            WHEN GroupValueRange.StartValue &gt;= ToCompare.StartValue\r\n             AND GroupValueRange.StartValue &lt;= ToCompare.EndValue\r\n             AND GroupValueRange.EndValue &gt;= ToCompare.StartValue\r\n             AND GroupValueRange.EndValue &lt;= ToCompare.EndValue\r\n            THEN 'Engulfed'\r\n            WHEN ToCompare.StartValue &gt;= GroupValueRange.StartValue\r\n             AND ToCompare.StartValue &lt;= GroupValueRange.EndValue\r\n            THEN 'Start Between'\r\n            WHEN ToCompare.EndValue &gt;= GroupValueRange.StartValue\r\n             AND ToCompare.EndValue &lt;= GroupValueRange.EndValue\r\n            THEN 'End Between'\r\n            ELSE 'Something Wrong'\r\n        END AS Issue\r\nFROM Demo.GroupValueRange\r\n        JOIN Demo.GroupValueRange AS ToCompare\r\n            ON ToCompare.GroupValue = GroupValueRange.GroupValue \r\n               --every row overlaps itself \r\n               AND ToCompare.GroupValueRangeId &lt;&gt; GroupValueRange.GroupValueRangeId\r\n               AND ( ( ToCompare.StartValue &gt;= GroupValueRange.StartValue\r\n                       AND ToCompare.StartValue &lt;= GroupValueRange.EndValue)\/*1*\/\r\n                  OR ( ToCompare.EndValue &gt;= GroupValueRange.StartValue\r\n                      AND ToCompare.EndValue &lt;= GroupValueRange.EndValue)\/*2*\/\r\n                  --add the opposite two conditions, but don't need both independently \r\n                  --this will cause the additional 'Engulfed' row to be returned.\r\n                  OR ( GroupValueRange.StartValue &gt;= ToCompare.StartValue\r\n                       AND GroupValueRange.StartValue &lt;= ToCompare.EndValue \/*1&amp;2*\/\r\n                       AND GroupValueRange.EndValue &gt;= ToCompare.StartValue\r\n                       AND GroupValueRange.EndValue &lt;= ToCompare.EndValue));<code>\r\n<\/code><\/pre>\n<p><code><\/code><span style=\"float: none;background-color: transparent;color: #333333;cursor: text;font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif;font-size: 16px;font-style: normal;font-variant: normal;font-weight: 400;letter-spacing: normal;text-align: left;text-decoration: none;text-indent: 0px\"> Now the output is all of the rows that have an issue with overlapping values, without the data that they are in conflict with.\u00a0<\/span><\/p>\n<p><code>GroupValueRangeId GroupValue StartValue\u00a0 EndValue \u00a0\u00a0 Issue<\/code><br \/>\n <code> ----------------- ---------- ----------- ----------- ---------------<\/code><br \/>\n <code> 4 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 8 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 10 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 Start Between<\/code><br \/>\n <code> 5 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  9 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 15 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 End Between<\/code><br \/>\n <code> 5 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  9 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 15 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Engulfs<\/code><br \/>\n <code> 6 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  11 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0  14 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Engulfed<\/code><\/p>\n<p><span style=\"float: none;background-color: transparent;color: #333333;cursor: text;font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif;font-size: 16px;font-style: normal;font-variant: normal;font-weight: 400;letter-spacing: normal;text-align: left;text-decoration: none;text-indent: 0px\">This sort of output would be useful in cases where you just want to not use any of the bad rows, but do want to use all of the rows that are okay. So you could use this query (probably without the issue column,) in a CTE, and in our query just exclude these rows using a NOT IN expression (if you have a surrogate key, it would likely fit into the output to make that exclusion easier.<\/span><\/p>\n<p>The problem of overlapping rows when you are dealing with inclusive ranges is not a complex one, but it is tricky. I never can quite remember the basic algorithm right off the top of my head, and adding the output as to where the issues is messy.\u00a0 With exclusive ranges, the main difference is that we have to change a few mathematical comparison operators, but it takes a bit of thought. Taking our example table, I will add an\u00a0<span style=\"float: none;background-color: transparent;color: #333333;cursor: text;font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif;font-size: 16px;font-style: normal;font-variant: normal;font-weight: 400;letter-spacing: normal;text-align: left;text-decoration: none;text-indent: 0px\">ExclusiveExclusiveEndValue <\/span>column as a computed column by just adding 1 to the value, since for integers, the next value that is not in range is as simple as that:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">ALTER TABLE Demo.GroupValueRange\r\n  ADD ExclusiveExclusiveEndValue AS (ExclusiveEndValue + 1) PERSISTED;<\/pre>\n<p>Taking a look at the data:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true  \">SELECT GroupValue, StartValue, EndValue, ExclusiveEndValue\r\nFROM Demo.GroupValueRange;<\/pre>\n<p>You can see it is the same data, with just an extra column:<\/p>\n<p><code>GroupValue StartValue\u00a0 EndValue <strong>ExclusiveEndValue<\/strong><\/code><br \/>\n <code>---------- ----------- -------- <strong>---------------------------<\/strong><\/code><br \/>\n <code>X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 2 \u00a0 \u00a0 \u00a0\u00a0 3<\/code><br \/>\n <code>X \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 4 \u00a0 \u00a0 \u00a0\u00a0 5<\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 5 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 6 \u00a0 \u00a0 \u00a0\u00a0 7<\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 8 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10 \u00a0 \u00a0 \u00a0 11<\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 9 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 15 \u00a0 \u00a0 \u00a0 16<\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 11 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 14 \u00a0 \u00a0 \u00a0 15<\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0\u00a0 16 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 16 \u00a0 \u00a0 \u00a0 17<\/code><br \/>\n <code>Y \u00a0 \u00a0 \u00a0 \u00a0\u00a0 9 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 20 \u00a0 \u00a0 \u00a0 21<\/code><br \/>\n <code>Y \u00a0 \u00a0 \u00a0 \u00a0\u00a0 22 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 24 \u00a0 \u00a0 \u00a0 25<\/code><\/p>\n<p>For the StartValue test, we need to do &gt;= for the comparison set&#8217;s StartValue, but &lt; for their EndValue. For the EndValue test (or in the code, ExclusiveEndValue), we do the opposite &gt; the StartValue, and &lt;= the EndValue; since we are comparing in two different directions. So the code changes to:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">SELECT GroupValueRange.*,'-------&gt;' AS 'Conflict',ToCompare.*,\r\n\u00a0 \u00a0 \u00a0\u00a0 CASE WHEN ToCompare.StartValue &gt;= GroupValueRange.StartValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.StartValue &lt; GroupValueRange.ExclusiveEndValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.ExclusiveEndValue &gt; GroupValueRange.StartValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.ExclusiveEndValue &lt;= GroupValueRange.ExclusiveEndValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 THEN 'Engulfed'\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 WHEN ToCompare.StartValue &gt;= GroupValueRange.StartValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.StartValue &lt; GroupValueRange.ExclusiveEndValue\u00a0\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 THEN 'Start Between'\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 WHEN ToCompare.ExclusiveEndValue &gt; GroupValueRange.StartValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.ExclusiveEndValue &lt;= GroupValueRange.ExclusiveEndValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 THEN 'End Between'\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ELSE 'Something Wrong' END AS Issue\r\nFROM Demo.GroupValueRange\r\n\u00a0 \u00a0 \u00a0 \u00a0 JOIN Demo.GroupValueRange AS ToCompare\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 ON ToCompare.GroupValue = GroupValueRange.GroupValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 --every row overlaps itself\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.GroupValueRangeId &lt;&gt; GroupValueRange.GroupValueRangeId \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ((ToCompare.StartValue &gt;= GroupValueRange.StartValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 AND ToCompare.StartValue &lt; GroupValueRange.ExclusiveEndValue ) \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 OR (ToCompare.ExclusiveEndValue &gt; GroupValueRange.StartValue \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 AND ToCompare.ExclusiveEndValue &lt;= \u00a0 \u00a0 \u00a0\u00a0 \r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 GroupValueRange.ExclusiveEndValue ) \/*2*\/\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 );<\/pre>\n<p>Not significant changes at all, but certainly enough to trip you up (I got it wrong several times when I was putting this together, for sure.)\u00a0 Now if I (and hopefully you, the reader) am faced with this kind of issues, I can simply take the query and cut and paste the names of the table and columns from the table and use the code directly.<\/p>\n<p>As an example that is fairly common with data warehousing folks, a Type 2 dimension will have a time range that the data is valid. So I will adapt this range query to the Employee Dimension in the WideWorldImportersDW database, which you can find <a href=\"https:\/\/github.com\/Microsoft\/sql-server-samples\/releases\/tag\/wide-world-importers-v1.0\">here<\/a>. This will be something that, if you have ever worked with Type 2, slowly changing dimensions, you will likely have need to do to make sure your table is correct.<\/p>\n<p>You can see the similarities to the previous examples immediately, only in a table that has the dreaded &#8220;names with spaces&#8221;, so you will see a lot more square brackets than I usually like to include:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">SELECT TOP 5 Employee.[Employee Key], Employee.Employee, \r\n       Employee.[Valid From], Employee.[Valid To] \r\nFROM  [WideWorldImportersDW].[Dimension].[Employee] \r\nORDER BY Employee, [Valid From]<\/pre>\n<p>This returns the following rows, which show that the end points are inclusive. These Valid To and From values are used to determine which row to apply to a fact table at a given time period. (note, decimal points removed for clarity) :<\/p>\n<pre>Employee Key Employee       Valid From           Valid To\r\n------------ -------------- -------------------- -------------------- \r\n17           Alica Fatnowna 2013-01-01 00:00:00. 2013-11-16 08:00:00. \r\n92           Alica Fatnowna 2013-11-16 08:00:00. 2014-05-08 08:00:00. \r\n112          Alica Fatnowna 2014-05-08 08:00:00. 2014-07-19 08:00:00. \r\n121          Alica Fatnowna 2014-07-19 08:00:00. 2014-10-27 08:00:00. \r\n128          Alica Fatnowna 2014-10-27 08:00:00. 2014-11-26 08:00:00.<\/pre>\n<p><code><\/code>Now I just replace the corresponding table and column names, and get the following query:<\/p>\n<pre>SELECT Employee.[Employee Key],\r\n       Employee.Employee,\r\n       Employee.[Valid From],\r\n       Employee.[Valid To],\r\n       '-------&gt;' AS Conflict,\r\n       ToCompare.[Employee Key],\r\n       ToCompare.Employee,\r\n       ToCompare.[Valid From],\r\n       ToCompare.[Valid To],\r\n       CASE WHEN ToCompare.[Valid From] &gt;= Employee.[Valid From]\r\n             AND ToCompare.[Valid From] &lt; Employee.[Valid To]\r\n             AND ToCompare.[Valid To] &gt; Employee.[Valid From]\r\n             AND ToCompare.[Valid To] &lt;= Employee.[Valid To]\r\n            THEN 'Engulfed'\r\n            WHEN ToCompare.[Valid From] &gt;= Employee.[Valid From]\r\n             AND ToCompare.[Valid From] &lt; Employee.[Valid To]\r\n            THEN 'Start Between'\r\n            WHEN ToCompare.[Valid To] &gt; Employee.[Valid From]\r\n                 AND ToCompare.[Valid To] &lt;= Employee.[Valid To]\r\n            THEN 'End Between'\r\n            ELSE 'Something Wrong'\r\n       END AS Issue\r\nFROM WideWorldImportersDW.Dimension.Employee\r\n        JOIN WideWorldImportersDW.Dimension.Employee AS ToCompare\r\n            ON ToCompare.[WWI Employee ID] = Employee.[WWI Employee ID]  \r\n              --every row overlaps itself\r\n              AND ToCompare.[Employee Key] &lt;&gt; Employee.[Employee Key]\r\n              AND ( ( ToCompare.[Valid From] &gt;= Employee.[Valid From]\r\n                      AND ToCompare.[Valid From] &lt; Employee.[Valid To]) \/*1*\/\r\n                 OR ( ToCompare.[Valid To] &gt; Employee.[Valid From]\r\n                      AND ToCompare.[Valid To] &lt;= Employee.[Valid To])\/*2*\/);<\/pre>\n<p>This returns:<\/p>\n<pre>Employee Key Employee \u00a0 \u00a0 \u00a0 \u00a0 Valid From \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Valid To \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0 Conflict \r\n------------ ---------------- -------------------- --------------------------- -------- \r\n95 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Anthony Grosse \u00a0 2013-11-25 08:00:00. 2014-04-14 08:00:00. -------&gt; Cont'd\r\n94 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Anthony Grosse \u00a0 2013-11-25 08:00:00. 2014-04-14 08:00:00. -------&gt;\r\n\r\nEmployee Key Employee \u00a0 \u00a0 \u00a0 \u00a0 Valid From \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Valid To Issue\r\n------------ ---------------- -------------------- -------------------- ---------------\r\n94 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Anthony Grosse \u00a0 2013-11-25 08:00:00. 2014-04-14 08:00:00. Engulfed\r\n95 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 Anthony Grosse \u00a0 2013-11-25 08:00:00. 2014-04-14 08:00:00. Engulfed<\/pre>\n<p>You can check to see if this data is actually conflicting by running the following code:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">SELECT * FROM\u00a0 [WideWorldImportersDW].[Dimension].[Employee]\r\nWHERE Employee = 'Anthony Grosse'\u00a0 \r\n  AND [Valid From] &lt;= '2014-01-01'\u00a0 \r\n  AND [Valid To] &gt; '2014-01-01'<\/pre>\n<p>This returns 2 rows, which should never occur in a dimension, but I am assuming this could be in there for some sort of demo. Either way, it was fortuitous for this example that the version of WideWorldImportersDW I downloaded had this issue.<\/p>\n<p>Lastly, one minor point about performance. The algorithm&#8217;s performance is largely dependent on the number rows per grouping value. Take our original set of data:<\/p>\n<p><span style=\"color: #222222;font-family: Consolas\">GroupValue \u00a0 Start\u00a0 End<br \/>\n <\/span> <code>------------ ------ ------<\/code><br \/>\n <code>X \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0  1 \u00a0 \u00a0\u00a0 2<\/code><br \/>\n <code>X  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3 \u00a0\u00a0\u00a0\u00a0 4<\/code><br \/>\n <code>X  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5 \u00a0\u00a0\u00a0\u00a0 6<\/code><br \/>\n <code>X  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 8 \u00a0\u00a0\u00a0\u00a0 10<\/code><br \/>\n <code>X  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 9 \u00a0\u00a0\u00a0\u00a0 15<\/code><br \/>\n <code>X  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 11 \u00a0 \u00a0 14<\/code><br \/>\n <code>X  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 16 \u00a0 \u00a0 16<\/code><br \/>\n <code>Y  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 9 \u00a0\u00a0\u00a0\u00a0 20<\/code><br \/>\n <code>Y  \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 22 \u00a0 \u00a0 24<\/code><\/p>\n<p>When you do the query, you are joining all of the rows on the group value, which effectively leaves the cross product of the rows to be processed in a non-equi-join criteria. If you have a tremendous number of values per group, the process may be very slow. For example, I tried to adapt the query to the Warehouse.ColdRoomTemperatures_Archive table in the WideWorldImporters database (which you can get <a href=\"https:\/\/github.com\/Microsoft\/sql-server-samples\/releases\/tag\/wide-world-importers-v1.0\">here<\/a>). This table is the temporal archive table for the Warehouse.ColdRoomTemperatures table, which has 4 rows in it. The archive as over 3 million rows (which I didn&#8217;t notice when I started the query on Sunday night on my Surface, and found out when I stopped the query on Monday morning. I also didn&#8217;t check for indexes, which it also didn&#8217;t have.)\u00a0<\/p>\n<p>It will behoove you to have\u00a0 indexes on the GroupValue, the primary key (usually in the clustered index), and include the start and end times, particularly if this is a wide table.<\/p>\n<p>Summary<\/p>\n<p>Overlapping ranges are not something you need to deal with very often, but when you do, the way they are validated is something I frequently don&#8217;t remember. Just remember that you are looking for the start or end of one row in a group to be inside the start and end of another row, and the process is relatively easy. \u00a0<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This week, I had a problem where I needed to find and eliminate from the results of my query, data with overlapping ranges. I have written about this topic before, in my database design book book, in regards to building a trigger to avoid overlapping ranges. But even though I have written on the topic&#8230;&hellip;<\/p>\n","protected":false},"author":56085,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[5134,4183],"coauthors":[19684],"class_list":["post-78223","post","type-post","status-publish","format-standard","hentry","category-other","tag-sql-prompt","tag-t-sql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/78223","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/56085"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=78223"}],"version-history":[{"count":35,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/78223\/revisions"}],"predecessor-version":[{"id":78819,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/78223\/revisions\/78819"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=78223"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=78223"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=78223"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=78223"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}