{"id":92816,"date":"2021-11-15T20:47:40","date_gmt":"2021-11-15T20:47:40","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=92816"},"modified":"2021-11-15T20:47:40","modified_gmt":"2021-11-15T20:47:40","slug":"between-the-two-of-us","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/between-the-two-of-us\/","title":{"rendered":"BETWEEN the two of us"},"content":{"rendered":"<p>This past August, I was looking at an online SQL tutorial. One of the sessions in it featured the<\/p>\n<p><code>BETWEEN<\/code> predicate, which brought back some memories. In the early days of SQL on the ANSI X3H2 Database Standards Committee, we created the <code>BETWEEN<\/code> predicate. We wanted the SQL language to sound a bit like English and have shorthand terms for common coding situations. Many of our keywords were stolen from other programming languages or deliberately chosen so that they were not likely to be confused with data elements. The original syntax for the between predicate looks like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">BETWEEN::=\r\n&lt;test_expression&gt; [NOT] BETWEEN\r\n &lt;begin_expression&gt; AND &lt;end_expression&gt;<\/pre>\n<p>The <code>BETWEEN<\/code> predicate specifies the inclusive range to test the expression values. The range is defined by boundary expressions with the <code>AND<\/code> keyword <code>BETWEEN<\/code> them. Naturally, all the expressions in <code>BETWEEN<\/code> predicate must be the same data type or cast to it, as in the case of any comparison predicate. This predicate Is defined as a shorthand for:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">(&lt;test_expression&gt; &gt;= &lt;begin_expression&gt; \r\nAND &lt;test_expression&gt; &lt;= &lt;end_expression&gt;)<\/pre>\n<p>And the negated version of the predicate<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">&lt;test_expression&gt; NOT BETWEEN &lt;begin_expression&gt; AND &lt;end_expression&gt;<\/pre>\n<p>is equal to:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">NOT (&lt;test_expression&gt; BETWEEN &lt;begin_expression&gt; AND &lt;end_expression&gt;)<\/pre>\n<p>The grammar for SQL is deliberately picked to be <code>LALR(1)<\/code>. If you don\u2019t remember that from your compiler writing classes, don\u2019t feel bad. It means that SQL can have a little more complicated grammar than many programming languages to sprinkle keywords In places a little more like natural English. It\u2019s important to notice that the <code>BETWEEN<\/code> predicate uses a closed interval, which includes the endpoints of the range. However, the <code>NOT<\/code> <code>BETWEEN<\/code> <em>excludes<\/em> them.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">&lt;test_expression&gt; NOT BETWEEN &lt;begin_expression&gt; AND &lt;end_expression&gt;<\/pre>\n<p>is equivalent to<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">(&lt;test_expression&gt; &lt; &lt;begin_expression&gt; \r\nOR &lt;test_expression&gt; &gt; &lt;end_expression&gt;)<\/pre>\n<p>At some point in these early days, one of the committee members proposed changing the syntax and creating what would now be called a \u201csymmetric <code>BETWEEN<\/code>\u201d as the default definition. This proposal passed because committees love proposals. Only Microsoft implemented this feature in their Access tabletop database. All the other vendors ignored it, and the proposal was rescinded at the next committee meeting.<\/p>\n<p>But proposals with extended features seem to keep coming back to life. The current ANSI\/ISO standard syntax is:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">&lt;test_expression&gt; [NOT] BETWEEN [SYMMETRIC | ASYMMETRIC] \r\n&lt;begin_expression&gt; AND &lt;end_expression&gt;<\/pre>\n<p>The keyword <code>ASYMMETRIC<\/code> has the original functionality, and it is optional. The <code>BETWEEN<\/code> <code>SYMMETRIC<\/code> syntax is like <code>BETWEEN<\/code> except that there is no requirement that the argument to the left of the <code>AND<\/code> be less than or equal to the argument on the right. Well, not entirely: officially, the <code>&lt;begin_expression&gt;<\/code> is the minimum, and the <code>&lt;end_expression&gt;<\/code> is the maximum. This transformation converts a <code>SYMMETRIC<\/code> <code>BETWEEN<\/code> into a regular old vanilla <code>BETWEEN<\/code>.<\/p>\n<h2>Intervals in the ISO data model<\/h2>\n<p>Several ISO standards deal with the concepts of intervals. From a mathematical viewpoint, the kinds of intervals you can have are (1) Closed, (2) Opened, (3) Half open high, and (4) Half open low. A closed interval includes both the endpoints, like the range in the <code>BETWEEN<\/code> predicate. An open interval excludes both the endpoints of the range, like the <code>NOT<\/code> <code>BETWEEN<\/code> predicate. The half open intervals are open on either the high-end or the low end of the interval range.<\/p>\n<p>A half open interval on the high-end is how ISO models time. We talk about \u201c24-hour time\u201d Or \u201cmilitary time,\u201d but the truth is a day is defined as an interval from 00:00:00 up to 23:59:59.999.. at whatever precision can be measured. If you try and put in \u201c24:00:00 Hrs\u201d, DB2 and other databases will automatically convert it to 00:00:00 Hrs of the next day. Think of it as being like converting a person\u2019s height from 5\u201918\u201d to 6\u20196\u201d instead. These conventions get even stranger when you look at how different countries and cultures handle times greater than one day. If an event in Japan runs past midnight, they simply add more hours to the event. For example, an event that ran past midnight might be shown as \u201c25:15:00 Hrs.\u201d<\/p>\n<p>The advantage of the half open interval is easy to see in the ISO 8601 standards, which define how temporal data is represented. You are always sure when an event starts, even if you\u2019re not sure when it will end, so you can use a <code>NULL<\/code> to mark the end of an event that is in process. This <code>NULL<\/code> can be coalesced to a meaningful value. For example, sometimes it might make sense to use <code>COALESCE(interval_final_timestamp, CURRENT_TIMESTAMP)<\/code> To figure out the duration of the interval at exactly the moment the query is invoked. Other times, you might want to use <code>COALESCE(interval_final_timestamp, legally_defined_stop_timestamp)<\/code>.<\/p>\n<p>The <code>BETWEEN<\/code> predicate is not just used for timestamps. It works perfectly well for numeric ranges and text, too. Numeric ranges can be used to throw things into buckets, which looks reasonably obvious until the three parameter values are of different numeric types. Now you have to consider rounding and casting errors. Even worse, if the parameters are character data with different correlations. As a generalization, you really need to make sure that all three parameters are of the same type. In fact, ranges and text data can get so complicated, I\u2019m just going to ignore them. Let\u2019s just look at numeric ranges.<\/p>\n<h2>Report cards<\/h2>\n<p>A classic example of reducing values into ranges is converting grades from numeric totals or percentages to a letter grade. The usual convention is that a score in the 90s is an \u201cA\u201d, a score in the 80s is a \u201cB\u201d, a score in the 70s is a \u201cC\u201d, a score in the 60s is a \u201cD\u201d and anything below that is an \u201cF\u201d. I\u2019m choosing to ignore plus or minus options on the letters.<\/p>\n<p>The <code>CASE<\/code> expression in SQL is executed from left to right, and the first <code>WHEN<\/code> clause that tests <code>TRUE<\/code> returns the value in its <code>THEN<\/code> clause. This means that the order in which you write your tests will control how it executes; not all programming languages work this way. In effect, we have hard-coded half open intervals.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CASE\r\nWHEN score &gt;= 90.000 THEN \u2018A\u2019\r\nWHEN score &gt;= 80.000 THEN \u2018B\u2019\r\nWHEN score &gt;= 70.000 THEN \u2018C\u2019\r\nWHEN score &gt;= 60.000 THEN \u2018D\u2019\r\nELSE \u2018F\u2019 END<\/pre>\n<p>It is important to notice that this expression will handle somebody who has more than 100 points to qualify as an \u201cA\u201d student. In this example, that\u2019s probably what was intended for extra credit, but this might indicate an error in the data in other schemes. Likewise, a score of zero might be a data error. Then, of course, because this is SQL, what would a <code>NULL<\/code> mean? Perhaps it indicates an incomplete? A general rule of thumb is to design for the extreme cases but tuning for the most expected cases.<\/p>\n<h2>The OVERLAPS() predicate<\/h2>\n<p>The <code>OVERLAPS<\/code> predicate is part of the SQL Standards but not part of SQL Server. This predicate is defined only for temporal data and is based on temporal intervals. Yes, there is a temporal interval type in Standard SQL. Before getting into it, we need to back up and discuss something known as Allen\u2019s operators. They are named after J. F. Allen, who defined them in a 1983 research paper on temporal intervals. The basic model has two temporal intervals, expressed as ordered pairs of start and termination timestamps (S1, T1) and (S2, T2).<\/p>\n<p>Here are the base relations between two intervals, as timelines.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-92817\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2021\/11\/an-occlusion-calculus-based-on-an-interval-algebra.png\" alt=\"An images showing base relations between two intervals, as timelines.\" width=\"392\" height=\"312\" \/><\/p>\n<p>SQL did not add all 13 relationships, but we decided that an overlaps predicate would be the most useful.<\/p>\n<p>The result of the <code>&lt;OVERLAPS predicate&gt;<\/code> is formally defined as the result of the following expression:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> (S1 &gt; S2 AND NOT (S1 &gt;= T2 AND T1 &gt;= T2))\r\n OR (S2 &gt; S1 AND NOT (S2 &gt;= T1 AND T2 &gt;= T1))\r\n OR (S1 = S2 AND (T1 &lt;&gt; T2 OR T1 = T2))<\/pre>\n<p>where S1 and S2 are the starting times of the two time periods and T1 and T2 are their termination times. The rules for the <code>OVERLAPS()<\/code> predicate sound like they should be intuitive, but they are not. The principles that we wanted in the Standard were:<\/p>\n<p>1. A time period includes its starting point but does not include its end point. We have already discussed this model and its closure properties.<\/p>\n<p>2. If the time periods are not &#8220;instantaneous,&#8221; they overlap when they share a common time period.<\/p>\n<p>3. If the first term of the predicate is an <code>INTERVAL<\/code>, and the second term is an instantaneous event (a &lt;datetime&gt; data type), they overlap when the second term is in the time period (but is not the end point of the time period). That follows the half-open model.<\/p>\n<p>4. If the first and second terms are instantaneous events, they overlap only when they are equal.<\/p>\n<p>5. If the starting time is <code>NULL<\/code> and the finishing time is a <code>&lt;datetime&gt;<\/code> value, the finishing time becomes the starting time, and we have an event. If the starting time is <code>NULL<\/code> and the finishing time is an <code>INTERVAL<\/code> value, then both the finishing and starting times are <code>NULL<\/code>.<\/p>\n<p>Please consider how your intuition reacts to these results when the granularity is at the YEAR-MONTH-DAY level. Remember that the day begins at 00:00:00 Hrs.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> (today, today) OVERLAPS (today, today) = TRUE\r\n (today, tomorrow) OVERLAPS (today, today) = TRUE\r\n (today, tomorrow) OVERLAPS (tomorrow, tomorrow) = FALSE\r\n (yesterday, today) OVERLAPS (today, tomorrow) = FALSE<\/pre>\n<h2>Contiguous temporal intervals with DDL<\/h2>\n<p>Alexander Kuznetsov wrote this idiom for History Tables in T-SQL, but it generalizes to any SQL. It builds a temporal chain from the current row to the previous row with a self-reference. This is easier to show with code:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Tasks\r\n(task_id INTEGER NOT NULL,  --  makes sense in your data\r\n task_score CHAR(1) NOT NULL,  -- whatever makes sense in your data\r\n previous_end_date DATE, -- null means first task\r\n current_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,\r\n CONSTRAINT previous_end_date_and_current_start_in_sequence\r\n   CHECK (prev_end_date &lt;= current_start_date)\r\n DEFERRABLE INITIALLY IMMEDIATE,\r\n current_end_date DATE, -- null means unfinished current task\r\n CONSTRAINT current_start_and_end_dates_in_sequence\r\n   CHECK (current_start_date &lt;= current_end_date),\r\n CONSTRAINT end_dates_in_sequence\r\n   CHECK (previous_end_date &lt;&gt; current_end_date),\r\n PRIMARY KEY (task_id, current_start_date),\r\n UNIQUE (task_id, previous_end_date), -- null first task\r\n UNIQUE (task_id, current_end_date), -- one null current task\r\n FOREIGN KEY (task_id, previous_end_date)  -- self-reference\r\n   REFERENCES Tasks (task_id, current_end_date));<\/pre>\n<p>Well, that looks complicated! Let&#8217;s look at it column by column. Task_id explains itself. The previous_end_date will not have a value for the first task in the chain, so it is NULL-able. The current_start_date and current_end_date are the same data elements, temporal sequence, and PRIMARY KEY constraints we had in the simple history table schema.<\/p>\n<p>The two <code>UNIQUE<\/code> constraints will allow one <code>NULL<\/code> in their pairs of columns and prevent duplicates. Remember that <code>UNIQUE<\/code> is <code>NULL<\/code>-able, not like <code>PRIMARY<\/code> <code>KEY<\/code>, which implies <code>UNIQUE<\/code> <code>NOT<\/code> <code>NULL<\/code>.<\/p>\n<p>Finally, the <code>FOREIGN<\/code> <code>KEY<\/code> is the real trick. Obviously, the previous task has to end when the current task started for them to abut, so there is another constraint. This constraint is a self-reference that makes sure this is true. Modifying data in this type of table is easy but requires some thought.<\/p>\n<p>There is just one little problem with that <code>FOREIGN<\/code> KEY constraint. It will not let you put the first task into the table. There is nothing for the constraint to reference. In Standard SQL, we can declare constraints to be <code>DEFERABLE<\/code> with some other options. The idea is that you can turn a constraint <code>ON<\/code> or <code>OFF<\/code> during a session so the database can be in a state that would otherwise be illegal. But at the end of the session, all constraints have to be <code>TRUE<\/code> or <code>UNKNOWN<\/code>.<\/p>\n<p>When a disabled constraint is re-enabled, the database does not check to ensure any existing data meets the constraints. You will want to hide this in a procedure body to get things started.<\/p>\n<p><strong>BETWEEN<\/strong><\/p>\n<p>Please notice that the <code>OVERLAPS<\/code> and <code>BETWEEN<\/code> predicates work with static intervals, but there are also dynamic predicates for data. The <code>LEAD<\/code> and <code>LAG<\/code> operators view the rows as representing points in time or in a sequence, but that is a topic for another article.<\/p>\n<p><em>If you liked this article, you might also like\u00a0<a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/learn\/a-unique-experience\/\">A UNIQUE experience<\/a><\/em><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>BETWEEN can be used in a SQL WHERE clause to filter on a range. Joe Celko explains the history of BETWEEN and also could be implemented.&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143531],"tags":[5134],"coauthors":[6781],"class_list":["post-92816","post","type-post","status-publish","format-standard","hentry","category-featured","category-t-sql-programming-sql-server","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92816","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=92816"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92816\/revisions"}],"predecessor-version":[{"id":92820,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92816\/revisions\/92820"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=92816"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=92816"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=92816"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=92816"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}