{"id":101530,"date":"2024-03-29T12:09:18","date_gmt":"2024-03-29T12:09:18","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=101530"},"modified":"2024-03-29T17:29:43","modified_gmt":"2024-03-29T17:29:43","slug":"sql-logic","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/theory-and-design\/sql-logic\/","title":{"rendered":"SQL Logic"},"content":{"rendered":"<p>In this article, I will discuss the history and thinking behind several types of logic that are typically associated with writing relational database code. Because of how relational databases implement NULLs, it definitely can make some processes more complex than expected by may programmers (inexperienced and experienced alike!)<\/p>\n<h2>Three Valued Logic<\/h2>\n<p>Missing or <code>UNKNOWN<\/code> values have been a problem in data modeling and statistics long before there was SQL. One of the best solutions has been to create a category for \u201cmiscellaneous\u201d, \u201cUNKNOWN\u201d, \u201cN\/A (Not applicable), \u201cother\u201d, \u201cTBD (to be determined)\u201d and so forth. When Dr. Codd was designing the relational model, he decided on using the marker <code>NULL<\/code> for all the possible ways that something could be <code>UNKNOWN<\/code> or missing. Later, however, he decided to add markers for \u201cvalues that are missing because the attribute is missing.\u201d and \u201cvalues that are missing because of our ignorance of an existing attribute\u201d, a distinction that was not implemented in ANSI\/ISO Standard SQL.<\/p>\n<p>Other programming languages are usually based on a simple Boolean algebra. Logical values are <code>TRUE<\/code> and <code>FALSE<\/code>, and the three operators are <code>AND<\/code>, <code>OR<\/code> and <code>NOT<\/code>. Unfortunately, SQL is not quite that simple because of the intricacies of representing the data that user needs to store. We had a three valued logic (3VL) that adds the logical value <code>UNKNOWN<\/code> and slightly different rules.<\/p>\n<p>As <a href=\"https:\/\/en.wikipedia.org\/wiki\/David_McGoveran\">Dave McGoveran<\/a> (one of the pioneers of relational databases) pointed out, technically this is not a logic system. To be a logic system, we would need inference rules. This is perhaps something only a math major really cares about, but we have what are called a <code>&lt;search condition&gt;<\/code> in the BNF that defines Standard SQL. While mathematicians may worry about this sort of thing, programmers don\u2019t have to. SQL is going to give you enough problems. The purpose of logical operators is to test the logical value of a condition and return a logical value that can be <code>TRUE<\/code>, <code>FALSE<\/code>, or <code>UNKNOWN<\/code>.<\/p>\n<p>An inference rule in two valued logic uses the symbol \u21d2, which is usually read as \u201c&lt;left expression&gt; Implies &lt;right expression&gt;\u201d or as \u201cif A is TRUE, then B is TRUE.\u201d In English, or perhaps I should say, in more traditional English, \u201ca <code>TRUE<\/code> premise cannot imply a <code>FALSE<\/code> conclusion\u201d when discussing logic. The transformation in two valued, Boolean logic called the Smisteru Rule says that (A \u21d2 B) Is equivalent to (\u00acA \u2227 B). Unfortunately, a direct translation from two valued logic to SQL three valued logic does not work. Remember that in the DDL treats <code>UNKNOWN<\/code> and <code>FALSE<\/code> similar, while the DML treats <code>UNKNOWN<\/code> and <code>TRUE<\/code> as being similar.<\/p>\n<p>The <code>UNKNOWN<\/code> value is the result of using <code>NULLs<\/code> in expressions and in some relational operations, such as <code>OUTER JOIN<\/code>. It can be important to remember that <code>UNKNOWN<\/code> (logical value) and <code>NULL<\/code> (property of an attribute) are not the same thing, but they look a lot alike when you are coding.<\/p>\n<p>Another quote from Dave McGoveran is that a computer language committee never met a feature that it didn\u2019t like. This is also all too <code>TRUE<\/code>. The number of features in a language tend to increase, not shrink.<\/p>\n<p>Here is a quick table of SQL s basic three valued logic operators.<\/p>\n<table>\n<thead>\n<tr>\n<th>\n<p>Op1<\/p>\n<\/th>\n<th>\n<p>Op2<\/p>\n<\/th>\n<th>\n<p>Op1 AND Op2<\/p>\n<\/th>\n<th>\n<p>Op1 OR Op2<\/p>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><code>AND<\/code> and <code>OR<\/code> operators are symmetric: you can swap <code>Op1<\/code> and <code>Op2<\/code> without changing the operator outcome.<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p>Op1<\/p>\n<\/td>\n<td>\n<p>NOT Op1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>FALSE<\/p>\n<\/td>\n<td>\n<p>TRUE<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<td>\n<p>UNKNOWN<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The final classic 2VL operator is negation or <code>NOT<\/code>&lt;expression&gt;. In 2VL, it switches <code>TRUE<\/code> to <code>FALSE<\/code> and <code>FALSE<\/code> to <code>TRUE<\/code>; no surprises there. However, it doesn\u2019t change <code>UNKNOWN<\/code> in 3VL. We didn\u2019t find out if the expression was <code>TRUE<\/code> or <code>FALSE<\/code>, so it stays UNKNOWN.<\/p>\n<p>There are some other urinary operators and ANSI\/ISO Standard SQL might not want to use them often, but it\u2019s good to know they exist:<\/p>\n<p><code>&lt;search condition&gt; IS [NOT] {TRUE | FALSE | UNKNOWN}<\/code><\/p>\n<p>This is a test for logical value. It is not the same as the more familiar <code>&lt;search condition&gt; IS [NOT] NULL<\/code> that we\u2019ve had in SQL since the beginning. Starting to appreciate how messy 3VL can be?<\/p>\n<h2>Generalized Conjunction and Disjunction<\/h2>\n<p>Conjunction and Disjunction are mathematician\u2019s words for what a programmer would call ANDs and Ors. Most programming languages define them as a binary operation, but we can generalize them to a more set oriented parameter in SQL.<\/p>\n<p>Before modern logic and the use of truth values, classic Aristotelian syllogisms used the forms \u201cAll X are Y\u201d, \u201cSome X are Y\u201d, and \u201cNo X are Y\u201d; find a good write up this in any non-mathematical philosophy book. We have a version of them in SQL using what we call quantifiers. The syntax for the existential quantifier is pretty straightforward;<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">&lt;value expression&gt; [SOME | ANY] &lt;comparison operator&gt; &lt;table expression&gt;<\/pre>\n<p>The keywords <code>SOME<\/code> and <code>ANY<\/code> are synonyms, but you might want to pick the one that makes the most sense is an English sentence you write your code. This is essentially a shorthand for a list of OR-ed predicates built from the expansion of of code<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">(&lt;value expression&gt; &lt;comparison operator&gt; &lt;row-1&gt;\r\nOR &lt;value expression&gt; &lt;comparison operator&gt; &lt;row-2&gt;\r\n...\r\nOR &lt;value expression&gt; &lt;comparison operator&gt; &lt;row-n&gt;)<\/pre>\n<p>You don\u2019t think about it, but if you started with Pascal as your first programming language, you might remember the <code>&lt;exp&gt; IN (expression list&gt;)<\/code> predicate. Yes, this is where SQL stole it from, but we define it as <code>&lt;value expression&gt; = ANY VALUES (exp-1,exp-2,..exp-n&gt;)<\/code>. This is the general disjunction.<\/p>\n<p>Since this is really shorthand for <code>OR<\/code>-ed &lt;search condition&gt;s, a single <code>TRUE<\/code> result will make the whole predicate <code>TRUE<\/code>. But more than that, if the expression list is all <code>NULLs<\/code> or empty, then you can\u2019t get a <code>TRUE<\/code> or <code>FALSE<\/code> result?<\/p>\n<p>Likewise, we can build a generalized conjunction by just changing the <code>OR<\/code> to <code>AND<\/code> in the expansion of the code. This is the SQL version of the universal quantifier.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">(&lt;value expression&gt; &lt;comparison operator&gt; &lt;row-1&gt;\r\nAND  &lt;value expression&gt; &lt;comparison operator&gt; &lt;row-2&gt;\r\n...\r\nAND &lt;value expression&gt; &lt;comparison operator&gt; &lt;row-n&gt;)<\/pre>\n<p>Everyone uses the <code>IN()<\/code> predicate, but the <code>ANY<\/code> and <code>ALL<\/code> are very handy as well. They can make your SQL more English like an easier to read. Since they are a single construct, your SQL product may have optimizations for it.<\/p>\n<h2>BETWEEN<\/h2>\n<p>The original version of the <code>BETWEEN<\/code> search condition was a very simple shorthand; <code>&lt;value-1&gt; BETWEEN &lt;value-2&gt; AND &lt;value-3&gt;<\/code><\/p>\n<p>This expression is equivalent to the search condition:<\/p>\n<p><code>&lt;value1 &gt;= &lt;value-2&gt; AND &lt;value-1&gt; &lt;= &lt;value-3&gt;<\/code><\/p>\n<p>Like most SQL operators, you also had a negation of the <code>BETWEEN<\/code> predicate:<\/p>\n<p><code> &lt;value-1&gt; [NOT] BETWEEN &lt;value-2&gt; AND &lt;value-3&gt;<\/code><\/p>\n<p>That is equivalent to the search condition:<\/p>\n<p><code>NOT(&lt;value-&gt;1 BETWEEN &lt;value-2&gt; AND &lt;value-3&gt;)<\/code><\/p>\n<p>This concept has some interesting history. Between-ness was left out of Euclidean geometry. It later shows up we get to the calculus and must define convergence series and limits. It\u2019s also one of the few comparison operators that has three parameters.<\/p>\n<p>This operator has an ordering from left to right and we know <code>(&lt;value-1&gt; &lt;= &lt;value-3&gt;<\/code>). The ISO model temporal value is based on half open intervals. That means we know where the interval starts, but the end point is not included in the end range. This is just like how one define a day. It starts at midnight of one day (0:00) and ends on midnight of the next. The starting time of 0:00 is included in the interval, but the ending point time is defined as 0:00, but it actually is 23:59 59:59. The end time is not in the interval to make it easier to specify in a manner that is round off proof. (for a more in depth discussion, see the \u201cISO Half Open Interval Model section in my <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/modeling-time\/\">Modeling Time<\/a> article.)<\/p>\n<p>The advantage of half-open intervals is that they can be concatenated, subsseted and defined in calculus as limits.<\/p>\n<p>Decades ago, when the SQL standards were being worked on in ANSI X3H2, we defined what is now the <code>SYMMETRIC BETWEEN<\/code> predicate one meeting. Microsoft even put it in their Access database product. At the next meeting, the committee backed out this change. Currently, however, we reverse ourselves again and use a new optional keyword, <code>SYMMETRIC<\/code>. This is implemented in PostgreSQL, but not currently in SQL Server.<\/p>\n<p><code>&lt;value-1&gt;[ BETWEEN SYMMETRIC&lt;value-2&gt; AND &lt;value-3&gt;<\/code><\/p>\n<p>as a shorthand for:<\/p>\n<p><code>((&lt;value-1&gt;BETWEEN &lt;value-2&gt; AND &lt;value-3&gt;)<\/code><br \/>\n\u00a0 \u00a0<code>OR &lt;value-1&gt; BETWEEN &lt;value-3&gt; AND &lt;value-2&gt;))<\/code><\/p>\n<p>Notice that we still don\u2019t have the support for open intervals with this operator.<\/p>\n<h2>[NOT] EXIST &lt;subquery&gt;<\/h2>\n<p>The simple <code>EXISTS<\/code>() also, as an interesting history. Technically, it takes a subquery. This means were supposed to have a <code>SELECT FROM<\/code> statement parameter. But the <code>SELECT<\/code> clause is redundant.<\/p>\n<p>The reason that it\u2019s in syntax is that originally the first attempts at SQL would expand the subquery out completely to test for the existence qualified data. But if you think about it, the minute you get just one row back, even if it includes <code>NULLs<\/code> (yes, <code>NULLs<\/code> exists, even if they don\u2019t have any value), this operator will return <code>TRUE<\/code> in fact, it\u2019s one of the few logical operators in SQL that cannot return an <code>UNKNOWN<\/code>.<\/p>\n<h2>Pattern Matching<\/h2>\n<p>The syntax for the original pattern matching predicate is <code>&lt; match value&gt; [NOT] LIKE &lt;pattern expression&gt; [ESCAPE &lt;escape character&gt;]<\/code>. SQL got this version of a general regular expression parser from IBM. Since we never intended the language to be a text handling tool, we picked a simple basic version of grep (general regular expression parser) from UNIX. Later, SQL added \u00a0the <code>SIMILAR TO<\/code> regular expression tool.<\/p>\n<p>There are several wildcards that can be a part of the pattern expression:<\/p>\n<ul>\n<li>The percent sign % represents zero, one, or multiple pattern characters.<\/li>\n<li>The underscore sign _ represents one, single character. This was really a bad choice are part because a string of underscores is hard to count. Modern word processors and printers tend to make them all into one single line with the resolution that we have had since we stopped using old unit record line printer equipment.<\/li>\n<li>Square brackets <code>[<\/code> and <code>] <\/code>allow you to specify a range of characters like \u2018[0-9]\u2019. So, any single character that appear in the character set you are using that are between 0 and 9 would match. In this case, any of 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.<\/li>\n<li>The <code>ESCAPE<\/code> clause declares a pattern character with the property that what follows it is to be treated as a literal, not as a pattern. This lets you put underscores and percent signs in strings. This feature does not get used a lot. <br \/>\nAs you would expect, a <code>NULL<\/code> in the predicate always returns an <code>UNKNOWN<\/code> result. However, if both pattern and match values are strings of length zero, then the result is <code>TRUE<\/code>.<\/li>\n<\/ul>\n<p>The <code>LIKE<\/code> operator is not used as much as it should be. You can put it in a <code>CHECK()<\/code> constraint in the DDL and save yourself the trouble of writing code to do a quick validation every time you touch a particular column. My favorite is simply having <code>CHECK (foobar_zipcode LIKE \u2018[0-9][0-9][0-9][0-9][0-9]\u2019)<\/code> on US mailing addresses because it\u2019s cheap and easy.<\/p>\n<p>The SQL\u201399 standard added the <code>SIMILAR TO<\/code> operator, this is a form of <code>grep()<\/code> based on POSIX, as found in ISO\/IEC 9945 standards. The US government was interested in these standards at the time, so we added it to SQL for portability.<\/p>\n<p>Pattern symbols are little more extensive in this regular expression tool. The following are some additional patterns that are part of the standards but not necessarily completely implemented in the RDBMS you are using.<\/p>\n<ul>\n<li>| Means alternation (batch to either of the two alternatives separated by the bar)<\/li>\n<li>*means repetition of the previous character zero or more times<\/li>\n<li>+ means repetition of the previous character one or more times.<\/li>\n<li>Parentheses can be used to group substrings into a single unit.<\/li>\n<\/ul>\n<p>There\u2019s also a set of abbreviations that look like this.<\/p>\n<ul>\n<li>[: ALPHA:] match any alphabetic character.<\/li>\n<li>[: UPPER] managing the uppercase alphabetic character.<\/li>\n<li>[: LOWER:] match any lowercase alphabetic character.<\/li>\n<li>[: DIGIT:] match any digit.<\/li>\n<li>[:ALNUM:] match any alpha or numeric character.<\/li>\n<\/ul>\n<p>Vendors will have even more extensions in their products, so you can really write pretty elaborate pattern matches, which frankly is usually a bad idea. SQL was never meant to be a string handler. It\u2019s probably better if you do this in the input tier of your system, using the tool designed for this. It\u2019s also better if you keep the patterns as simple as possible. Think about the five digits of the US ZIP Code system and then compare it to the elaborate pattern recognition that the British Postal Code uses. The British system was based on abbreviations for post offices that no longer exist, so it\u2019s a rather cryptic long string can be expressed as a monstrous, long regular expression. The Royal Mail is gradually abandoning the old system in favor of an all-numeric, fixed length system like ZIP Codes for bulk mail users.<\/p>\n<h2>Conclusion<\/h2>\n<p>Originally, SQL evolved from an IBM product called Sequel. The name stood for \u201cStructured English\u2013like Query Language\u201d which defines the goal the language. Procedural languages tend to have a relatively small set primitive operators. When you want to do something, you code a procedure from them.<\/p>\n<p>However, there is a tendency in declarative languages to create another language feature and hide procedural part inside them. The idea is to abstract operations to a higher level and let the machine do the heavy lifting. For example, if I need to sort a file in a procedural language, I must pick my algorithm. In a higher-level declarative language, I simply tell the language to <code>SORT (&lt; file name&gt;)  <\/code>and the software picks the algorithm, based on an analysis of the file.<\/p>\n<p>In SQL, we want to use the logic operations to define <em>what we want<\/em> and not be concerned with the <em>how we get it. We have a pretty good toolbox for doing this too!<\/em><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, I will discuss the history and thinking behind several types of logic that are typically associated with writing relational database code. Because of how relational databases implement NULLs, it definitely can make some processes more complex than expected by may programmers (inexperienced and experienced alike!) Three Valued Logic Missing or UNKNOWN values&#8230;&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143539],"tags":[159072],"coauthors":[6781],"class_list":["post-101530","post","type-post","status-publish","format-standard","hentry","category-featured","category-theory-and-design","tag-database-theory"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101530","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=101530"}],"version-history":[{"count":3,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101530\/revisions"}],"predecessor-version":[{"id":101533,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101530\/revisions\/101533"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=101530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=101530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=101530"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=101530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}