{"id":102030,"date":"2024-05-20T18:18:04","date_gmt":"2024-05-20T18:18:04","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=102030"},"modified":"2024-05-23T02:55:33","modified_gmt":"2024-05-23T02:55:33","slug":"the-check-constraint","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/theory-and-design\/the-check-constraint\/","title":{"rendered":"The CHECK() Constraint"},"content":{"rendered":"<p>One of the many ways a relational table differs from the file structures used by pre-relational storage systems is that the tables, rows and columns can have constraints on them. This allows you to reduce the types of bad data that can be loaded into your tables. This lets the database do a lot of the work that you would\u2019ve had to do an application code in the old days (and current times when using file-based storage).<\/p>\n<p>This is an incredible advantage. With COBOL or FORTRAN programs, the data storage platform did nothing to take care of itself. Everything was in the application code. If you talk to an old timer, will find stories about mounting the wrong tape on a tape drive, and then running a program on it that made no sense. This occurred because a lot of times, files were fixed width 80 column card images, so the application program would simply take those images and partition them into fields. Please remember that in SQL, each column has a known data type in and of itself; this datatype is not dependent on the query using that table.<\/p>\n<p>One of my favorite horror stories involves programmers mounting a tape from a medical machine and running the data through the hospital payroll program. The program got remarkably far before failing. Of course, by that time, the printer had destroyed a lot of blank checks.<\/p>\n<h2>The most basic constraint: PRIMARY KEYs<\/h2>\n<p>The problem is that newer SQL programmers (and even some more experienced ones) don\u2019t take advantage of these constraints. Consider this slight rewrite of an actual posting on an SQL forum. In the original, the poster thought that identifiers should be integers and that ISO\u201311179 naming rules do not apply, so I cleaned it up a little, but this will make my point.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Appointments\r\n(\r\n   patient_id CHAR(10),\r\n   appointment_start_date DATE,\r\n   appointment_end_date DATE\r\n);<\/pre>\n<p>What\u2019s the first thing you notice about this table? It ought to be that this is<em> not <\/em>a table by definition! Remember your very first classes on RDBMS? There\u2019s no key. A key is not an option; this is the definition. Since all the columns in this first posting can be <code>NULL<\/code>, that DDL could never be correct. Most relational products allow this, which has some utility when moving data around, but this is not how a proper table ought to be designed.<\/p>\n<p>Let\u2019s assume a patient can have more than one appointment, just not in the same day. That means the key for this table must be made up of at least two columns, a patient and a date. First, we must make those two columns <code>NOT NULL<\/code>. There\u2019s also an assumption of only whole days being used for appointments, let\u2019s ignore that for now and then worry about time slices later.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Appointments\r\n(\r\n   patient_id CHAR(10) NOT NULL,\r\n   appointment_start_date DATE NOT NULL,\r\n      PRIMARY KEY (patient_id, appointment_start_date),\r\n   appointment_end_date DATE \r\n);<\/pre>\n<p>All it takes is a simple \u201ccut and paste\u201d to get the column names into a <code>PRIMARY KEY ()<\/code> constraint declaration. Yet if you go on virtually any SQL forum, you\u2019ll find that most of the posters have not done this.<\/p>\n<h2>Checking for more than uniqueness<\/h2>\n<p>There\u2019s nothing in this DDL to keep an appointment from ending before it begins. We can fix that with a <code>CHECK()<\/code> constraint.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Appointments\r\n(\r\n   patient_id CHAR(10) NOT NULL,\r\n   appointment_start_date DATE NOT NULL,\r\n     PRIMARY KEY (patient_id, appointment_start_date),\r\n   appointment_end_date DATE,\r\n      CHECK (appointment_start_date &lt;= appointment_end_date)\r\n);<\/pre>\n<p>Now the user cannot put in an appointment with a start date greater than the end date. However, the <code>appointment_end_date<\/code> column allows <code>NULL<\/code> values. How will this affect the use of the <code>CHECK<\/code> constraint?<\/p>\n<p>I am always surprised by how many SQL programmers don\u2019t know the answer to this question about <code>UNKNOWN<\/code> results in the DDL. If the Boolean expression of a <code>CHECK<\/code> constraint\u2019s search condition comes back <code>UNKNOWN<\/code> as result of having a <code>NULL<\/code> in a test, then it\u2019s treated as if it were <code>TRUE<\/code>, and the search condition accepts the row. Think of it as a \u201cbenefit of the doubt\u201d feature in SQL. This means that when I have an appointment and I don\u2019t know when it\u2019s going to end, I can use (<code>SET appointment_end_date = NULL<\/code>) to model an open-ended appointment.<\/p>\n<p>If you wish to limit some data based on <code>NULL<\/code>, then you have to specifically include that in your <code>CHECK<\/code> constraints. So if you want to say that appointments after <code>\u20182024-01-01\u2019<\/code> are required to have an appointment end date, you can add another CHECK like: <code>(appointment_end_date is not NULL AND appointment_start_date &gt;= '2024-01-01'). <\/code>This leaves open the <code>appointment_start_date<\/code> to be <code>NULL<\/code> or not <code>NULL<\/code> when <code>appointment_start_date <\/code>is earlier than<code> '2024-01-01')<\/code><\/p>\n<h2>Performance benefits of CHECK constraints<\/h2>\n<p>I am also surprised how many SQL programmers don\u2019t know that the search conditions in the <code>CHECK()<\/code> clauses are used by the optimizers. The more you can put into the DDL, the better the DML will run.<\/p>\n<p>Of course, <code>CHECK<\/code> constraints do have a slightly negative effect on <code>INSERT<\/code> statements but the positive effect they have when queries are executed that contradict them is worth it, speaking only about performance.<\/p>\n<p>But even with no performance gains, it means if we\u2019re applying our business rules in one place and one way. There is no need to worry about the same rule having two or more definitions in the system. The fact that the data matches the conditions is worth a few micro-seconds checking the data.<\/p>\n<h2>Defaults can help you too<\/h2>\n<p>Another feature which is very handy but is way underused is the <code>DEFAULT<\/code> clause. This feature says to use the default value in insertions when no explicit value is given. Normally, the \u201cdefault default\u201d is <code>NULL<\/code> without an explicit default value, but this assumes that the column involved is <code>NULL<\/code>-able.<\/p>\n<p>A lot of the time, this is not the case in your data model. The default can be a zero, a one, the current timestamp, the current user or some special code The most common conventions are \u201cNA\u201d for not available, \u201cTBD\u201d for to be determined, \u201cNMI\u201d for no middle initial and so forth. In some cases it can be difficult to choose a value to mean none, because these values could be confused with initials of Neal Anderson, Texas Belly Dancers, and No More Ice. Having the default set makes it easier for people to meet some <code>CHECK<\/code> constraints that require a certain set of values.<\/p>\n<p>These conventions were derived in the days of paper forms and punch cards because simply leaving a blank space might tell you that you simply had not filled in the form completely or that there <em>really <\/em>was a missing value.<\/p>\n<h2>Don\u2019t forget the moniker<\/h2>\n<p>Finally, a <code>CHECK()<\/code> constraint can (and should) be named. This name will show up in error messages and can be used to reference the constraints in the DDL. Again, this might not be a big deal when you\u2019re posting in a forum or writing test code, but it\u2019s an incredible help when writing production code, ideally not right in your production database. Having the same name in each environment helps you know when something is the same constraint, and when it is not.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Appointments\r\n(\r\n   patient_id CHAR(10) NOT NULL,\r\n   appointment_start_date DATE NOT NULL,\r\n    PRIMARY KEY (patient_id, appointment_start_date),\r\n   appointment_end_date DATE,\r\n    CONSTRAINT valid_appointment_slot\r\n    CHECK \r\n      (appointment_start_date &lt;= appointment_end_date));<\/pre>\n<p>The <code>CHECK()<\/code> constraint can be attached to the end of a column declaration if the constraint applies only to that column. An example of that might be <code>CHECK (order_qty &gt; 0)<\/code> or <code>CHECK(order_qry &gt;= 0)<\/code> on the order quantity column in a table that models commercial orders.<\/p>\n<p>In fact, in general , you should frequently have <code>CHECK()<\/code> clauses that define the domain for a value. For example, for quantities equal to or greater than zero like this because most SQL implementations do not have an integer data type for natural numbers. There may be reasons to allow negative values for quantity, so be sure and check your requirements first.<\/p>\n<p>Another horror story from the early days of SQL was a schema that incorrectly allowed for negative order quantities. Such things were processed as refunds. One of the programmers found this design flaw, refunded themselves thousands of dollars, and then left the company. A very crooked early retirement plan! These simple <code>CHECK()<\/code> and the <code>NOT NULL<\/code> constraints should be automatic if you\u2019re going to be a good SQL programmer.<\/p>\n<h2>You can have more than one CHECK<\/h2>\n<p>When I taught SQL classes, there was a misunderstanding that the table could have only one <code>CHECK()<\/code> constraint. This is not true, and what you want to have is one constraint for every condition you wish to <code>CHECK()<\/code>. Little harder to see is that a single column can be referenced in more than one <code>CHECK()<\/code> constraint. It is all right to write something like:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Bears_House\r\n(\r\n  porridge_id CHAR(15) NOT NULL PRIMARY KEY,\r\n  ..\r\n  porridge_temperature INTEGER NOT NULL DEFAULT 50\r\n    CHECK (porridge_temperature BETWEEN 0 AND 100),\r\n   CONSTRAINT hot_porridge \r\n       CHECK (porridge_temperature &lt;= 55),\r\n   CONSTRAINT cold_porridge \r\n       CHECK (porridge_temperature &gt;= 45),\r\n .. \r\n);<\/pre>\n<p>The named hot and cold porridge constraints overlap the general porridge temperature constraint. This is not a problem, and it simply means that more than one constraint can be violated in a time. Whether or not you want to use two constraints as shown, or one with an AND like: (<code>porridge_temperature &lt;= 55 AND porridge_temperature &gt;= 45<\/code>) can come down to error messages. Was the value too hot or cold? Combined the name of this constraint might be \u201c<code>improper_temperature_porridge<\/code>\u201d, which works, but doesn\u2019t indicate it as well as two constraints.<\/p>\n<p>At this point you probably need to know something about your particular SQL product and how it\u2019s going to present this information to you. Many products will give you only the first error that occurs, so if you violate 20 of them, it can be interesting to work through them one at a time.<\/p>\n<h2>An argument against constraints<\/h2>\n<p>One argument against using the database to protect data is that it feels like it should be part of an application and not a table declaration. This article did not, nor will it, broach the subject of application code and it\u2019s needed to protect data.<\/p>\n<p>Some logic legitimately needs the user\u2019s feedback. Some are very time based. Some need a manager to allow overrides to the basic business rules. Most of the rules you put in the database ought to be a part of the application as well. For example, if you do have 20 constraints on a table, you will very likely cover most of them in the UI. Username required, Last Name required, etc. You may even have optional rules that you want to ask the user about \u201cYou did not include a First Name, was that on purpose?\u201d<\/p>\n<p>But every bit of data protection you put in the database will work in the first, second and one hundredth UI. Also when the new DBA starts and imports a load of data\u2026 but has the punch cards upside down.<\/p>\n<h2>Give it a try<\/h2>\n<p>A final consideration is that SQL has a lot of useful predicates. I\u2019ve shown you sample code with the <code>BETWEEN<\/code>, but we didn\u2019t get into using string and mathematical functions; nor did we delve too deep into complex Boolean expressions to construct elaborate pieces of logic. In fact, pretty much any expression that returns a Boolean can be put into a <code>CHECK<\/code> constraint, depending on your SQL engine\u2019s implementation.<\/p>\n<p>As an exercise, why don\u2019t you try to write a simple check digit constraint for a UPC code in the DDL of the table? It will be a little messy, but once it\u2019s in place you\u2019ll never have to do it again.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the many ways a relational table differs from the file structures used by pre-relational storage systems is that the tables, rows and columns can have constraints on them. This allows you to reduce the types of bad data that can be loaded into your tables. This lets the database do a lot of&#8230;&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143539],"tags":[159072],"coauthors":[6781],"class_list":["post-102030","post","type-post","status-publish","format-standard","hentry","category-featured","category-theory-and-design","tag-database-theory"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/102030","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=102030"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/102030\/revisions"}],"predecessor-version":[{"id":102454,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/102030\/revisions\/102454"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=102030"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=102030"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=102030"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=102030"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}