{"id":1953,"date":"2015-02-11T00:00:00","date_gmt":"2015-02-11T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/the-dri-subject-of-references\/"},"modified":"2021-09-29T16:21:30","modified_gmt":"2021-09-29T16:21:30","slug":"the-dri-subject-of-references","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/the-dri-subject-of-references\/","title":{"rendered":"The DRI Subject of References"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\"> In the early days of SQL, it was  assumed that SQL would never be ready for prime time. Oh, it might be useful for ad hoc data analysis of some kind but  not the heavy lifting required for production work. That would always belong to COBOL with ISAM, IDMS. IMS, Total and  other network based models of databases and file systems.  <\/p>\n<p> Well, that turned out not to be  true. But there is more to the story than just &#8220;SQL got bigger tables and faster searches&#8221;; SQL got data integrity  features! This is a major difference between a file system and a database. A file system simply holds the data, without  any regard to what it means, business rules or relationships in the data. That kind of thing was the job of an  application program. Lots and lots of application programs. You could not keep all those programs in synch and could  guarantee data integrity. <\/p>\n<p> A database should enforce and  maintain business rules and relationships in the data. <i>This is not part of any application program; it is part of the  data model. <\/i>By having rules separated from the applications, we know that everyone is playing with the same data and  done one way, in one place, one time. And we do not have to write thousands of line of the same code over and over. And  then maintain them. Yes, it is possible to subvert data integrity, but you have to go out of your way to do it.  <\/p>\n<p> In data modeling, we have the  concept of strong and weak entities. A strong entity exists on its own merits, and a weak entity has to have a strong  entity paired with it. The classic example is the SQL idiom for an Orders database we have all seen. Here is the  simplest version and it is wrong:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql highlight:0 decode:true\">CREATE TABLE Orders -- strong entity\n(order_nbr CHAR(10) NOT NULL PRIMARY KEY,\n&#160;..);\nCREATE TABLE Order_Details -- weak entity\n(order_nbr CHAR(10) NOT NULL,\n&#160;line_nbr INTEGER NOT NULL,\n&#160;sku CHAR(15) NOT NULL,\n&#160;PRIMARY KEY (order_nbr, line_nbr),\n&#160;..);\n<\/pre>\n<p> There are lots of errors here.  The <i>physical<\/i> line number in the order details table is not a<i> logical<\/i> construct. This model copies the  paper order form&#8217;s lines into rows in a table. We do not do that in a valid logical data model! I can order the same  carton of buzzard eggs several times on different lines on a paper form, but in the logical model, I want to remove that  redundancy by having an &#8220;<code>order_qty\"<\/code> column. And if you are a good programmer, you will have a <code>CHECK (order_qty &gt; 0)  <\/code>on  the column. <\/p>\n<p> The concept of a tiered  architecture is strange to old monolithic procedural language programmers. You need an input layer that will aggregate  those buzzard eggs under the same sku (Stock Keeping Unit), as a single order detail. This is also where we can validate  check digits, standardize the street address and phone numbers, etc. The goal is to put only clean data into the  database, not use the base tables as a staging area.  <\/p>\n<p> This first skeleton schema had  no Declarative Referential Integrity (DRI) between these two tables. Look at what happens when an Order is dropped; the  Order Details will be orphaned. I have seen SQL systems where the orphans fill the disk and have to be cleaned out with  homemade garbage collection procedures every few weeks. Performance is awful, of course. Before you ask, the old network  databases had their versions of this problem. But they came with built-in garbage collection procedures that had to be  used on a schedule as part of a defragmentation run. <\/p>\n<p> SQL got smarter, started doing  maintenance functions automatically and added a declarative <code>REFERENCES<\/code> clause on the weak entity table. Here is the  improved version: <\/p>\n<pre class=\"theme:ssms2012 lang:tsql highlight:0 decode:true\">CREATE TABLE Order_Details\n(order_nbr CHAR(10) NOT NULL\n&#160; REFERENCES Orders(order_nbr),\n&#160;sku CHAR(15) NOT NULL,\n&#160;PRIMARY KEY (order_nbr, sku),\n&#160;..);\n<\/pre>\n<p> The Orders table is called the <b>referenced <\/b>table and the <code>Order_Details<\/code> table is called the<b> referencing <\/b>table. Please notice that we do not  have &#8220;parent&#8221; and &#8220;child&#8221; tables in RDBMS. Those terms are from the old network databases. They were implemented by  pointers and had to be between two distinct &#8220;tables&#8221; (actually, files in those days) !  <\/p>\n<p> Too many new SQL programmers  will stop at this point. However, we have more power from the options in SQL. The referenced columns do not have to be  the <code>PRIMARY KEY<\/code>; they just have to be a key, which means that they are declared with  <code>NOT NULL<\/code> <code>UNIQUE<\/code> in the referenced  table. But keys can have more than one column! You can use a <code>FOREIGN KEY<\/code>() clause when you have a multiple column  reference. The syntax is <\/p>\n<pre>[CONSTRAINT &lt;constraint name&gt;]\n&#160;FOREIGN KEY (referencing_col1, .., referencing_col_n)\n&#160;REFERENCES &lt;referenced table name&gt; (referenced_col1, .., referenced_col_n)\n<\/pre>\n<p> The  referenced column list must match, by data type and position, to the referencing column list. This is called &#8220;union  compatible&#8221; in the jargon. Many noobs are so afraid of multi-column keys, they will use all kinds of non-relational  kludges to work around them and get a single column &#8220;surrogate key&#8221; (actually, Dr. Codd said that surrogates are not  exposed to or maintained by the user, but skip that for now). Let&#8217;s explain this with an example, say  (<code>someplace_longitude, someplace_latitude<\/code>), If you assign a GUID, concatenate the values of the pair or whatever kludge  you like, then how do you assure that your &#8220;surrogate&#8221; is properly matched to a real key pair? And why did you want to  increase redundancy? Learn to use cut &amp; paste if you think that the lengths of column lists are a problem.  <\/p>\n<p>When you give names to the  constraints with the optional <code>CONSTRAINT<\/code> <code>&lt;constraint name<\/code><code>&gt;<\/code> clause, the errors messages will use the given name and a human being can read them. These  names are global to the schema and, not just local to a table. But these constraint names also mean you can also turn  the constraint off then back on. The syntax is &#8230;<\/p>\n<pre>ALTER TABLE &lt;table name&gt; [NOCHECK | CHECK] CONSTRAINT &lt;constraint name&gt; \n&#160;Consider this self-reference trick to prevent gaps in a timeline of events:\nCREATE TABLE Events\n(event_id CHAR(10) NOT NULL,\nprevious_event_end_date DATE NOT NULL\nCONSTRAINT Chained_Dates\nREFERENCES Events (event_end_date),\nevent_start_date DATE NOT NULL,\nevent_end_date DATE UNIQUE, -- null means event in progress\nPRIMARY KEY (event_id, event_start_date),\nCONSTRAINT Event_Order_Valid\nCHECK (event_start_date &lt;= event_end_date),\nCONSTRAINT Chained_Dates \nCHECK (DATEADD(DAY, 1, previous_event_end_date) = event_start_date).\n&lt;&lt; other stuff for this event &gt;&gt;\n);\n-- disable the Chained_Dates constraint\nALTER TABLE Events NOCHECK CONSTRAINT Chained_Dates;\n-- insert a starter row\nINSERT INTO Events(event_id, previous_event_end_date, event_start_date, event_end_date)\nVALUES ('Foo Fest', '2010-01-01', '2010-01-02', '2010-01-05');\n-- enable the constraint in the table\nALTER TABLE Events CHECK CONSTRAINT Chained_Dates;\n-- this works\nINSERT INTO Events(event_id, previous_event_end_date, event_start_date, event_end_date)\nVALUES ('Glob Week', '2010-01-05', '2010-01-06', '2010-01-10');\n-- this fails\nINSERT INTO Events(event_id, previous_event_end_date, event_start_date, event_end_date)\nVALUES ('Snoob', '2010-01-09', '2010-01-11', '2010-01-15'); \n<\/pre>\n<p> In the SQL model, we have no  idea how <code>REFERENCES<\/code> is implemented and we do not care. We can have a reference back to the same table, make cycles or  pretty any kind of path you want. It is a more abstract and general concept than pointers. Hashing is another approach,  for example, that other SQL products use. <\/p>\n<p> You can also add a DRI action  clause to the <code>REFERENCES<\/code>. They were meant to replace <code>TRIGGER<\/code>s with declarative code. A  <code>TRIGGER<\/code><code> <\/code> cannot be optimized  because it is procedural code that could do anything in the schema. We noticed that 80-90% of the <code>TRIGGER<\/code>s in the real  world were used to maintain the weak-strong entity relationships, so we made it into a declarative feature. Right now,  T-SQL is not that good about optimizing the DRI, like other SQL products, but there is hope for the future releases.  <\/p>\n<p> In the ANSI\/ISO standard model  for both <code>TRIGGER<\/code>s and DRI, an <code>INSERT <\/code>or <code>DELETE \t<\/code>is called a database event and it causes a response. Please note that an  <code>INSERT <\/code>is not an event in the ANSI\/ISO world, so the T-SQL model is a little different. In fact, T-SQL&#8217;s DRI is minimal,  but very useful. In the full ANSI\/ISO syntax, there are more options about matching rules, and deferring constraints.  The full ANSI\/ISO Standard is also more forgiving about cycles than T-SQL, which disallows any of them.  <\/p>\n<p> The DRI syntax is<\/p>\n<pre>ON [UPDATE| DELETE] &lt;DRI action&gt;<\/pre>\n<p> The actions change the  referencing table when the referenced table has an event.  <\/p>\n<ul>\n<li> \t\t<strong>NO ACTION <\/strong> \t\tis the default action. If a  \t\t<code>DELETE <\/code>or <code>UPDATE \t\t<\/code>is executed on referenced rows, you get an error message about a constraint violation. This is what  happens with just the <code>REFERENCES<\/code> by itself, but it makes things complete.  \t\t \t\t<\/li>\n<li> \t\t<code>CASCADE<\/code> specifies that the referencing column will be updated when the referenced column is updated, and  referencing rows will be deleted when the referenced rows are deleted. When a referenced row is change, this can change  another table with DRI actions. The SQL engine has to keep track of all of this stuff because if anything goes wrong,  the engine has to do a <code>ROLLBACK<\/code>. This can be a lot of overhead and get tricky. Try to keep a simple tree structure in  the DRI network. \t\t<\/li>\n<li> \t\t<strong>SET DEFAULT:<\/strong> The referencing column(s) will be set to their <code>DEFAULT<\/code> values when an <code>UPDATE \t\t<\/code>or <code>DELETE<\/code><code> <\/code>is performed on referenced rows.  Obviously, the referencing column(s) must have <code>DEFAULT<\/code> clauses.  \t\t \t\t<\/li>\n<li> \t\t<strong> \t\tSET NULL:<\/strong> This  action specifies that the column will be set to <code>NULL<\/code> when the referenced column is updated or deleted. Obviously, the  referencing column(s) must be <code>NULL<\/code>-able. \t\t<\/li>\n<\/ul>\n<h1><b> Required Weak Entities  <\/b><\/h1>\n<p><b> &#160;<\/b>Have you noticed that the  referencing table has to find a match in the referenced table, but not vice versa? Thinking about the <code>Orders <\/code>schema,  this says that we cannot take an order for a product we do not have in stock. You will get  a constraint violation message and a <code>ROLLBACK<\/code>. But it is fine to have products in stock that nobody wants to order.  <\/p>\n<p> You might have a business rule  that allows an order to be posted, but not have any items on the order. This would be an abandoned shopping cart on a  website. Now consider a schema that models Personnel and their job skills. We can use the  Standard Occupational Classification (SOC)&#160;for the skills.  <\/p>\n<pre>CREATE TABLE Personnel\n(emp_id CHAR(12) NOT NULL PRIMARY KEY, \n&#160;soc CHAR(7) NOT NULL \n&#160;CHECK (soc LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]'),\n&#160;);\nCREATE TABLE Skills\n(soc CHAR(7) NOT NULL \n&#160;CHECK (soc LIKE '[0-9][0-9]-[0-9][0-9][0-9][0-9]'),\n&#160;emp_id CHAR(12) NOT NULL\n&#160;REFERENCES Personnel (emp_id),\n&#160;PRIMARY KEY(soc, emp_id)\n);\n<\/pre>\n<p> This  guarantees that each employee will have at least one skill.<i> But this skill is modeled as an attribute of an employee,  not a weak entity related to an employee.<\/i> This means we need to union the two tables get all the SOC codes together! <\/p>\n<pre>CREATE VIEW SOC_List (emp_id, soc)\nAS\n(SELECT emp_id, soc FROM Skills\n&#160;UNION \n&#160;SELECT emp_id, soc FROM Personnel);\n<\/pre>\n<p> But  this is a mess. I can have the same SOC code in both tables; clearly a redundancy. This skeleton usually occurs when  each employee started with only one required skill and it was put into the Personnel table as if it was an attribute and  not a relationship. When the requirements changed to include multiple skills, the programmer added a new table rather  than more columns. The extra columns would be a repeated group and violate First Normal Form (1NF). But this design  already has a repeated group violation &#8211; the group simply has one member!  <\/p>\n<p> I can  keep the skills unique between the two tables with a trigger and some procedural code. But his means procedural code and  we hate procedural code. It is also trickier than you think. If I delete the skill in the Personnel table, then I have  to look at the Skills table.<\/p>\n<ul>\n<li>If the Skills table has more than one skill left,  then you need a rule for moving a skill to the Personnel table. You will have to turn off the constraints, lock the  tables, move the data and finally turn on the constraints again. <\/li>\n<li>If the Skills table has just one skill left, then  you need to move that skill to the Personnel table. But then you need to get clean out the skills table.  <\/li>\n<li><\/li>\n<\/ul>\n<p>This  code cannot be optimized and probably not ported easily. You really need to go back to the original strong-weak table  model, so the SOC code appears in one and only one place. What happens when we have a cycle between the tables? It is a  formal version of &#8220;You need experience to get work and you need work to get experience&#8221; vicious cycle in employment. \t\t<\/p>\n<p>In  full ANSI\/ISO Standard SQL, we have the <code>CREATE ASSERTION<\/code> statement. This is a schema level  \t\t<code>CHECK() <\/code>constraint. This is  why a constraint name has to be global and not local to a table. <\/p>\n<pre>\t\tCREATE ASSERTION One_Skill_per_Employee\n\t\tCHECK (NOT EXISTS \n\t\t&#160;&#160; (SELECT emp_id FROM Personnel \n\t\t&#160;&#160;&#160; EXCEPT\n\t\t&#160;&#160;&#160; SELECT emp_id FROM Skills));\n<\/pre>\n<p>What this says is that when you remove all the employees in the Skills table from  all the employees in the Personnel table, you should get an empty table. Since this is at the schema level, a violation  will cause an error in your current session. <\/p>\n<p>But SQL Server doe not have the <code>CREATE ASSERTION<\/code> or deferrable constraints. It has  the <code>WITH CHECK OPTION<\/code> on VIEWs and you can &#8220;fake it&#8221; a little bit with this feature. This feature says to re-check the  \t\t<code>WHERE<\/code> clause if the underlying table(s) changes. The <code>VIEW<\/code> has to be on one table and updatable for this to work best. \t\t<\/p>\n<pre class=\"theme:ssms2012 lang:tsql highlight:0 decode:true\">\t\tCREATE VIEW Valid_Skills\n\t\tAS\n\t\tSELECT * \n\t\t&#160; FROM Skills\n\t\t&#160;WHERE \n\t\t&#160;&#160; NOT EXISTS\n\t\t&#160;&#160; (SELECT emp_id FROM Personnel \n\t\t&#160;&#160;&#160; EXCEPT\n\t\t&#160;&#160; SELECT emp_id FROM Skills)\n\t\tWITH CHECK OPTION; \n\t\t&#160;\n<\/pre>\n<p> \t\tAnd likewise, you can copy the constraint into a second view that covers the  \t\t<code>Personnel <\/code>table data.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql highlight:0 decode:true\">\t\tCREATE VIEW Valid_Personnel\n\t\tAS\n\t\tSELECT * \n\t\t&#160; FROM Personnel\n\t\t&#160;WHERE \n\t\t&#160; NOT EXISTS\n\t\t&#160; (SELECT emp_id FROM Personnel \n\t\t&#160;&#160; EXCEPT\n\t\t&#160;&#160; SELECT emp_id FROM Skills)\n\t\tWITH CHECK OPTION; \n<\/pre>\n<p> \t\tLet&#8217;s see how this works. Add a new employee:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql highlight:0 decode:true\">\t\tINSERT INTO Personnel VALUES ('Wilma', '03-0000');<\/pre>\n<p> \t\tBut now the <code>Valid_Skills VIEW i<\/code>s empty! The lack of skills for Wilma empty the  whole <code>VIEW<\/code>. You can add the skills and <code>Valid_Skills<\/code> will return:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql highlight:0 decode:true\">\t\tINSERT INTO Skills VALUES ('Wilma', '03-0001'), ('Wilma', '03-0002');<\/pre>\n<p>We do not really need the <code>with check<\/code> option in this case, but it is a good safety  feature if people use the view for updating. The real trick is to write procedures that updates, inserts into and  deletes from both base tables in a single transaction. I will leave that as a exercise for the reader.  <\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>A database must be able to maintain and enforce the business rules and relationships in data in order to maintain the data model. It does this through referential constraints. They aren&#8217;t complex, but are powerful, especially with the means to attach DRI actions to them. Joe Celko explains all, and pines for the ANSI CREATE ASSERTION  statement&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4168,4150,4252],"coauthors":[],"class_list":["post-1953","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-database","tag-sql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1953","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1953"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1953\/revisions"}],"predecessor-version":[{"id":41127,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1953\/revisions\/41127"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1953"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1953"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1953"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1953"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}