{"id":101527,"date":"2024-03-18T12:30:34","date_gmt":"2024-03-18T12:30:34","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=101527"},"modified":"2024-03-26T09:27:41","modified_gmt":"2024-03-26T09:27:41","slug":"department-of-redundancy-department","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/theory-and-design\/department-of-redundancy-department\/","title":{"rendered":"Department of Redundancy Department"},"content":{"rendered":"<p>Ever wonder all the reasons that we use databases instead of file systems? While we don\u2019t think of it too much anymore, the first reason that databases came into existence was to remove redundancies. The first source of redundancy back in the dark ages, when I was just beginning the program, was a product called \u201ccarbon paper.\u201d You younger people probably have not seen it. Paper forms came with sheets of this thin black carbon paper between duplicates of the form. As you pressed on the paper form, pressure transferred the image to the duplicates.<\/p>\n<p>If you don\u2019t believe me, <a href=\"https:\/\/www.google.com\/search?q=carbon+paper\">Google it<\/a>. Carbon paper was so ubiquitous that there is a 1950s science fiction short story about a terrible weapon, which was a bacteria or fungus (I cannot remember which) that eats carbon paper. it destroyed the bureaucracies which depended on multiple copies.<\/p>\n<p>Today, even paper forms are becoming rare. For a time, you had to have a paper copy to have a legal signature. In some states the color of the ink which you had to use was also prescribed by law. Today electronic signatures of various kinds are acceptable. I am waiting for the day soon when either facial recognition or DNA will be the acceptable signature.<\/p>\n<p>Very often, each of these copies that were made from the carbon paper was sent to different departments. Each department would keypunch the same data into their files, using a different file layout. There\u2019s a classic horror story about a federal employee who got a reward when he discovered that one of the copies was never used and just thrown away. After giving him his \u201cemployee of the month\u201d reward, the bureaucracy bought filing cabinets to save this redundant copy. After all, you never know when you might need it again.<\/p>\n<p>Just as a man with two wristwatches is never truly sure exactly what time it is, so these early file systems could not produce a consistent, verifiable version of the data. As an actual example, I worked for a state highway department which used a multipart form reporting time on projects. We used one copy for employee\u2019s time, one copy for equipment usage and one copy for project expenses. I could look at the time clock file and figure out how many hours a particular employee had charged each week. Likewise, I could go to the equipment usage file and figure out how many hours on which job a particular piece of equipment had been employed.<\/p>\n<p>But the data was never coordinated, so it was almost impossible to answer a query like \u201chow many hours did Bill Smith charge to digging that drainage ditch for the Owl Creek Bridge, and did he use a shovel or a backhoe?\u201d from these files. You had to go back to the original paper forms and calculate the stuff by hand!<\/p>\n<p>Unlike tables in a schema in RDBMS, paper files do not have to have any relationships among them. What we wanted was one fact, in one place, one time and modeled in only one way. This is what databases were supposed to do. But they could still have problems.<\/p>\n<h2>Row Level Redundancies<\/h2>\n<p>Consider a common business situation, the basic order form, with a skeleton table declaration like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Orders\r\n(order_nbr CHAR(10) NOT NULL PRIMARY KEY,\r\n item_GTIN CHAR(15) NOT NULL REFERENCES  \r\n    Inventory(item_GTIN),\r\n order_qty INTEGER NOT NULL CHECK (order_qty&gt; 0),\r\n item_unit_price DECIMAL (5,2) NOT NULL\r\n            CHECK (item_unit_price&gt; = 0.00),\r\n item_extension DECIMAL (10,2) NOT NULL CHECK (item_extension&gt; = 0.00),\r\n..);<\/pre>\n<p>If you have ever worked in retail, you no doubt would know that <code>(order_qty * item_unit_price) = item_extension<\/code>. The input screen on order entry programs will often do this calculation for you. But my point is storing something that can be calculated one and only one way is redundant. You can either hide calculations in a <code>VIEW<\/code> or use a computed column. It is also safer, since the computer probably makes few calculation errors than you do.<\/p>\n<p>The basic syntax for a computed column is <code>&lt;column_name&gt; AS &lt;expression&gt; [PERSISTED]<\/code>. A persisted option says that after the value is computed, it is materialized and physically stored in the table. Since the value is now real instead of virtual, it can be used for indexing. There are limitations on the expression; but basically, it has to be something that can be calculated from the other columns in each row.<\/p>\n<h2>Columns In Different Tables<\/h2>\n<p>Looking at my Orders table, I see I have a reference to the global trade item number (GTIN) of each line item. If you\u2019re not familiar with this, it\u2019s like the UPC code used in North America for retail trade. It\u2019s a little longer, and it lets you determine the country of origin.<\/p>\n<p>What this clause says that the Inventory table uses the GTIN as a key; so it is the referenced table. The Orders table has a relationship such that it\u2019s GTIN values have to be in the inventory table. Orders is called the referencing table. Don\u2019t be fooled just because a reference <em>looks<\/em> like a redundancy.<\/p>\n<p>The <code>FOREIGN KEY<\/code> or <code>REFERENCES<\/code> clause defines all columns explicitly that belong to the foreign key. The <code>REFERENCES<\/code> clause specifies the table name with all columns that build the referenced key. The number and the data types of the columns in the <code>FOREIGN KEY<\/code> clause must match the number and the corresponding data types of columns in the <code>REFERENCES<\/code> clause (and, of course, both of these must match the number and data types of the columns in the key of the referenced table). Usually, but it is not required, we use the <code>PRIMARY<\/code> <code>KEY<\/code> of the referenced table. In fact, you can get really tricky and have a self referencing foreign key constraint! I don\u2019t want to go into details on that, but just keep it in the back of your mind.<\/p>\n<p>Please don\u2019t use the terms \u201cparent\u201d and \u201cchild\u201d tables. These terms refer to pointer chain structures that existed in early non-relational databases. In these older databases, such as IMS, Total, IDMS, et al, your relationships (pointer chains) had to form strict hierarchies. The only restriction on references is that eventually the references must resolve back to a base table.<\/p>\n<p>The real power of the foreign key comes with referential actions that can be added to the clause. The DRI (declarative referential integrity) syntax is<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">REFERENCES &lt;referenced table&gt;\r\nON [UPDATE| DELETE] &lt;DRI action&gt;<\/pre>\n<p>The actions change the referencing table when the referenced table has an event. A database event is an insert, update or delete. We ignore insertions for references. The DRI actions that are possible are:<\/p>\n<ul>\n<li><strong>NO ACTION<\/strong> &#8211; The default action. If a <code>DELETE<\/code> or <code>UPDATE<\/code> is executed on referenced rows, you get an error message about a constraint violation. This is what happens with just the <code>REFERENCES<\/code> by itself, but it makes things complete. If you use the optional constraint clause, your database will be a little better documented.<\/li>\n<li><strong>CASCADE<\/strong> &#8211; specifies that the referencing column will be updated when the referenced column is updated, and referencing rows will be deleted when the referenced rows are deleted. When a referenced row is changed, this can change another table with DRI actions. The SQL engine has to keep track of all of this stuff because if anything goes wrong, the engine has to do a <code>ROLLBACK<\/code>. This can be a lot of overhead and get tricky. Try to keep a simple tree structure in the DRI network.<\/li>\n<li><strong>SET DEFAULT<\/strong> &#8211; The referencing column(s) will be set to their <code>DEFAULT<\/code> values when an <code>UPDATE<\/code> or <code>DELETE<\/code> is performed on referenced rows. Obviously, the referencing column(s) must have <code>DEFAULT<\/code> clauses.<\/li>\n<li><strong>SET NULL<\/strong> &#8211; This action specifies that the column will be set to <code>NULL<\/code> when the referenced column is updated or deleted. Obviously, the referencing column(s) must be <code>NULL<\/code>-able.<\/li>\n<\/ul>\n<p>Let\u2019s try a little different example. You\u2019re given three tables. One is for salesmen. The <code>Sales_Teams<\/code> table models a group of salespersons, who get a T-shirt and a mascot. The <code>Customers<\/code> table explains itself. We have a rule that each customer is assigned to both a unique sales team and to an individual salesman who is a member on that team. Our third table gives us that relationship. Each Sales team is responsible for every customer that any member of the team is assigned to and not responsible for the other team\u2019s customers.<\/p>\n<p>With this business rule, we can find all the customers for which a sales team is responsible for. In one of two ways. The direct way would be to use the relationship between sales teams to customers. A more indirect way would be to find sales team to the salespersons, which would give us all the sales persons the team then look for all of the customers belong to each team member.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">CREATE TABLE Customers\r\n(customer_id CHAR(10) NOT NULL PRIMARY KEY,\r\n assigned_salesman_id CHAR(10) NOT NULL\r\n       REFERENCES Sales_Staff (salesman_id) \r\n         ON DELETE CASCADE\r\n         ON UPDATE CASCADE);\r\n\r\nCREATE TABLE Sales_Staff\r\n(salesman_id CHAR(10) NOT NULL PRIMARY KEY\r\n sales_team_name CHAR(25) NOT NULL,\r\n      REFERENCES Sales_teams\r\n         ON DELETE CASCADE\r\n         ON UPDATE CASCADE);\r\n\r\nCREATE TABLE Sales_Teams\r\n(sales_team_name CHAR(25) NOT NULL PRIMARY KEY,\r\n salesman_id CHAR(10) NOT NULL\r\n      REFERENCES Sales_Staff (salesman_id)\r\n         ON DELETE CASCADE\r\n         ON UPDATE CASCADE,\r\nUNIQUE (sales_team_name, salesman_id))<\/pre>\n<p>With these table structures, one way to find out which customers have which sales team would be a query like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT C.customer_id, T.sales_team_name\r\n FROM Customers AS C, Sales_Teams AS T, Sales_Staff AS S\r\nWHERE C.assigned_salesman_id = S.salesman_id\r\n AND T.salesman_id = S.salesman_id;<\/pre>\n<p>But going through the sales staff is redundant. We\u2019ve already taken care of the relationships of the teams to the employees with various referential constraints. The query can be reduced to simply:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT C.customer_id, T.sales_team_name\r\nFROM Customers AS C, Sales_Teams AS T\r\nWHERE C.assigned_salesman_id = T.salesman_id;<\/pre>\n<h2>Entire Tables Can Be Redundant<\/h2>\n<p>If you\u2019ve ever worked the book business, then you\u2019ll be familiar with the ISBN (International Standard Book Number). This is an industry standard that has been around for a very long time that identifies a book, with an alpha\u2013numeric code. It was developed in the UK and is based on a mod 11 check digit. Before you ask, the check digit could be zero through nine or X, the Roman numeral for 10.<\/p>\n<p>It is a very strong check digit which prevents a lot of errors. However, it doesn\u2019t fit into the industry standards used for retail barcodes, UPC, and GTIN. We need 13 digits, and the first three digits are always 978, which identifies the trade item as a book. These two systems are referred to as ISBN-10 and ISBN-13 respectively.<\/p>\n<p>Fortunately, it\u2019s easy to convert from ISBN-10 to ISBN-13; add the prefix and replace the original check digit with a simple computation for the new check digit. The problem is that not all books have any kind of ISBN, and a lot of older books that were never reprinted were never given an ISBN-13. Books that did have multiple printings could have both types of ISBNs. Which do you store in your inventory table? Or do you store both? Storing both is redundant and not necessary because they can be computed from each other.<\/p>\n<p>However, imagine that you just bought out your competitor and you\u2019re trying to merger two inventories that use radically different stock numbers. This is the situation in the automobile industry. The same part can be sold under different stock numbers, and generally the cheaper line of cars sells the same parts for less than the more expensive line of cars from the same manufacturer. I think this is called \u201cthe law of supply and crooks\u201d in economics. There\u2019s a good business in providing a database with equivalent part numbers.<\/p>\n<h2>Dr. Codd and COBOL<\/h2>\n<p>When Dr. Codd defined the normal forms, COBOL was really the only language that people did any business programming in. Even today, most business applications are still in COBOL underneath the covers for you can\u2019t see it. Even if you\u2019ve never written COBOL, it\u2019s very much affected the way we do commercial programming. Even today, most of the world\u2019s business data processing is in COBOL under the covers; and even when it isn\u2019t, the COBOL mindset affects the design.<\/p>\n<p>First Normal Form (1NF) is supposed to remove repeated groups. In COBOL, there is a <code>OCCURS (n)<\/code> operation that says make multiple copies of the hierarchical structure to which it is attached. For example, I might create part of a record declaration (this is like the DDL in SQL) and have it occur seven times, once each for the seven days of the week. The number of occurrences does not have to be a fixed, integer; it can be another field the same record which determines how many repetitions there are when the data is read.<\/p>\n<p>The relational model depends on knowing what the structure of the data is, so this approach to data is not desirable. Think about how you have to define a joint operation in this environment.<\/p>\n<p>Technically, this is what Dr. Codd meant by \u201crepeated group\u201d, but it was later extended to include arrays and anything fancier than a simple scalar atomic value.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>Please notice that what were concerned about is<em> logical <\/em>redundancy. <em>Physical<\/em> redundancy can be very handy in a database. But that\u2019s another topic.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ever wonder all the reasons that we use databases instead of file systems? While we don\u2019t think of it too much anymore, the first reason that databases came into existence was to remove redundancies. The first source of redundancy back in the dark ages, when I was just beginning the program, was a product called&#8230;&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143539],"tags":[159072],"coauthors":[6781],"class_list":["post-101527","post","type-post","status-publish","format-standard","hentry","category-featured","category-theory-and-design","tag-database-theory"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101527","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=101527"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101527\/revisions"}],"predecessor-version":[{"id":101921,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/101527\/revisions\/101921"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=101527"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=101527"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=101527"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=101527"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}