{"id":77411,"date":"2018-02-28T11:56:19","date_gmt":"2018-02-28T11:56:19","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=77411"},"modified":"2021-09-29T16:21:07","modified_gmt":"2021-09-29T16:21:07","slug":"charles-bachman-pointer-chains","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/charles-bachman-pointer-chains\/","title":{"rendered":"Charles Bachman and Pointer Chains"},"content":{"rendered":"<p>Charles Bachman died on 2017-07-13 at age 92. I will bet that most of the readers of this article don\u2019t remember him or, maybe, never heard of him. He was one of the early pioneers in databases, but he had the unfortunate habit of putting his money on the wrong horse. He was the principal architect of the navigational, or CODASYL, database model. I thought of him because I was looking in my library and came across my copy of \u2018ANSI X3.133-1986 database language NDL\u2019, the specification for a network model language. It\u2019s dead, having expired in 1998, and even when we created it we were not expecting it to live very long. The main reason for writing and specifying it was to get common terms in a standard for reference.<\/p>\n<p>The relational model became standard, perhaps due to a contest at the ACM event. Two teams were presented with a medium to high complex database business problem. The relational and the navigational solutions were compared, and the navigational solution was wrong! If anyone has a copy of the test question, I would love to get it forwarded to me; I would love to figure out a solution using SQL, with all the additions that have been made to the language after SQL-92.<\/p>\n<p>There were many early navigational database products. None of them followed NDL completely. The most successful one was IMS from IBM, which is still sold today and has a huge installed base. It\u2019s very hierarchical and depends on negotiating tree structures. IDMS is another popular product, but there were also products like TOTAL from Cincom.<\/p>\n<p>I worked with a version of TOTAL on HP\/3000 hardware marketed as IMAGE 3000, which is fairly representative of the products at the time. Each unit of work consisted of a hash table (which had to be a prime number of pre-allocated records because of the weak hashing algorithm), and simple linked lists of subordinate records. Now here\u2019s where we start to get into terminology. At one point this relationship was referred to as \u2018slave and master\u2019 which was a little offensive, so in the NDL we used \u2018owner and record\u2019 which then became \u2018parent and child\u2019 in the literature.<\/p>\n<p>In 1963 Bachman developed the Integrated Data Store (IDS) for General Electric. This was one of the first navigational database products. This product evolved into IDMS from Cullinane Information Systems (later Cullinet) on IBM mainframes. Charlie Bachman innovated a true network DBMS at Honeywell, but it didn\u2019t turn into a serious product at that time. B. F. Goodrich, however, ran a version.<\/p>\n<p>He was not ignored by the academic community. He received the Turing Award from the Association for Computing Machinery (ACM) in 1973 for his \u2018outstanding contributions to database technology\u2019.<\/p>\n<p>He was elected as a Distinguished Fellow of the British Computer Society in 1977 for his pioneering work in database systems. He was awarded a National Medal of Technology and Innovation (2012) for \u2018fundamental inventions in database management, transaction processing, and software engineering\u2019. He was elected to ACM Fellow (2014) for contributions to database technology, notably the integrated data store. In 2015, he was made a Fellow of the Computer History Museum for his early work on developing database systems. Mr. Bachman published dozens of articles and papers in both the trade and the academic press. The most important paper was probably <a href=\"https:\/\/academic.microsoft.com\/#\/detail\/2029098176?FORM=DACADP\">The Programmer as Navigator<\/a>. (1973 ACM Turing Award lecture, Communications of the ACM vol. 16, no. 11, November 1973) in which he explained basic concepts of navigational databases.<\/p>\n<h2><a id=\"post-77411-Logical_Data_Model\"><\/a>Logical Data Model<\/h2>\n<p>Bachman also made another contribution to databases. His <a href=\"https:\/\/cio-wiki.org\/home\/loc\/home?page=bachman-diagram\">Bachman diagrams<\/a> were an early attempt at systematically building logical data models. Before that, we wrote file layouts using code sheets provided by IBM and ANSI X3.5 program flowcharts. Flowcharts are one of the worst ways to define programs because you&#8217;re not sure what&#8217;s flowing; sometimes it&#8217;s data and sometimes it&#8217;s control. That is why that particular standard expired decades ago.<\/p>\n<p>The Bachman diagrams are made up of boxes and arrows. In fact, there was a small-circulation newsletter for DBAs and systems analysts who worked with IDMS and other older mainframe database products entitled \u2018Boxes and Arrows\u2019 based on the appearance of Bachman diagrams.<\/p>\n<p>The main structuring concepts in this model are \u2018records\u2019 and \u2018sets\u2019. Records essentially follow the <a href=\"https:\/\/en.wikipedia.org\/wiki\/COBOL\">COBOL<\/a> definition ordered sequences of fields of different types: this allows complex internal structure such as repeating items and repeating groups. There are no virtual or computed fields. The set concept should not be confused with the mathematical set that used in an RDBMS. A <a href=\"https:\/\/en.wikipedia.org\/wiki\/CODASYL\">CODASYL<\/a> set represents a one-to-many relationship between records: one owner, many members. Unlike a strictly hierarchical data model, a record can be a member of many different sets. But unlike the set concept from RDBMS, these guys are ordered (so they are not really sets) and the records within a set are also in a sequence!<\/p>\n<p>Records have a locator represented by a value known as a <strong>database key<\/strong>. Did you ever wonder where Sybase got the idea of the <strong>IDENTITY<\/strong> table property when they first implemented SQL? In IDMS, as in most other CODASYL implementations, the database key is directly related to the<em> physical <\/em>address of the record on disk, without regard to any of the values in the record. In short, it\u2019s not relational. Database keys are then used as pointers to linked lists and trees. This means that the logical model and the physical implementation look a lot alike. Here\u2019s the trade-off with this approach; Database retrieval can be pretty efficient because it\u2019s really fast to traverse lists. But loading and restructuring these linked lists can be <em>very<\/em> expensive. Also, if a linked list was broken, you needed special utility programs to try and rebuild them. It was not always possible to do these rebuilds. Finally, if you needed to access the data in a particular order, which was not how the sets or records are physically ordered, then you have the overhead of sorting.<\/p>\n<h2>Flavors of Pointers<\/h2>\n<p>Many SQL programmers have never worked with pointers. When I explained the nested set model to programmers in some of my classes, the older programmers have a real hard time with it because they want to build tree structures with pointer chains. This is why you will see an organizational chart done with (<strong>emp_id<\/strong>, <strong>boss_emp_id<\/strong>) pairs to model the head and tail of a pointer chain link. This was so much a part of the mindset at the start of RDBMS, that the Oracle sample database (\u2018Scott\/Tiger\u2019 is the nickname it\u2019s gotten over the decades because of the user and the password in it) is built on this adjacency list model.<\/p>\n<p>However, younger programmers look at this approach in SQL and see it as a badly denormalized table. The same employee identifier appears in multiple places in the same table, in violation of First Normal Form (1NF). It requires very complex constraints to prevents cycles ensure that the hierarchy is a proper tree.<\/p>\n<p>The nested set model doesn\u2019t have the normal form violations and redundancy. New programmers who didn\u2019t grow up with assembly language and low-level machine constructs look at it and say \u201cOh, this is XML tags written in a different language!\u201d instead.<\/p>\n<h2>Simple Linked List<\/h2>\n<p>When people on SQL forums refer to a \u2018link table\u2019, they are using the term that refers to pointer chains. The simplest possible list structure is a single linked list, which would look something like this<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"653\" height=\"252\" class=\"wp-image-77412\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-37.png\" \/><\/p>\n<p>You enter the chain at a node and either pull out the data attached to it or get the address of the next link in the chain. Eventually will follow the path down to a \u2018nil\u2019 node (not to be confused with the <strong>NULL<\/strong> in SQL). You\u2019ll notice that the structure allows you to move in only one direction.<\/p>\n<p>This limitation on traversal is not as bad as you might first think. If you been working with SQL Server for a few decades, you will remember that the cursors in the very early versions of the product also could only read forward. The reason for this was very simple; the early tape drives from which model of cursors was developed could only read in one direction! This was not just on small machines, but pretty much on all the mainframe tape drives too. You just got in the habit of writing programs that would work with this limitation. Yes, we were very simple back in those days.<\/p>\n<p>If you really needed to go back to a prior record in the sequence, you had to close a cursor then reopen it, remember the count of record you had read, and begin rereading from the front of the cursor all over again. Yuck.<\/p>\n<h2>Double Linked List.<\/h2>\n<p>The NDL Standard included next and prior commands. They could move up and down pointer chains in both directions. This obviously requires another part of the node to have the prior address as well as the next address. This reflects the way magnetic tapes work today. It\u2019s also pretty much how everybody thinks of linked lists today. It costs you one extra pointer per node and gives you quite a bit of control over your data navigation. Most of the time, a pointer\u2019s the same size as an integer; did you ever wonder why people traditionally use the integer datatypes for their identifiers when they\u2019re learning SQL? It is a rule of all advances in technology that a new technology will first imitate the old technology and then eventually find its own voice. This is why your cell phone&#8217;s camera makes a clicking sound like a mechanical shutter. This imitation is called a \u2018skeuomorph\u2019, and it&#8217;s a great crossword puzzle word. Thus, early skyscrapers tried to look as much like Greek and Roman temples as they could, until the fact that skyscrapers are tall finally sank into people\u2019s minds and we began to celebrate that fact.<\/p>\n<h2>Junctions<\/h2>\n<p>The junction refers to a collection of pointers that go to various data nodes. You will see people referring to \u2018junction tables\u2019 in SQL forums, not realizing this concept is not relational at all. What they usually want to do in the relational model would be a multi\u2013table VIEW or query. When you did this with pointer chains, traversing it could be complicated. Each vendor had different implementations, but when we got to SQL, all we had was the REFERENCES clause.<\/p>\n<h2>REFERENCES<\/h2>\n<p>In SQL we have the concept of a reference, and referential integrity. It has nothing whatsoever to do with any kind of traversal. It\u2019s derived from a data modeling concept of weak and strong entities. A weak entity is one that requires a strong entity in order to exist. For example, it doesn\u2019t make any sense to have order items unless you have an order. Strong entity can exist on its own; you might have an order with no items on it yet.<\/p>\n<p>In SQL we have referenced tables and referencing tables. Unlike pointer chains, they can be on the same table. Circular links in the pointer chain is usually a problem. Since pointers are based on traversals, you can wind up in an endless loop when you try to follow the chain.<\/p>\n<p>If you had a good algorithm\u2019s course, you might remember some of the tricks for detecting cycles in linked list structures (create two pointers, hold both on a node in the list, and advance one of them one node at a time; if the two pointers are ever equal again, you have a cycle. Otherwise, repeat the procedure on the next node in the list. If you go through all the nodes in your list, then there was no cycle). This is essentially how a recursive CTE (common table expression) works in SQL Server, but if it finds a cycle it continues running until it reaches an upper limit of repetitions.<\/p>\n<h2>Self-REFERENCE Constraints<\/h2>\n<p>The best example of a self-reference is Kuznetsov\u2019s History Table SQL idiom which builds an unbroken temporal chain from the current row to the previous row. This is easier to show with code:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">  CREATE TABLE Tasks\r\n  (task_id CHAR(5) NOT NULL,\r\n   task_score CHAR(1) NOT NULL,\r\n   previous_end_date DATE, -- null means first task\r\n   current_start_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,\r\n   CONSTRAINT previous_end_date_and_current_start_in_sequence\r\n     CHECK (prev_end_date &lt;= current_start_date),\r\n   current_end_date DATE, -- null means unfinished current task\r\n   CONSTRAINT current_start_and_end_dates_in_sequence\r\n     CHECK (current_start_date &lt;= current_end_date),\r\n   CONSTRAINT end_dates_in_sequence\r\n     CHECK (previous_end_date &lt;&gt; current_end_date)\r\n   PRIMARY KEY (task_id, current_start_date),\r\n   UNIQUE (task_id, previous_end_date), -- null first task\r\n   UNIQUE (task_id, current_end_date), -- one null current task\r\n   FOREIGN KEY (task_id, previous_end_date)  -- self-reference\r\n     REFERENCES Tasks (task_id, current_end_date));<\/pre>\n<p>Well, that looks complicated! Let\u2019s look at it column by column. <strong>Task_id<\/strong> explains itself. The <strong>previous_end_date<\/strong> will not have a value for the first task in the chain, so it is <strong>NULL<\/strong>-able. The <strong>current_start_date<\/strong> and <strong>current_end_date<\/strong> are the same data elements, temporal sequence and <strong>PRIMARY KEY<\/strong> constraints we had in the simple history table schema.<\/p>\n<p>The two <strong>UNIQUE<\/strong> constraints will allow one <strong>NULL<\/strong> in their pairs of columns and prevent duplicates. Remember that <strong>UNIQUE<\/strong> is not like <strong>PRIMARY KEY<\/strong>, which implies <strong>UNIQUE NOT NULL<\/strong>.<\/p>\n<p>In ANSI\/ISO Standard SQL a constraint can be executed immediately at the start of a transaction or deferred until the end of the transaction. SQL Server assumes the constraints are immediate, so there&#8217;s no way to validate a constraint that isn&#8217;t already in place when you begin a transaction. The trick is to turn off the constraint, insert a starting row, turn the constraint back on and then self-reference it in the same table. You might want to play with this a little bit. It can be tricky. Fortunately, Alex Kuznetsov has written a <a href=\"https:\/\/www.red-gate.com\/simple-talk\/sql\/t-sql-programming\/modifying-contiguous-time-periods-in-a-history-table\/\">Simple Talk article<\/a> to explain in more detail how it is done.<\/p>\n<p>The database can be in a state that would otherwise be illegal. But at the end of the session all constraints must be <strong>TRUE<\/strong> or <strong>UNKNOWN<\/strong>.<\/p>\n<p>In SQL Server, you can disable constraints and then turn them back on. It actually is restricted to disabling <strong>FOREIGN KEY<\/strong> constraints and <strong>CHECK<\/strong> constraints. <strong>PRIMARY KEY<\/strong>, <strong>UNIQUE<\/strong>, and <strong>DEFAULT<\/strong> constraints are always enforced. The syntax for this is part of the <strong>ALTER TABLE<\/strong> statement. The syntax is simple:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">ALTER TABLE &lt;table name&gt; NOCHECK CONSTRAINT [&lt;constraint name&gt; | ALL];<\/pre>\n<p>This is why you should name the constraints; without user given names, you have to look up what the system gave you and they are always long and messy. The <strong>ALL<\/strong> option will disable all of the constraints in the entire schema. Be careful with it.<\/p>\n<p>To re-enable, the syntax is similar and explains itself:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">ALTER TABLE &lt;table name&gt; CHECK CONSTRAINT [&lt;constraint name&gt; | ALL];<\/pre>\n<p>When a disabled constraint is re-enabled, the database does not check to ensure any of the existing data meets the constraints. So, for this table, the body of a procedure to get things started would look like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">BEGIN\r\n  ALTER TABLE Tasks NOCHECK CONSTRAINT ALL;\r\n  INSERT INTO Tasks (task_id, task_score, current_start_date, current_end_date, previous_end_date)\r\n    VALUES (1, 'A', '2010-11-01', '2010-11-03', NULL);\r\n  ALTER TABLE Tasks CHECK CONSTRAINT ALL;\r\nEND;<\/pre>\n<h2>REFERENTIAL Actions<\/h2>\n<p>A <strong>REFERENCES<\/strong> clause has more power than a simple link. You can attach actions to the referencing constraints in one table which will affect matching columns in the referenced table. Think about these in terms of the strong and weak entity model. The syntax is fairly straightforward:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">   [FOREIGN KEY (ref_column list)]\r\n   REFERENCES referenced_table_name (ref_column list) \r\n   ON DELETE {NO ACTION | CASCADE | SET NULL | SET DEFAULT} \r\n   ON UPDATE {NO ACTION | CASCADE | SET NULL | SET DEFAULT} <\/pre>\n<p>If you have a single column reference, there is no need to put a list of referencing columns in a <strong>FOREIGN KEY<\/strong> clause, so you don\u2019t need the foreign key clause. Just add the constraint to the single column definition. The referenced columns must be UNIQUE, so they are usually the <strong>PRIMARY KEY<\/strong> of the referenced table. However, it\u2019s perfectly all right to reference a <strong>UNIQUE()<\/strong> constraint in the referenced table.<\/p>\n<p>It is also possible to add a constraint name to the <strong>REFERENCES<\/strong> clause. If you have the name, then you can use it to alter the table, by dropping it or by putting it in a <strong>NO CHECK<\/strong> state. This is local syntax, and not ANSI\/ISO Standard. The standards have a deferred option which describes how constraint is handled at various times in the execution of a statement. It can get a little tricky and I\u2019m not going to discuss it here.<\/p>\n<p>When the database changes in some way, we have what\u2019s called a database event. The two database events that referable referential actions are concerned with are updates and deletions. There are four options. We picked these options by looking at what people had been writing triggers.<\/p>\n<p><strong>NO ACTION<\/strong> This is the default and it pretty much explains itself. This is the default option, which says that you cannot modify the referencing columns unless first change the referenced columns. To make that a little more real, this is how you would enforce a rule that says you cannot take an order for an item that isn\u2019t in inventory.<\/p>\n<p><strong>CASCADE<\/strong> A cascade option is probably the most common one in actual use. If I have a delete with the cascade when I remove something in the reference table, it is also deleted from all of the referencing tables. Again, a more of a real world of example, if I delete an account, I can also delete all the <strong>REFERENCES<\/strong> to it in my databases. If I update an account number, I can cascade this update to all the referencing tables.<\/p>\n<p><strong>SET NULL<\/strong> This pretty much explains itself. This assumes that the referenced columns are all null-able. Since a <strong>PRIMARY KEY<\/strong> cannot have null-able columns by definition, this can never be used on them. However, it\u2019s okay for things with <strong>UNIQUE()<\/strong> constraints<\/p>\n<p><strong>SET DEFAULT<\/strong> This pretty much explains itself. This assumes that the referenced columns all have a default value.<\/p>\n<h2>Conclusion<\/h2>\n<p>In theory, anything you can do with procedural code, you can do with declarative code. The gimmick is you don\u2019t do it the same way or with the same mindset.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Joe Celko reminisces about the origins of databases and one of the early pioneers, Charles Bachman. He explains how pointer chains were used to traverse data in the NDL standard and referential integrity that we use today.&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[],"coauthors":[6781],"class_list":["post-77411","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77411","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=77411"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77411\/revisions"}],"predecessor-version":[{"id":77416,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77411\/revisions\/77416"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=77411"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=77411"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=77411"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=77411"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}