{"id":92189,"date":"2021-08-23T20:07:46","date_gmt":"2021-08-23T20:07:46","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=92189"},"modified":"2022-04-24T20:49:27","modified_gmt":"2022-04-24T20:49:27","slug":"a-unique-experience","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/learn\/a-unique-experience\/","title":{"rendered":"A UNIQUE experience"},"content":{"rendered":"<p>If you look at the postings on SQL forums, you will notice that people are sloppy. They expect that the reader can read their minds and figure out everything that\u2019s not been posted. They seldom post DDL, and when they do, it is also wrong. I don\u2019t mean a little bit wrong. I mean fundamentally wrong. RDBMS has the concepts of (Relational) keys, so DDL without at least one key means you don\u2019t have a table at all. If all the columns are <code>NULL<\/code>-able, they cannot ever be part of any key. This non-table is really a deck of 1960s punch cards hidden as an SQL table.<\/p>\n<p>Dr. Codd defined a key as being a subset of columns in a table. A column is defined as an attribute of the entity that the table models. The <code>IDENTITY<\/code> table property is left over from sequential file days is not an attribute of an entity; it is a metadata attribute of the table. The GUID suffers the same problem of not being a proper attribute of the entities within the table in which they appear. Yes, they are <code>UNIQUE<\/code>, but they\u2019re not attributes. Just sticking the constraint of <code>PRIMARY<\/code> <code>KEY<\/code> on them doesn\u2019t avoid the fundamental problem. Novices forget that the \u201cG\u201d in <code>GUID<\/code> stands for \u201cglobal,\u201d and they are not local. Such attributes are specifically not in the table! These things cannot be keys by definition.<\/p>\n<h2>What is a key?<\/h2>\n<p>The definition of the key:<\/p>\n<p>1) A key must be a non-empty subset of columns in the table (attributes of the entity). This means it cannot be an external pointer or physical locator to rows in the table. As an analogy, when you identify an automobile, you don\u2019t use a parking space number in some particular garage; you use the VIN.<\/p>\n<p>2) A key must be unique within the table. This is necessary but not sufficient. In fact, the best keys are unique across the whole schema and then some. This is another reason to use VIN for automobiles; it\u2019s useful to the DMV, your insurance company, and any other use you should have for that automobile. It can be validated by using a really ugly regular expression or verified by physically going to your car and looking at the chassis or the etchings on your windshield. Validation and verification of data is a slightly different topic, so I\u2019ll just skip over them for now.<\/p>\n<p>3) A key must be precise, not approximate or vague. You don\u2019t think about it, but this automatically eliminates approximate numeric types as keys. Two floating-point numbers are considered equal if they differ by a small value called the epsilon. This means it two different floating-point numbers can be treated as equal. This need for precision is also the reason that <code>NULLs<\/code> are not allowed in a key. Remember that a <code>NULL<\/code> is not a value. I\u2019ll get into some of the problems with <code>NULLs<\/code> shortly.<\/p>\n<h2>A bit of history<\/h2>\n<p>When Dr. Codd created the relational model, data was still in file systems and a few early networks and hierarchical databases. Our mindset was still stuck in these pre-relational tools, so the good doctor came up with the idea that every table should have a <code>PRIMARY<\/code> <code>KEY<\/code>. It was defined as a key that we anointed as somehow special. A table could only have one of these (Wow! Just like the sort order in sequential files!).<\/p>\n<p>In the early days of the SQL Standards, we considered requiring a <code>PRIMARY<\/code> <code>KEY<\/code> on a table but decided against it. The idea was that legacy data without keys (think mag tapes and punch cards) would be easier to move into SQL products without that constraint.<\/p>\n<p>Shortly after that, Dr. Codd realized that a key is a key. There was no need to mark one as special. This is what led to the <code>UNIQUE<\/code> constraint.<\/p>\n<h2>UNIQUE and PRIMARY KEY constraints<\/h2>\n<p>The <code>UNIQUE<\/code> and <code>PRIMARY<\/code> <code>KEY<\/code> constraints say that no duplicate values are allowed in the column. It comes in two forms. The syntax is:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">&lt;UNIQUE specification&gt; ::= UNIQUE | PRIMARY KEY<\/pre>\n<p>There are some subtle differences between <code>UNIQUE<\/code> and <code>PRIMARY<\/code> KEY. There still can be only one <code>PRIMARY<\/code> <code>KEY<\/code> per table but many <code>UNIQUE<\/code> columns. A <code>PRIMARY<\/code> <code>KEY<\/code> is automatically declared to have a <code>NOT<\/code> <code>NULL<\/code> constraint on it, but a <code>UNIQUE<\/code> column can have a single <code>NULL<\/code> in it unless you explicitly add a <code>NOT<\/code> <code>NULL<\/code> constraint. Adding the <code>NOT<\/code> <code>NULL<\/code> whenever possible is a good idea, as it makes the column into a proper relational key. I also add <code>NOT<\/code> <code>NULL<\/code> to <code>PRIMARY<\/code> <code>KEY<\/code> declarations to document the table and to be sure it stays there when the key changes.<\/p>\n<p>File system programmers understand the concept of a <code>PRIMARY<\/code> KEY, but for the wrong reasons. Their mindset is a sequential file, which can have only one key, because that key is used to determine the physical sort order of the records within the file. There is no ordering in a table; the term <code>PRIMARY<\/code> <code>KEY<\/code> in SQL has to do with defaults in referential actions, which is another topic.<\/p>\n<p>A single <code>NULL<\/code> is allowed in the <code>UNIQUE<\/code> constraint to be able to match a <code>NULL<\/code> as if it were a value. The same type of logic is applied to the <code>GROUP<\/code> <code>BY<\/code> clause so that each <code>NULL<\/code> doesn\u2019t become its own group. There\u2019s a cute story about this decision that has to do with traffic tickets. Before SQL, someone got a prestige tag reading <em>missing<\/em> and was inundated by hundreds of traffic tickets because that\u2019s what the police officers wrote for the missing auto tag everywhere in California.<\/p>\n<h2>Multi-column UNIQUE constraints<\/h2>\n<p>There is also a multiple-column form of the <code>&lt;UNIQUE specification&gt;<\/code>, which is usually written at the end of the column declarations. It is a list of columns in parentheses after the appropriate keyword; it means that the combination of those columns is unique.<\/p>\n<p>Broadly speaking, there are two patterns for multi-column keys. One pattern is like coordinates, where each attribute is strong and has meaning in its own right. For example, I might declare <code>PRIMARY<\/code> <code>KEY<\/code> (city, department) so I can be sure that although I have offices in many cities and many identical departments in those offices, there is only one personnel department in Chicago.<\/p>\n<p>The other pattern is hierarchical: one attribute is subordinate to another attribute. A classic example is an order number and an order detail number subordinated to that particular order. This has to do with weak and strong entities, and that\u2019s another topic.<\/p>\n<h2>Super keys<\/h2>\n<p>A super key is a key that is too big. It has more columns in it than are needed to identify an entity in a table. In practice, you generally don\u2019t like these things. The indexes or hash tables or whatever your SQL engine uses to implement uniqueness will have to do extra work carrying the redundant attributes. The problem is it isn\u2019t always obvious which attributes are redundant. Imagine a super key <code>UNIQUE<\/code> <code>(a, b, c, d)<\/code> In which any three of the four columns is also unique. Which possible combination(s) do you want to keep?<\/p>\n<h2>Overlapping keys<\/h2>\n<p>You can also have overlapping keys, as in <code>UNIQUE (a, b, c)<\/code> and <code>UNIQUE (b, c, d)<\/code>, which share columns <code>(b, c)<\/code>.<\/p>\n<h2>Nested unique constraints<\/h2>\n<p>One of the basic tricks in SQL is representing a one-to-one or many-to-many relationship with a table that references the two (or more) entity tables related by their primary keys. This third table has several popular names such as &#8220;junction table,&#8221; \u201cAssociative Entity,\u201d or &#8220;join table,&#8221; but we know that it is a relationship. The term &#8220;junction table&#8221; is a pointer structure from network databases, not part of an RDBMS. For example, given two tables for a Cafeteria database:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Entrees\r\n(entree_name VARCHAR(30) NOT NULL PRIMARY KEY\r\n ...);\r\nCREATE TABLE Sidedishes\r\n(sidedish_name VARCHAR(30) NOT NULL PRIMARY KEY,\r\n ... );<\/pre>\n<p>Yes, I know using names for a key is a bad practice, but it will make my examples easier to read. The simplest relationship table looks like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Meals\r\n(entree_name VARCHAR(30) NOT NULL\r\n  REFERENCES Entrees (entree_name),\r\n sidedish_name VARCHAR(30) NOT NULL,\r\n  REFERENCES Sidedishes(sidedish_name));<\/pre>\n<p>The <em>Meals<\/em> table allows creating simple one entree\/one side dish meals insert rows like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">INSERT INTO Meals\r\nVALUES\r\n('Meat Loaf', 'Mashed Potatoes'),\r\n('Meat Loaf', 'Green Beans'),\r\n('Fried Chicken', 'Green Beans'),\r\n('Meat Loaf', 'Mashed Potatoes');<\/pre>\n<p>Oops! \u2018<em>Meat<\/em> <em>Loaf\u2019<\/em> is shown twice with <em>&#8216;Mashed<\/em> <em>Potatoes&#8217;<\/em> because the <em>Meals<\/em> table does not have its own key. This is an easy mistake to make, but fixing it is not obvious.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Menu\r\n(entree_name VARCHAR(30) NOT NULL\r\n  REFERENCES Entrees (entree_name),\r\n sidedish_name VARCHAR(30) NOT NULL,\r\n  REFERENCES Sidedishes(sidedish_name),\r\n PRIMARY KEY (entree_name, sidedish_name)); -- compound key<\/pre>\n<p>The <em>Menu<\/em> table gets rid of the duplicated rows and makes this a proper table. The <code>PRIMARY<\/code> <code>KEY<\/code> for the table is made up of two or more columns and is called a compound key because of that fact. These are valid rows now.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">('Meat Loaf', 'Mashed Potatoes')\r\n('Meat Loaf', 'Green Beans')\r\n('Fried Chicken', 'Green Beans')<\/pre>\n<p>But the only restriction on the <em>Meals<\/em> is that they appear only once. Every entree can be paired with every side dish. I want a rule that entrees can have as many side dishes as they want, but the side dishes have to stick to one entree.<\/p>\n<p>The way I do this is to use a <code>NOT<\/code> <code>NULL<\/code> <code>UNIQUE<\/code> constraint on the <code>sidedish_name column<\/code>, which makes it a key. It is a simple key since it is only one column, but it is also a nested key because it appears as a subset of the compound <code>PRIMARY<\/code> <code>KEY<\/code>.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Plate_Combos\r\n(entree_name VARCHAR(30) NOT NULL\r\n  REFERENCES Entrees (entree_name),\r\n sidedish_name VARCHAR(30) NOT NULL UNIQUE, -- nested key\r\n  REFERENCES Sidedishes(sidedish_name),\r\n PRIMARY KEY (entree_name, sidedish_name)); -- compound key<\/pre>\n<p><em>Plate_Combos<\/em> is a proper table, without duplicated rows, but it also enforces the condition that each entree can have one or more side dishes.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">('Meat Loaf', 'Mashed Potatoes')\r\n('Meat Loaf', 'Green Beans')<\/pre>\n<p>Instead, you might want the side dishes to be combined with more than one entree.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Single_Side_Plates\r\n(entree_name VARCHAR(30) NOT NULL UNIQUE -- nested key\r\n  REFERENCES Entrees (entree_name),\r\n sidedish_name VARCHAR(30) NOT NULL,\r\n  REFERENCES Sidedishes(sidedish_name),\r\n PRIMARY KEY (entree_name, sidedish_name)); -- compound key<\/pre>\n<p>The <em>Single_Side_Plates<\/em> table would permit these rows from the original set.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">('Meat Loaf', 'Green Beans')\r\n('Fried Chicken', 'Green Beans')<\/pre>\n<p>Think about all of these possible keys for a minute. The compound <code>PRIMARY<\/code> <code>KEY<\/code> is now redundant. If each entree appears only once in the table or each side dish appears only once in the table, then each (<code>entree_name<\/code>, <code>sidedish_name<\/code>) pair can appear only once. A rather dull option of a single entree with a single side dish can be enforced with this DDL:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Lunch_Plates\r\n(entree_name VARCHAR(30) NOT NULL UNIQUE -- nested key\r\n  REFERENCES Entrees (entree_name),\r\n sidedish_name VARCHAR(30) NOT NULL UNIQUE -- nested key,\r\n  REFERENCES Sidedishes(sidedish_name),\r\n PRIMARY KEY(entree_name, sidedish_name)); -- redundant compound key<\/pre>\n<p>The Meals table allows inserting these rows from the original set.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">('Meat Loaf', 'Mashed Potatoes')\r\n('Fried Chicken', 'Green Beans')<\/pre>\n<p>Making special provisions for the <code>PRIMARY<\/code> <code>KEY<\/code> in the SQL engine is not a bad assumption because the <code>REFERENCES<\/code> clause uses the <code>PRIMARY<\/code> <code>KEY<\/code> of the referenced table as the default. Many new SQL programmers are unaware that a <code>FOREIGN<\/code> <code>KEY<\/code> constraint can <em>also<\/em> reference any <code>UNIQUE<\/code> constraint in the same table or another table. Such things can get very complicated and require that you defer constraints. Again, that\u2019s another topic.<\/p>\n<h2>Overlapping keys<\/h2>\n<p>Getting back to the nested keys, just how far can you go with them? My favorite example is a teacher&#8217;s schedule kept in a table like this (I am leaving off <code>REFERENCES<\/code> clauses and <code>CHECK()<\/code> constraints):<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Class_Schedule -- skeleton table. WRONG!\r\n(teacher_name VARCHAR(15) NOT NULL,\r\n class_title CHAR(15) NOT NULL,\r\n room_nbr INTEGER NOT NULL,\r\n period_nbr INTEGER NOT NULL,\r\n PRIMARY KEY (teacher_name, class_title, room_nbr, period_nbr));<\/pre>\n<p>That choice of a <code>PRIMARY<\/code> <code>KEY<\/code> is the most obvious one &#8212; use all the columns. Typical rows would look like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> ('Mr. Celko', 'Database 101', 222, 6)<\/pre>\n<p>The rules to be enforced are:<\/p>\n<p>1) A teacher is in only one room each period.<\/p>\n<p>2) A teacher teaches only one class each period.<\/p>\n<p>3) A room has only one class each period.<\/p>\n<p>4) A room has only one teacher in it each period.<\/p>\n<p>Stop reading and see what you come up with for an answer. Okay, now consider using one constraint for each rule in the list.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Class_Schedule_1 -- version one, still WRONG!\r\n(teacher_name VARCHAR(15) NOT NULL,\r\n class_title CHAR(15) NOT NULL,\r\n room_nbr INTEGER NOT NULL,\r\n period_nbr INTEGER NOT NULL,\r\n UNIQUE (teacher_name, room_nbr, period_nbr), -- rule #1\r\n UNIQUE (teacher_name, class_title, period_nbr), -- rule #2\r\n UNIQUE (class_title, room_nbr, period_nbr), -- rule #3\r\n UNIQUE (teacher_name, room_nbr, period_nbr), -- rule #4\r\n PRIMARY KEY (teacher_name, class_title, room_nbr, period_nbr));<\/pre>\n<p>There are four ways to pick three things from a set of four things; it is called a combination. I could drop the <code>PRIMARY<\/code> <code>KEY<\/code> as redundant if I have all four of these constraints in place. But what happens if I drop the <code>PRIMARY<\/code> <code>KEY<\/code> and then one of the constraints?<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Class_Schedule_2 -- still wrong\r\n(teacher_name VARCHAR(15) NOT NULL,\r\n class_title CHAR(15) NOT NULL,\r\n room_nbr INTEGER NOT NULL,\r\n period_nbr INTEGER NOT NULL,\r\n UNIQUE (teacher_name, room_nbr, period_nbr), -- rule #1\r\n UNIQUE (teacher_name, class_title, period_nbr), -- rule #2\r\n UNIQUE (class_title, room_nbr, period_nbr)); -- rule #3<\/pre>\n<p>I can now insert these rows in the second version of the table:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> ('Mr. Celko', 'Database 101', 222, 6)\r\n ('Mr. Celko', 'Database 102', 223, 6)<\/pre>\n<p>This gives me a very tough sixth-period teaching load since I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> ('Mr. Celko', 'Database 101', 222, 6)\r\n ('Mr. Celko', 'Database 102', 223, 6)\r\n ('Ms. Shields', 'Database 101', 223, 6)<\/pre>\n<p>Ms. Shields and I are both in room 223, trying to teach different classes at the same time. Matthew Burr looked at the constraints and rules and came up with this analysis.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE Class_Schedule_3 -- corrected version\r\n(teacher_name VARCHAR(15) NOT NULL,\r\n class_title CHAR(15) NOT NULL,\r\n room_nbr INTEGER NOT NULL,\r\n period_nbr INTEGER NOT NULL,\r\n UNIQUE (teacher_name, period_nbr), -- rules #1 and #2\r\n UNIQUE (room_nbr, period_nbr)); -- rules #3 and #4<\/pre>\n<p>If a teacher is in only one room each period, then given a period and a teacher, I should be able to determine only one room, i.e., the room is functionally dependent upon the combination of teacher and period. Likewise, if a teacher teaches only one class each period, then class is functionally dependent upon the combination of teacher and period. The same thinking holds for the last two rules: class is functionally dependent upon the combination of room and period, and teacher is functionally dependent upon the combination of room and period.<\/p>\n<p>With the constraints provided in the first version, you will find that the rules are not enforced. For example, I could enter the following rows:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> ('Mr. Celko', 'Database 101', 222, 6)\r\n ('Mr. Celko', 'Database 102', 223, 6)<\/pre>\n<p>These rows violate rule #1 and rule #2.<\/p>\n<p>However, the <code>UNIQUE<\/code> constraints first provided in <em>Class_Schedule_2<\/em> do not capture this violation and will allow the rows to be entered.<\/p>\n<p>The constraint<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">UNIQUE (teacher_name, room_nbr, period_nbr)<\/pre>\n<p>is checking the complete combination of teacher, room, and period. Since (&#8216;<em>Mr. Celko<\/em>&#8216;, <em>222<\/em>, <em>6<\/em>) is different from (&#8216;<em>Mr. Celko<\/em>&#8216;, <em>223<\/em>, <em>6<\/em>), the DDL does not find any problem with both rows being entered, even though that means that Mr. Celko is in more than one room during the same period.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\"> UNIQUE (teacher_name, class_title, period_nbr)<\/pre>\n<p>doesn&#8217;t catch its associated rule either since (&#8216;<em>Mr. Celko<\/em>&#8216;, &#8216;<em>Database<\/em> <em>101<\/em>&#8216;, <em>6<\/em>) is different from (&#8216;<em>Mr. Celko<\/em>&#8216;, &#8216;<em>Database 102<\/em>&#8216;, <em>6<\/em>). Mr. Celko can teach more than one class during the same period, thus violating rule two. It seems that it\u2019s also possible to add the following row:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">('Ms. Shields', 'Database 103', 222, 6)<\/pre>\n<p>which violates rules #3 and #4.<\/p>\n<h2>Conclusion<\/h2>\n<p>Try to imagine enforcing this with procedural code. This is why I say that most of the work in SQL is done in the DDL. You will not always need complicated, nested, overlapping uniqueness constraints, but they can be incredibly powerful when you do.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A properly designed table must have a key to uniquely identify each row. In this article, Joe Celko explains unique and primary keys. &hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143525],"tags":[145454,126400,5134,145455],"coauthors":[6781],"class_list":["post-92189","post","type-post","status-publish","format-standard","hentry","category-learn","tag-foreign-key","tag-primary-key","tag-sql-prompt","tag-unique-constraint"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92189","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=92189"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92189\/revisions"}],"predecessor-version":[{"id":92193,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/92189\/revisions\/92193"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=92189"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=92189"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=92189"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=92189"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}