SQL and Relational Theory
by C. J. Date
O’Reilly, 2009
IBSN: 978-0-596-52306-0
Hard Copy: Amazon US | Amazon UK
O’Reilly Website
The title of this book is SQL and Relational Theory. It is subtitled “How to Write Accurate SQL Code” and it is part of a “Theory in Practice” series.
Chris Date has produced a collection of books on RDBMS and SQL from various publishers over several decades. His “An Introduction to Database Systems” (ISBN: 978-0321197849, Addison-Wesley, 2004) is now in its eighth edition and was the standard college textbook for years. His claim to fame in the RDBMS world is that he worked with, and then partnered with, Ted Codd for many years to create the consulting company (Codd & Date).
Going back to the early days of RDBMS, when there was not so much internet, we had things made out of paper that we called magazines. In particular, there were newsstand computer magazines devoted to the “new” and exciting topic of databases. They were ‘DBMS’ from M&T Publishing and ‘Database Programming & Design’ from Miller-Freeman. The publishing industry is volatile and there are buyouts and cancellations. Thanks to something happening to a parent company in Germany, both magazines wound up belonging to Miller-Freeman.
So for a few years, Chris Date and I each had columns in a different magazine from the same publisher! Chris would write a piece on topic X and I would respond the next month with an anti-X piece. If you are really old or are a radio nostalgia buff, you will remember the Jack Benny & Fred Allen mock feud. They sniped at each other back and forth on their respective radio shows and boosted the audience for both shows.
We did the same thing; people had to buy both magazines to get the full story. Without that incentive, would you have bought two separate magazines on the same topic otherwise? Perhaps the best part of the series was “Dueling Medians”; each of us would offer a solution for finding the Median in SQL and the other would reply with another approach, other people joined in. I gave the various solutions in my SQL for Smarties.
Chris Date collected his columns and some other material into a series of books for Addison-Wesley in 1986, 1990, 1992 and 1994, then a collection from Apress in 2006 of articles written after the magazine columns were gone.
To this day, I still get asked if I hate Chris Date. Of course not! I buy and read every one of his books. But we do disagree on technical issues. The super short version is that I am the great defender of SQL and data standards; Chris is the defender of a Tutorial D and his school of Relational Theory. I am more hands-on and Chris is more theory.
The bad news is that large amounts of the discussion are about how SQL does not subscribe to the Date Relational Model, and much of the code is in Tutorial D. If you are not familiar with Tutorial D, I ought to explain that it is a relational programming language that is directly based on the relational calculus. The reader has to learn enough Tutorial D to read the comparisons between SQL and Tutorial D. Date uses his famous Suppliers and Parts database for the examples. He does not spend a lot of time on the DDL and moves to DML. But 80-95% of the work in SQL is done in the DDL, not the DML. And his examples are done with very simple code at the SQL-92 level. Let me be more specific:
The Parts table P (I will get to the DDL for it shortly) gives the weight of a part in pounds and we want it in grams. The Tutorial D version is:
1 |
EXTEND P ADD (weight *454.0 AS gm_wt) |
The SQL is:
1 |
SELECT P.*, (weight *454.0) AS gm_wt FROM P; |
Now let’s go to the next step. Write a query to give us all parts with a weight greater than 7000.0 gm. The tutorial D version Date gives is:
1 2 3 4 |
((EXTEND P ADD (weight *454.0 AS gm_wt)) WHERE gm_wt > 7000.0) {pno, gm_wt} |
The SQL he gives is:
1 2 3 |
SELECT pno, (weight *454.0) AS gm_wt FROM P WHERE (weight *454.0 > 7000.0); |
His point is that you have to re-use the computation in the WHERE clause. But that is not the case; it can be done with a derived table or CTE if you want to avoid using a VIEW:
1 2 3 4 |
SELECT X.* FROM (SELECT pno, (weight *454.0) FROM P) AS X (pno, gm_wt) WHERE gm_wt > 7000.0; |
This is a direct translation of the first query into SQL. The inner SELECT is a derived table that mimics the function of the EXTEND in Tutorial D. Going further, an SQL programmer would probably say to himself, “I am going to need to do this conversion in a lot of places” and he then does the computation in the DDL with a VIEW or with a computed column.
1 2 3 4 |
CREATE VIEW Metric_Parts AS SELECT pno, pname, color, city, (weight *454.0) AS gm_wt FROM P; |
Alternatively, the computed column will act like a VIEW; the syntax is just a little different.
1 2 3 4 5 6 7 |
CREATE TABLE Metric_Parts (pno CHAR(2) NOT NULL PRIMARY KEY, pname VARCHAR(10) NOT NULL, color VARCHAR(10) NOT NULL, city VARCHAR(10) NOT NULL, weight DECIMAL (5,2) NOT NULL, COMPUTE (weight *454.0) AS gm_wt); |
In fairness, Date also has complaints about Tutorial D because it lacks a Relational Division and introduces his DIVIDEBY operator from his previous books. Unfortunately, Relational Division come in many flavors – Codd’s original division, Todd’s division, with and without remainders and probably others.
Some statements are incorrect. In Chapter 8 on constraints, on page 100, he states that “Transition constraints aren’t current supported in either Tutorial D or SQL (other than procedurally).” His discussion uses a transition from “never married” to “married” and you cannot go back to a status of “never married” again.
I published the DDL code for state transition constraint in an article entitled “Constraint Yourself” in Simple Talk. I used (born, married, divorced, dead) as my status values.
In a footnote, he says:
“The semantics of WITH LOCAL CHECK OPTION are far too baroque to be spelled out in detail here. In any case, it’s hard to see why anyone would ever want such semantics; indeed, it’s hard to resist the suspicion that this alternative was included in the standard for no other reason than to allow certain flawed implementations, extent at the time, to be abler to claim conformance.”
ANSI tries to get all of the membership to agree on a common abstract model of how SQL works. We then create features on that model. By now, you should know that the clauses of a basic SELECT.. FROM.. WHERE.. GROUP BY.. HAVING.. statement start with the from clause and end with the select clause. We had to work out those rules in the committee because not all products did it that way. At one point, the GROUP BY implementations either put the NULLs in one group or put each NULL in its own group. Sybase did the ALL() and ANY() predicates wrong. Oracle had to add the CHAR2(n) because they got the CHAR(n) wrong; Microsoft added DATETIME(2) to implement the ANSI Standard TIMESTAMP. Did you notice that *= is long gone from products? The truth is that when the standards change, the vendors change their products, not the other way around. As the standards have progressed, we have fewer and fewer “implementation defined” features.
All that said, yes, the WITH [LOCAL | CASCADE] CHECK OPTION is baroque when you nest VIEWs inside each other. But it can be very powerful and enforce complex relationships that would otherwise have to be done with triggers or worse. It how you can express multi-table joins in SQL when you do not have a CREATE ASSERTION statement.
Chapter 11 is “Using Logic to Formulate Expressions” which introduces two-valued predicate logic and quantifiers. The explanations are done with Tutorial D, and then translated into SQL. That is confusing. Tutorial D is like classic (i.e. NULL free) predicate logic but it is still another language to learn. But SQL does have NULLs and we need to consider them from the start. This is one reason that minimal Netiquette on SQL Forums requires that you post DDL even for the simplest SQL problems.
For example, Date gives one of his classic tables:
1 2 3 4 5 6 |
CREATE TABLE Parts (pno CHAR(2) NOT NULL PRIMARY KEY, ARCHAR(10) NOT NULL, color VARCHAR(10) NOT NULL, weight DECIMAL (5,2), city VARCHAR(10) NOT NULL); |
Please notice that the weight is NULL-able, but the first sample data is like this:
1 2 3 4 5 6 7 8 |
INSERT INTO Parts VALUES ('P1', 'nut', 'red', 12.0, 'London'), ('P2', 'bolt', 'green', 17.0, 'Paris'), ('P3', 'screw', 'blue', 17.0, 'Oslo'), ('P4', 'screw', 'red', 14.0, 'London'), ('P5', 'cam', 'blue', 12.0, 'Paris'), ('P6', 'cog', 'red', 19.0, 'London'); |
Given the problem “find the parts that have a weight that is different from the weight of any part in Paris.”
1 2 3 4 5 6 7 |
SELECT P1.pname arts AS P1 WHERE NOT EXISTS (SELECT * FROM Parts AS P2 WHERE P2.city = 'Paris' AND P2.weight = P1.weight); |
Produces the same results, but only if there are no NULL weights in the table.
1 2 3 4 5 6 |
SELECT P1.pname Parts AS P1 WHERE P1.weight NOT IN (SELECT P2.weight FROM Parts AS P2 WHERE P2.city = 'Paris'); |
The EXISTS() predicate is always TRUE or FALSE, but the IN() predicate is a shorthand for a chain of OR-ed predicates. The IN() predicate can return UNKNOWN if there are NULLs. We would need get rid of the NULLs:
1 2 3 4 5 6 7 8 |
SELECT P1.pname arts AS P1 WHERE P1.weight IS NOT NULL AND P1.weight NOT IN (SELECT P2.weight FROM Parts AS P2 WHERE P2.weight IS NOT NULL AND P2.city = 'Paris'); |
And we now the “IS [NOT] DISTINCT FROM” comparison operator that will treat NULLs as if they are equal:
1 2 3 4 5 6 7 |
SELECT P1.pname arts AS P1 WHERE NOT EXISTS (SELECT * FROM Parts AS P2 WHERE P2.city = 'Paris' AND P2.weight IS NOT DISTINCT FROM P1.weight); |
Summary:
I do not feel the book lived up to its title. Someone trying to improve his SQL or find a systematic approach to constructing a query has to first predicate logic and Tutorial D. Date’s dislike of SQL shows up everywhere in the book; he was looking for ways to make SQL look bad. Much of his SQL code is dated and fails to use newer features.
While you expect some repetition, the material was a re-arrangement of his older material without adding anything new. If you have not read Date’s other books, then this might not be a problem.
I feel that a better approach would have been to show dangerous or shoddy SQL, demonstrate the problems, explain the math, relational algebra and logic that was ignored and then solve the problems with better SQL. The “Paris-weights” example I showed would be the start of such a detailed analysis. The reader would not be confused by Tutorial D code and would have come away with a better understanding for the limits and power of SQL.
Load comments