SQL is based on set theory and logic. The only bad news is that many programmers have never had a class on either of those topics. They muddle through using the Boolean operators in their programming language and think that’s all there is to formal logic.

Let’s flashback to the early days of logic and play catch up. We need to start with syllogisms. Syllogisms are logical forms made up of combinations of two statements about classes of things that lead to a conclusion. They were invented by the Greeks and written up by Aristotle in *Prior Analytics*. You might have run into them, If you had a philosophy class that included lectures on formal fallacies. The three forms of statements allowed are:

1) ALL x ARE y

2) SOME x ARE y

3) NO x ARE y

These simple tools let us take two statements and come to a valid conclusion:

All men are mortal

All Greeks are Men

therefore: All Greeks are mortal

The Ancient Greeks seem to like to use a minimal set of tools which we could expect from a culture that gave us only circles and straight edges in plane geometry. We know that our logic was valid, so we are confident he will die if we know that Aristotle is a Greek. A polysyllogism, or a sorites, is a set of three or more such constructs that can be chained together to come to a conclusion.

You can figure out syllogisms or sorites using Euler circles, Venn diagrams, or Lewis Carroll’s diagrams. And before you ask, yes, it is *that *Lewis Carroll. People tend to forget that he was one of the pioneers in the evolution of logic. You don’t hear more about him when you’re studying early logic because he was on the losing side of a philosophical argument called “existential import” in the literature. Briefly, this has to do with what “ALL x ARE y” means. Does it imply that “SOME x ARE y” and that “NO x ARE non-y”; that is, there are x’s in the universe. Or does it mean only that “NO x ARE non-y”? I will get back to this point shortly.

George Boole and his book “Laws of Thought” (available in reprint additions to this very day) gave us the Boolean algebra that dominates programming languages with the basic `AND`

, `OR`

, and `NOT`

operators we’ve come to know and love over the decades. Our syllogisms mutated into quantifiers which could be applied to the Boolean algebra expressions. The two forms are `For all x, P(x),`

and `For some x, P(x).`

If you want to look up formulas in a textbook, the symbol for the universal quantifier is ∀, an inverted letter ‘A’, and the symbol for the existential quantifier is ∃, a rotated letter ‘E’.

Existential import lost the battle, and the modern convention is that “All men are mortal” has the same meaning as “There are no men who are immortal” but does not imply that any men exist at all.

The reasoning is that if I were to walk into a bar and announce that I can beat any pink elephant in the bar, that would be a true statement. The fact that there are no pink elephants in the bar merely shows that the problem is reduced to the minimum case. If this still seems unnatural, then consider the formal mathematical properties of these quantifiers:

1) (∀x)P(x) = ¬(∃x) ¬P(x)

2) (∃x)P(x) = ¬(∀x) ¬P(x)

In Boolean terms, you can think of the (∀x) quantifier as being a generalized `AND`

which connects an infinite string of propositions. Likewise, you can think of the (∃x) quantifier as a generalized `OR`

, which connects an infinite string of propositions. These generalizations are actually taken from DeMorgan’s Laws.

## The EXISTS() predicate

SQL has had the `EXISTS()`

predicate since the beginning. It’s interesting within SQL because it has to return either `TRUE`

or `FALSE`

but can never return `UNKNOWN`

. This is an exception to most logical operations inside SQL’s three-valued logic. It’s also not quite the same as the syllogisms we have been talking about.

Consider the statement “Some salesmen are liars,” and one way we would write it with the `EXISTS()`

predicate in SQL:

1 2 3 4 5 |
... WHERE EXISTS (SELECT * FROM Personnel AS P, Liars AS L WHERE P.job_title = 'Salesman' AND P.emp_name = L.emp_name); |

If we are more cynical about salesmen, we might want to formulate the predicate “All salesmen are liars” with the `EXISTS`

predicate in SQL, using the transform rule just discussed:

1 2 3 4 5 6 7 8 9 |
... WHERE NOT EXISTS (SELECT * FROM Personnel AS P WHERE P.job_title = 'Salesman' AND P.emp_name NOT IN (SELECT L.emp_name FROM Liars AS L)); |

This informally says “There are no salesmen who are not liars” in English. In this case, the `IN()`

predicate can be changed into a `JOIN`

, which might improve performance and be a bit easier to read.

As an aside, the subquery doesn’t need to evaluate the `SELECT`

clause for the `EXISTS()`

predicate to work. In one very early version of Oracle, however, the compiler would work out the expressions in the `SELECT`

clause and then ignore the results. Today, the preferred form is `SELECT *`

because the star represents a nonspecific row in several places in SQL. It is easy to find in the text, and the compiler will not waste any time with expressions in the `SELECT `

list (in fairness, this optimization should not be a problem anymore).

Now, onto the fancy stuff!

## The [SOME | ANY] <subquery> predicate

The predicate `<value expression> <comparison op> [ANY | SOME] <table expression>`

is equivalent to taking each row of the table expression, `si,`

(assume that they are numbered from 1 to n) of `<table expression>`

and testing `<value expression> <comparison op> si`

with `ORs`

between the expanded expressions:

1 2 3 4 |
((<value expression> <comparison op> s<sub>1</sub>) OR (<value expression> <comparison op> s<sub>2)</sub> ... OR (<value expression> <comparison op> s<sub>n</sub>)) |

When you get a single `TRUE`

result, the whole predicate is `TRUE`

. As long as `<table expression>`

has cardinality greater than zero and one non-`NULL`

value, you will get a result of `TRUE`

or `FALSE`

. The keyword `SOME`

is the same as `ANY`

, and the choice is just a matter of style and readability.

If you think about it, the `IN()`

predicate is equivalent to a simple `= ANY`

predicate. In fact, that is how it is defined in the ANSI/ISO Standard. But the `<comparison op>`

can be any of `=, <, >, <>, <=, >=,`

and (in theory, but not always in practice) any other comparison operator (such as `IS [NOT] DISTINCT FROM`

).

## The ALL <subquery> predicate

The `<value expression> <comp op> ALL <table expression>`

takes each row of `<table expression>,`

call them `s1, s2,.. sn`

, builds a search condition for each such row expression, tests `<value expression> <comp op> si`

with `ANDs`

between the search conditions, thus:

1 2 3 4 |
((<value expression> <comp op> s<sub>1</sub>) AND (<value expression> <comp op> s<sub>2</sub>) ... AND (<value expression> <comp op> s<sub>n</sub>)) |

When you get a single `FALSE`

result, the whole predicate is `FALSE`

. As long as `<table expression> `

has cardinality greater than zero and all non-`NULL`

values, you will get a result of `TRUE`

or `FALSE`

.

That sounds reasonable so far. Now let *Empty_Table* be an empty table (no rows, cardinality zero) and *Null_Table* be a table with only NULLs in its rows and a cardinality greater than zero. The rules for SQL say that `<value expression> <comp op> ALL Null_Table`

always returns `UNKNOWN`

, and likewise `<value expression> <comp op> ANY Null_Table`

always returns `UNKNOWN`

. This makes sense because every row comparison test in the expansion would return `UNKNOWN`

, so the series of `OR`

and `AND`

operators would behave in the usual way.

However, `<value expression> <comp op> ALL Empty_Set`

always returns `TRUE`

and `<value expression> <comp op> ANY Empty_Set`

always returns `FALSE`

. Most people have no trouble seeing why the `ANY`

predicate works that way; you cannot find a match, so the result is `FALSE`

. But most people have trouble seeing why the `ALL`

predicate is `TRUE`

. We are trying to preserve those generalized DeMorgan’s Laws.

The `Foobar.x <comp op> ALL (SELECT y FROM Table2 WHERE <search condition>)`

predicate converts to:

1 2 3 4 5 |
... NOT EXISTS (SELECT * FROM Foobar, Table2 WHERE Foobar.x <comp op> Table2.y AND NOT <search condition>)... |

The `Foobar.x <comp op> ANY (SELECT y FROM Table2 WHERE <search condition>)`

predicate converts to

1 2 3 4 5 |
... EXISTS (SELECT * FROM Foobar, Table2 WHERE Foobar.x <comp op> Table2.y AND <search condition>) ... |

Of the two quantified predicates, the `<comp op> ALL`

predicate is probably the more useful of the two since it cannot be written in terms of an `IN()`

predicate. The trick with it is to make sure that its subquery defines the set of values in which you are interested. For example, to find the authors whose books *all *sell for $49.95 or less, you could write:

1 2 3 4 5 6 |
SELECT * FROM Authors AS A1 WHERE 49.95 <= ALL (SELECT book_price FROM Books AS B1 WHERE A1.author_name = B1.author_name); |

The best way to think of this is to reverse the usual English sentence “Show me all x that are y” in your mind so that it says “y is the value of all x” instead.

## The ALL predicate and Extrema functions

At first, it is counter-intuitive that these two predicates are *not* the same in SQL:

1 2 |
x >= (SELECT MAX(y) FROM Foobar) x >= ALL (SELECT y FROM Foobar) |

You have to remember the rules for the extrema functions – they drop out all the `NULLs`

*before *returning the greatest, least, or computed value the aggregate function wants. But the `ALL`

operator does not drop `NULLs`

so that you can get them in the results. (**Editor’s note:** If any `NULLs`

are returned from the subquery, no rows are returned from the main query with `ALL`

. While the `ALL`

operator is not dropping `NULL`

, `NULL`

doesn’t show up in the results because the expression `>= NULL`

will return `UNKNOWN`

.)

The first expression uses an aggregate function, so it returns a numeric value. In effect, it works like this:

0) Create the data

1 2 3 4 |
CREATE TABLE Foobar (foo_id VARCHAR(10) NOT NULL PRIMARY KEY, y INTEGER); INSERT INTO Foobar VALUES ('Albert', 42), ('Bob', NULL), ('Chuck', 43); |

1) We can now apply the aggregate function. Remove the rows with `NULLs`

.

(‘Albert’, 42)

(‘Chuck’, 43)

2) Find the maximum value in the column of y in the remaining rows.

1 |
{(42), (43)} => (43) and finally x>= (43) |

But look at the second expression. That `ALL()`

predicate makes this a l*ogical expression. *Bob’s `NULL`

is not removed as it would be with an aggregation. Instead, we build a chain of `AND`

-ed predicates:

((‘Albert’, 42)

AND (‘Bob’, NULL)

AND (‘Chuck’, 43))

Which becomes in effect:

1 2 3 |
((42 >= x) AND (NULL >= x) --- UNKNOWN AND (43 >= x)) |

SQL’s three-valued logic comes into play here. We know that

TRUE AND UNKNOWN = UNKNOWN

FALSE AND UNKNOWN = FALSE

UNKNOWN AND UNKNOWN = UNKNOWN

As you can see, once you’ve got an `UNKNOWN`

value in a chain of `AND`

-ed predicates, you really can’t get it to resolve to a `TRUE`

value. You would have to cast them to a known value, using `COALESCE()`

or some similar function.

## Conclusion

It’s always a good idea to take a little time out and learn some of the more obscure features in this very large language we call SQL. But when you do, have mercy on the poor guys that have to maintain your code after you’re gone. This is what comments are for.

Load comments