Predicates With Subqueries

The ALL, SOME and ANY predicates aren't much used in SQL Server, but they are there. You can use the Exists() predicate instead but the logic is more contorted and difficult to read at a glance. Set-oriented predicates can greatly simplify the answering of many real-life business questions, so it is worth getting familiar with them. Joe Celko explains.

You’ve probably read that SQL is a language based on sets and predicates. Well, that sounds impressive but what does it really mean? It means that we ought to have some predicates that use sets as their operands. And we do; but a lot of them are a bit obscure and not often used. All of them can be written with the [NOT] EXISTS () predicate and other logical operators in correlated subqueries. While that is perfectly OK, the results can be a bit hard to read. Let us examine some of these features and see how we can use them. 

The [NOT] EXISTS() Predicate

I am going to assume by now that you have seen an EXISTS() predicate in SQL. , but perhaps you do not yet fully appreciate it. It existed at the very start of the language, and it has a nice intuitive meaning. This is because it is one of the few predicates that we have, perhaps the only one,  that evaluates to either TRUE and FALSE, but never UNKNOWN.

The current BNF is very simple.

<exists predicate> ::= EXISTS <table subquery>

Originally, the <table subquery> had to be a one column table. This is because the EXISTS() predicate was defined in the same part of the standard that gave us the “<expression <comp op> ALL <table subquery>” and “<expression <comp op> [SOME|ANY] <table subquery>” predicate definitions (we will get to them in another section so be patient). Originally, comparison operators were defined only for scalars; currently standard SQL allows row-based comparisons.

The single column restriction does not make any sense for the EXISTS() predicate; it is clearly a table-based function. If the cardinality of <table subquery> is greater than zero, then the result of the <exists predicate> is TRUE; otherwise, the result of the <exists predicate> is FALSE.

This is why there was a popular piece of folklore that the best way to write EXISTS() predicate was “EXISTS (SELECT <constant> FROM…)” or “EXISTS (SELECT <expression> FROM…)” and it actually did work that way in some of the very early SQL engines; they actually materialized the query expression table! However, using “EXISTS (SELECT * FROM…)” defined the asterisk as a single undefined column. It is the preferred choice today, but we see it as standing for an entire row, not a column. Hey, it is hard to write good, clear BNF for a new language! The truth is that internally optimizers quickly got rid of the materialization, and simply evaluated the table expression until they got a row.

It was also confusing in those days to realize that a NULL exists. There were proposals to make a table subquery of all NULLs return an UNKNOWN result from EXISTS(). We decided against it, on the grounds that existence is at a “higher level” than exact, known values. Imagine that you have a paper bag and cannot see what is in it, but you can still pick it up and know of it has some kind of contents.

Quantified Comparison Predicates: SOME ANY and ALL 

These predicates have been in the language from the very early days. They were our attempts to get the universal quantifier (“) which means “for All” and existential quantifier () which means “there exists” from formal logic into SQL. A surprising number of SQL programmers do not even know they exist.

The current definition of these predicates allows row comparisons, but that  has not been implemented in SQL Server. Let us stick to the basic, original scalar value syntax that is in SQL Server.

The easiest way to think of this is that we are using an abbreviation to distribute the comparisons over a set of AND-ed or OR-ed simple comparison predicates. 

The predicate “<value expression> <comp op> ANY <table expression>” is equivalent to taking each row, call it “s” and assume that they are numbered from 1 to n, and testing <value expression> <comp op> s with ORs between the expanded expressions:

When you get a single TRUE result, the whole predicate is TRUE.

As long as table S has cardinality greater than zero and one non-NULL value, you will get a result of TRUE or FALSE. The keyword SOME is the same as ANY; it is just a matter of style and readability. Likewise, “<value expression> <comp op> ALL <table expression>” takes each row s of table S and tests <value expression> <comp op>s with ANDs between the expanded expressions:

When you get a single FALSE result, the whole predicate is FALSE. As long as table S has cardinality greater than zero and all non-NULL values, you will get a result of TRUE or FALSE.

That sounds reasonable so far. Now let EmptyTable be an empty table (no rows, cardinality zero) and NULLTable be a table with only NULLs in its rows (but cardinality greater than zero). The rules for SQL say that “<value expression> <comp op> ALL NullTable” always returns UNKNOWN, and likewise “<value expression> <comp op> ANY NullTable” always returns UNKNOWN. This makes sense, because every row comparison test in the expansion would return UNKNOWN, so the series of OR and AND operators would behave in the usual way.

This is easier to see with actual values, we can use  a table with (10, NULL, 10) and value expression of 10 to get:

See how SQL’s three valued logic can bite us? In the DML, and unknown is rejected, but in the DDL an unknown is accepted.Likewise, the table  (10, NULL, 25) yields

Repeat the exercise with ORs:

Likewise, the row (10, NULL, 25) yields

However, “<value expression> <comp op> ALL EmptyTable” always returns TRUE and “<value expression> <comp op> ANY EmptyTable” always returns FALSE. Most people have no trouble seeing why the ANY predicate works that way; you cannot find a match, so the result is FALSE. But most people have lots of trouble seeing why the ALL predicate is TRUE. This convention is called existential import in formal logic. If I were to walk into a bar and announce that I can beat any pink elephant in the bar, that would be a true statement. The fact that there are no pink elephants in the bar merely shows that the problem is reduced to the minimum case.

This was actually a major issue in the early days of symbolic logic. Lewis Carroll believed in existential import, which means if you say “all men are mortal” you imply “some men (at least one) exists” but historically logic went against them. That is because we define them in terms of  AND and OR to give us some very nice mathematical properties.

  1) ∃x P(x) = ¬∃ x ¬P(x)

  2) ∃x P(x) = ¬∀ ¬P(x)

This rule lets us use the [NOT] EXISTS() predicate in some cases. The “Table1.x <comp op> ALL (SELECT y FROM Table2 WHERE <search condition>)” predicate converts to:

The “Table1.x <comp op> ANY (SELECT y FROM Table2 WHERE <search condition>)" predicate converts to

Of the two quantified predicates, the  ALL predicate is used more. The ANY predicate is more easily replaced and more naturally written with an EXISTS() predicate or an IN predicate. In fact, the standard defines the IN()  predicate as shorthand for “= ANY” and the NOT IN predicate as shorthand for “<> ANY”, which is how most people would construct them in English.

The IN() predicate first appeared in the Pascal programming language, so it is also more natural for programmers. We generally teach the IN() predicate without mentioning that it is an ANY predicate in disguise. Stealing from Pascal, the IN() predicate can have a simple list of expressions (they do not have to be scalars!) Or a sub query. The sub query may or may not be correlated.

The ALL Predicate and Extrema functions

It is counter-intuitive at first that these two predicates are not the same in SQL:

   x >= (SELECT MAX(y) FROM Table1)

   x >= ALL (SELECT y FROM Table1)

but you have to remember the rules for the extrema functions — they drop out all the NULLs before returning the greatest or least values. The ALL predicate does not drop NULLs, so you can get them in the results.

However, if you know that there are no NULLs in a column or are willing to drop the NULLs yourself, then you can use the ALL predicate to construct single queries to do work that would otherwise be done by two queries. For example, in order to find which manager handles the largest number of products, you would first construct a derived table :

But Alex Dorfman found a single query solution instead:

The use of the SELECT DISTINCT in the subquery is to guarantee that we do not get duplicate rows when two managers handle the same number of products. You can also add a “.. WHERE dept IS NOT NULL” clause to the subquery to get the effect of a true MAX() aggregate function. 

UNIQUE Predicate

While SQL Server has the ANSI standard UNIQUE constraint, it does not have the UNIQUE predicate. The syntax is quite simple and  can be computed by comparing COUNT (*) to COUNT (<value expression >).

If there are no two rows in <table subquery> such that the value of each column in one row is non-NULL and is not distinct from the value of the corresponding column in the other row, then the result of the <unique predicate> is TRUE; otherwise, the result of the <unique predicate> is FALSE.

This is based on the GROUP BY equivalence operator.  Essentially you are saying the make-believe HAVING clause has a COUNT(*) = 1. And optimizer does not even have to look at the table if it has unique indexes on the appropriate columns, so implementation should be pretty fast in modern SQL engines.

MATCH Predicate

While not available in SQL Server, the MATCH predicate  is conceptually useful for other advanced constructs. It involves matching rows against a table. ANSI standard SQL uses it in CASE expressions and declarative referential integrity constraints. It is very hard to fake in SQL server. Here is the syntax

Obviously the <row value predicand> has to have the same structure as the table to which it is matched. If neither SIMPLE, PARTIAL, nor FULL is specified, then SIMPLE is implicit. The idea is to take a template and match the table so subquery against a row value. The the matching options handle NULLs slightly differently. A full match must have a NULL in the corresponding columns or an exact matching value. A partial match gives the benefit of the doubt to the NULLs (think of the CHECK() constraint in DDL). A simple match follows the usual rules for row equivalence in DDL. This is probably easier to see with actual example

Table 1

column_1

column_2

column_3

10

10

10

10

10

10

10

NULL

10

10

10

20

Here are some examples of <match predicate> expressions, all of which are TRUE:

And here are examples that are FALSE:

Conclusion

Set-oriented predicates can greatly simplify the answering of many real-life business questions.  They are definitely more than mathematical curiosities. There is more to set-oriented predicates in SQL than just the simple IN() and EXISTS(). It will be worthwhile for you to sit down and make up some sample data so you can play with it. Learn how the other features that you might not have known about actually work in this language.