SQL Server CASE Law

SQLs CASE expressions can be powerful magic, but can trap the unwary who are used to the more familiar CASE statements of procedural languages.

When you learn a new language, you tend to keep the grammar of your old language and just change the words. Consider German versus English word order; “Bitte mich morden nicht”, which is literally, “Please me murder not”, but better expressed as “Please don’t murder me” in English. English really likes a <subject><verb><object> structure to its sentence. German tends to use <subject><object><verb> structure in its sentence.

In Esperanto, we have “Bonvole, ne murdas min” or “Bonvole, min ne murdas” since the parts of speech are shown by suffixes (-n = object, -e = adverb, -as = present tense verb). When you see the first version, the word order gives you a strong hint that this Esperanto student was an English speaker; the second form implies a German speaker.

The same principle applies to SQL students. They make systematic errors based on their prior languages. Random errors are rare.

Procedural Control Flow versus Declarative Code

Classic Structured Programming is based on three constructs; IF-THEN-ELSE, WHILE-DO and BEGIN-END. If you are a math major, you know these constructs match to operators in primitive recursive function theory. You can compute certain classes of algorithms with primitive recursive functions. If you are not a math major, then you don’t care about all of that; you are happy that these constructs let you write neat, clean programs that are easy to write, easy to optimize and easy to maintain (if you do it right).

Structured Programming is the only way most programmers have coded for the last 40 years, so they don’t give up that mindset easily. Just Google any SQL forum for the phrase “CASE Statement” and you will lots of hits. There is no such thing in SQL; we have a CASE expression instead. An expression returns a scalar value and has nothing to do with control flow. Here is a typical example of one of these postings.

The CASE was supposed to change the logic in the WHERE clause. The intent was probably something like this:

You will also see IF and IFF() dropped into SQL in the hope that the compiler somehow knows languages that use those constructs.

Lasagne versus Spaghetti Code

This is based on a posting on a forum. The problem was to read in raw data. The records came in three types (header, body and terminator) which have to be grouped together, then loaded as a complete fact in a single row of twenty columns. The gimmick is that the rules for assigning values to the columns are tricky and depend on each other.

The approach the poster used was classic COBOL. He declared twenty local variables for each output column, split the input into that working storage, and then wrote IF-THEN-ELSE logic with assignment statements. When the row was ready, he wrote a loop to insert it into the output table.

Here is a skeleton of some of the code. Do not worry about the particulars; look at the basic approach.

The most obvious fix is to mechanically re-write the IFs to CASE WHEN and add an END for each CASE. A fairly simple parser could this for you.

Oops! That will not work. Each of those CASE clauses has an implicit “ELSE NULL” clause before the “END” terminator. Try again, with a “do nothing” clause for safety.

In 1982, at the start of the Structured Programming Revolution, I coined the term “Lasagne code” as opposed to “Spaghetti code” that had appeared in the literature.

Spaghetti code was defined as code with so many GOTO statements in it, you could not follow the execution path. You had no idea what the state of the program was at any particular time. Lasagne code was IF-THEN-ELSE, switch and other control flow GOTO-less structures nested inside each other that you also lost track of the the state of the program.

You can un-layer lasagne code to make it more readable. The layered code was probably faster, but not that much faster in the majority of cases. But see how much easier to read and maintain this version is:

In this particular example, we can squeeze the code a bit more.

The point is that you can read each WHEN clause and see what the business rule is without having to move your eyes back to a containing CASE expression. As the optimizer gets smarter, it will handle the CASE expressions for you, just as procedural optimizers learned to improve loops, find dead code and do other good things for us.

CASE Expressions

It allows the programmer to pick a value based on a logical expression in his code. ANSI stole the idea and the syntax from the now-defunct ADA programming language. Here is the syntax for a <case specification> :

The searched CASE expression is probably the most used version of the expression. First the expression is given a data type by seeing what the highest data type in its THEN clauses is. The WHEN … THEN … clauses are executed in left-to-right order. The first WHEN clause that tests TRUE returns the value given in its THEN clause.

And, yes, you can nest CASE expressions inside each other. If no explicit ELSE clause is given for the CASE expression, then the database will insert an implicit ELSE NULL clause before the END keyword. If you wish to return a NULL from a THEN, however, you should use a CAST (NULL AS <data type>) expression to establish the data type for the compiler.

I recommend always writing an explicit ELSE clause, so that you can change it later when you find a value to return. I would also recommend that you explicitly cast a NULL in the CASE expression THEN clauses to the desired data type.

The <simple case expression> is defined as a searched CASE expression in which all the WHEN clauses are made into equality comparisons against the <case operand> . For example,

could also be written as:

There is a gimmick in this definition, however. The expression,

becomes

The “WHEN foo = NULL” clause is always UNKNOWN.

The COALESCE() and NULLIF() Functions

The SQL-92 Standard defines other functions in terms of the CASE expression, which makes the language a bit more compact and easier to implement. For example, the COALESCE() function can be defined for one or two expressions by

  1. COALESCE(<value exp #1>) is equivalent to (<value exp #1>)
  2. COALESCE(<value exp #1>, <value exp #2>) is equivalent to

then we can recursively define it for (n) expressions, where

as equivalent to:

Programmers trying to UN-learn their old T-SQL dialect confuse the COALESCE() and NULLIF() functions. The data type proprietary IFNULL(<expression 1>, <expression 2>) is determined by <expression 1>. Run this code to see what happens.

It is even messier when <expression 1> is a character data type.

Likewise, NULLIF (<value exp #1>, <value exp #2>) is equivalent to:

The THEN NULL clause should cast to the data type of <value exp #1>, since that is all we have to use as a template. This also makes sense because you don’t want to replace a value with a different data type in different rows in the same column of a table.

Before your get too comfortable with this, there are more problems in T-SQL dialect. Run these statements a few times. The ISNULL() will produce either 666 or 999. The COALESCE() will produce 666,999 or NULL.

This has been reported as a bug by Hugo Kornelis here. In fact there is a general problem with subquery expressions in a COALESCE() because it is expanded as per the CASE statement and just gave. If the <value exp #1> is not deterministic. T-SQL will re-compute each invocation in the expression and can get different results.

The final 999 value in both expressions should ensure the overall result is never NULL, but Microsoft has it wrong. I have no idea where it comes from. The Standard SQL/PSM language has an option to declare a procedure to be DETERMINISTIC or not to aid the compiler. As I read the Standards, a non-deterministic function call should be evaluated once in each statement where it appears and that value used in the statement for each invocation. Its next invocation in a separate statement is then re-computed. The work-around is to load the non-deterministic function value into a local variable or scalar subquery expression then use that in the COALESCE