Using WITH in an IF Condition

I am in the middle of working on my DB Design conference that occurs in a bit over a week from now. But I had this come up in some work I was doing and wanted to put it down in a blog.

One of the most exciting features of SQL Server 2005 was the inclusion of Common Table Expressions (CTE). Code that often needed a tangle of temp tables could be now be done in a single query (Derived tables can be used too, but I can’t remember when derived tables started in SQL Server, but it may have been 2005, or perhaps 2000).

The problem is, often you want to write a query to look for bad data, fix the bad data in the table, then use the same base query in a procedure/trigger or testing/validation code. If you have used a CTE in your query, this can sometimes be tricky as they cannot be used in a conditional like IF EXISTS (queryWithCTE).

For example, say your query was the following (using WideWorldImporters), where you want to make sure that a customer only ordered one size of product (this predicate is clearly senseless (and no data meets this requirement), but uses it WWI data which is simple and something that everyone can easily attain and fiddle with.) So, you might write the following query:

Note: There is another, simpler way to write this query using COUNT(DISTINCT Size) in a HAVING clause without the CTE. The technique to find duplicates is not the point of this article, it is the technique of using a query that needs WITH in it in a conditional, and a real example would be a lot more complex to build (this query is unwieldy enough).

This finds that there are customers who have ordered more than one size (in fact that is the case for every customer in this database that ordered products that record a size). So you might clean up the data, and in your code, want to stop them from doing it again. More than once I have they tried to take the query I have written with the CTE, shove it in an IF EXISTS() construct, not even thinking whether it would run:

Only to be greeted with:

If, like this query, you only have one, non-recursive CTE, the query could be easily rewritten with the CTE in a derived table such as:

But often the query may have a CTE that has multiple parts, each referencing the previous CTE, and perhaps one or more of the CTEs being referenced multiple times. This cannot easily be rewritten. Instead, you can use the SQL statement to assign a value to a variable. In its simplest form, that might be either 1 or 0. For In the following format:

This lets you keep the CTE oriented code and catch that at least one value has failed (which is to say, has succeeded from the query’s point of view). So expanding this to my previous example, the query would look like the following:

You can test that it works by changing the HAVING clause to = 1, as there are no customers that have ordered a single size. No error message will be thrown.

Lastly, if you really want to get a bit more interesting with your error messages, you can grab some information in the query, for example a customer name that violated the rules. Also, to tell if multiple violations have occurred, instead of TOP (1), get TOP(2) and if the rowcount <> 1, you can know that multiple rows fail the check.

Now you have an error message that gives you a meaningful place to look, AND indicates at least one example for you to check (you could use the PK of the customer to pass to an application in your error message.

This returns:

Note that I still used the @condition variable, rather than using rowcount, or the name of the customer. Rowcount is not a bad choice, but it is easy for some other coder to inadvertently mess up fetching the rowcount, because you have to get @@ROWCOUNT in the very next statement or it can be cleared. CustomerName is tricky too, because you need to make sure that you pick a value that CustomerName can never be. Maybe NULL? Seen it. A customer named ‘’, possible if you do not have constraints to prevent empty data. So using a common variable that you trust to be set by a literal just feels safer, and adding on error message stuff is less dangerous in the long run. It all seems kind of annoying until it is 3 in the morning and you get an error message that you don’t have a value to start looking for in your source query.

Now, if we could just get a few horrifying error messages from SQL Server to do the same. I am looking at you truncation message: