Failure #1: Not knowing what NULL means
Failure #2: Treating NULL like a real value in comparisons
Failure #3: Not knowing how ANSI_NULL works
Failure #4: Treating ISNULL the same as IS NULL
Failure #5: Treating ISNULL the same as COALESCE
Failure #6: Forgetting about the NULLIF function
Failure #7: Assuming that 2 + NULL = 2
Failure #8: Aggregating data without taking NULL into account
Failure #9: Forgetting about NULL in your Boolean expressions
Failure #10: Not accounting for NULL in your string expressions
Failure #11: Joining tables on nullable columns
Failure #12: Creating a unique index without taking NULL into account
Failure #13: Sorting data on a nullable column

Failure #1: Not knowing what `NULL` means

When referring to NULL within our data sets, we often use phrases such as “the NULL value” or “a value of NULL.” I do it all the time. I see it done all the time. Such phrases are so common that we think little of their use. But inherent in the phrase is the notion of NULL as an actual value. It is not. NULL is a non-value, a nonexistent value. It is not zero. It is not an empty string. A value cannot equal NULL. No two NULL values are equal.

A NULL value is often defined as one that is unknown or not applicable, but even these definitions can be open to debate. For example, a record might not include a customer’s birthdate because the salesperson didn’t ask or because the customer would not provide it, but the customer still knows that date, so it is hardly unknown, nor is it any less applicable. There can be many reasons for missing data, and there has been much debate about a precise meaning of NULL. If you need to attach a specific meaning to NULL, then missing or absent data is probably your safest bet.

Perhaps a better way to think of NULL is as a setting or marker that indicates if a data value does not exist. That certainly seems to be how SQL Server treats NULL. The database engine uses a special bitmap to track which columns in a row are NULL and which are not. The bitmap contains a bit for each column, with the bit set to 1 if the column is NULL, that is, if the value is missing.

The confusion for some, particularly beginning T-SQL developers, is the way SQL Server Management Studio indicates that no value exists for a specific column: by displaying what appears to be a value that reads NULL. For example, the following SELECT statement returns data from the Product table in the AdventureWorks2014 sample database:

SELECT Name AS ProductName, ProductLine,

Class, Style, SellEndDate

FROM Production.Product

WHERE ProductID BETWEEN 680 AND 715;

As the following results show, every row includes at least one column for which no value exists:

ProductName	ProductLine	Class	Style	SellEndDate
HL Road Frame – Black, 58	R	H	U	NULL
HL Road Frame – Red, 58	R	H	U	NULL
Sport-100 Helmet, Red	S	NULL	NULL	NULL
Sport-100 Helmet, Black	S	NULL	NULL	NULL
Mountain Bike Socks, M	M	NULL	U	2012-05-29 00:00:00.000
Mountain Bike Socks, L	M	NULL	U	2012-05-29 00:00:00.000
Sport-100 Helmet, Blue	S	NULL	NULL	NULL
AWC Logo Cap	S	NULL	U	NULL
Long-Sleeve Logo Jersey, S	S	NULL	U	NULL
Long-Sleeve Logo Jersey, M	S	NULL	U	NULL
Long-Sleeve Logo Jersey, L	S	NULL	U	NULL

If we look at the columns with the NULL values, we can see how difficult it is to determine why the values are missing. For example, is a SellEndDate value missing because the product is still live or because the application failed to properly record the end date? Are the Class and Style values missing because they don’t apply to those particular rows or because someone was careless in entering data?

An even bigger problem with viewing the results in this way is that it suggests that the database is actually storing multiple instances of this mysterious placeholder called NULL and that all those NULL values are the same. They are not the same. There is nothing there. Nothing cannot be the same as nothing. Nothing cannot be compared to nothing. You can think of these as the fundamental principles behind NULL. The principles of nothingness. And not understanding these principles can lead to all sorts of problems when working with SQL Server, resulting in many of the failures we cover here.

Failure #2: Treating NULL like a real value in comparisons

Developers who don’t understand how NULL works will sometimes use comparison operators to compare an expression to a NULL value, resulting in a statement that looks similar to the following:

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title = NULL AND BusinessEntityID < 7;

The assumption here is that NULL is a valid value and therefore can be compared to other NULL values. That means, in this case, the statement should return each row whose Title value is NULL. A quick check into the source data will show four rows that match the statement’s assumed logic. However, because the database engine cannot compare nothing to nothing, it will not return these rows and will instead provide us with an empty result set.

If we assume from the empty result set that no rows in the source data match our criteria, we might be satisfied with the statement as is and send the code onto production. (Let’s hope no one would really do that.) We might also choose to continue along this logical trajectory and recast the statement to return the non-NULL data:

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title <> NULL AND BusinessEntityID < 7;

The statement is no longer trying to compare NULL with NULL, but rather compare real non-NULL values to NULL, which would seem an equally credible comparison, if not more so. It is not. We cannot compare real values to nothing any more than we can compare nothing to nothing. As a result, the statement still returns no data, even though we know for certain the data exists.

But before we go further, note that there are cases when comparing NULL values in this way can work, which we cover in the next failure. For now, know that the proper way to perform NULL comparisons is to use the IS NULL and IS NOT NULL operators. For example, the following SELECT statement uses the IS NULL operator to return rows whose Title value is NULL:

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title IS NULL AND BusinessEntityID < 7;

This time, the statement returns the data we expect, with the Title value missing, along with a couple of MiddleName values:

Title	FirstName	MiddleName	LastName
NULL	Ken	J	SÃ¡nchez
NULL	Terri	Lee	Duffy
NULL	Roberto	NULL	Tamburello
NULL	Rob	NULL	Walters

We can also return those rows that contain an actual Title value by instead using the IS NOT NULL operator:

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title IS NOT NULL AND BusinessEntityID < 7;

Now the statement returns only the following two rows:

Title	FirstName	MiddleName	LastName
Ms.	Gail	A	Erickson
Mr.	Jossef	H	Goldberg

The IS NULL and IS NOT NULL operators provide the logic we need to handle the ambiguous nature of NULL values. The purpose of these operators is to determine whether or not an expression is NULL and from that return either TRUE or FALSE. Comparison operators, when faced with a NULL value, instead want to return UNKNOWN, which can result in unpredictable and costly mistakes if implemented in production.

Failure #3: Not knowing how `ANSI_NULL` works

In the last failure, I demonstrated why you should not use comparison operators against NULL values. Even if there are results to be had, a statement such as the following will likely not return them:

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title = NULL AND BusinessEntityID < 7;

The word likely is key here. It turns out that the behavior I described is the default behavior, but not the only behavior. SQL Server supports a feature called ANSI_NULLS, which determines how the database engine handles NULL comparisons. The option is a carry-over from the ghost of SQL Server Past and has been deprecated since 2005. But the setting still exists, it still works, and it can still cause all sorts of trouble.

By default, SQL Server installs with the ANSI_NULLS option turned on, which gives us the behavior I’ve described so far. If we compare values to NULL, the database engine does not return those rows, even if NULL exists.

We can verify the current ANSI_NULLS setting by running a statement similar to the following:

IF ( (32 & @@OPTIONS) = 32 )

PRINT 'The ANSI_NULLS option turned on.'

ELSE

PRINT 'The ANSI_NULLS option turned off.';

If no one has messed with ANSI_NULLS, the database engine will return the first option, which will give us the results we’ve seen so far. But suppose we turn the option off:

SET ANSI_NULLS OFF;

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title = NULL AND BusinessEntityID < 7;

Now our statement returns the expected four rows, despite the fact that we’re using a comparison operator against NULL:

Title	FirstName	MiddleName	LastName
NULL	Ken	J	SÃ¡nchez
NULL	Terri	Lee	Duffy
NULL	Roberto	NULL	Tamburello
NULL	Rob	NULL	Walters

With the option turned off, we can also use the not equal operator:

SET ANSI_NULLS OFF;

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title <> NULL AND BusinessEntityID < 7;

Now the results look like the following:

Title	FirstName	MiddleName	LastName
Ms.	Gail	A	Erickson
Mr.	Jossef	H	Goldberg

If we turn the option back on (it’s default setting), the database engine will again return no rows:

SET ANSI_NULLS ON;

SELECT Title, FirstName, MiddleName, LastName

FROM Person.Person

WHERE Title = NULL AND BusinessEntityID < 7;

Because the ANSI_NULLS option has been deprecated, you should not use it and you should update any code that does, assuming you have the access and time. If you don’t, you could be in store for bigger problems.

Imagine an application that connects to two different SQL Server instances, one with the option turned on (the default) and one with it turned off. If the code is using comparison operators with NULL values, you could end up with inaccurate data or conflicting results. Even if you’re dealing with only one SQL Server instance, you still risk unexpected results, especially if statements are turning the ANSI_NULLS option on at the session level.

Failure #4: Treating ISNULL the same as IS NULL

The ISNULL function tends to cause a bit of confusion when first encountered, in part because of its name, which implies a meaning similar to the IS NULL operator. Another reason for confusion arises from the fact that products such as Microsoft Access and MySQL also include the ISNULL function, where it behaves much differently, more like the name suggests. But we’re talking SQL Server and need to understand how ISNULL is implemented here. A lack of understanding can lead to misuse and failure to take advantage of its real purpose in SQL Server.

Let’s looks at a few examples to demonstrate, starting with a simple query that returns products whose Color value is NULL:

SELECT Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949

AND Color IS NULL;

ProductName	Color
LL Touring Handlebars	NULL
HL Touring Handlebars	NULL

If we’re not familiar with ISNULL in SQL Server or are transitioning in from Access, we might attempt to retrieve the same results by using ISNULL in our WHERE clause:

SELECT Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949

AND ISNULL(Color);

Fortunately, this construction will generate the following error, which we’ll likely catch immediately, or at least during the Q&A phase:

1 2	Msg 174, Level 15, State 1, Line 305 The isnull function requires 2 argument(s).

Those of us with a background in MySQL might instead take a different approach with the ISNULL function by comparing it to 1 or 0, the function’s two possible return values:

SELECT Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949

AND ISNULL(Color) = 1);

Once again, we receive an error message:

1 2	Msg 174, Level 15, State 1, Line 322 The isnull function requires 2 argument(s). near ';'.

By now, we should have checked the documentation to verify what this function is supposed to do and have come to the conclusion that, based on the syntax and the fairly explicit error messages, we should be adding a second argument, which we do:

SELECT Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949

AND ISNULL(Color, 'neutral');

Unfortunately, all we’ve done is make our predicate completely illogical, resulting in yet another (but different) error message:

1 2	Msg 4145, Level 15, State 1, Line 339 An expression of non-boolean type specified in a context where a condition is expected, near ';'.

The point of that this is that, in SQL Server, the purpose of the ISNULL function is simply to replace an expression’s value with a real value, if the original value is NULL (nonexistent). If we wanted to make the preceding example right, we would need to define our Boolean condition correctly:

SELECT Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949

AND ISNULL(Color, 'neutral') IN ('black', 'neutral');

Now the SELECT statement returns the results we want, rather than an error:

ProductName	Color
LL Mountain Frame – Black, 40	Black
LL Touring Handlebars	NULL
HL Touring Handlebars	NULL
LL Crankset	Black

That said, we can achieve the same thing by using IS NULL, without incurring the extra processing:

SELECT Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949

AND (Color = 'black' OR Color IS NULL);

More often than not, the place we’ll want to use the ISNULL function is within our select list to replace the NULL values with real values:

SELECT Name AS ProductName, ISNULL(Color, 'neutral')

FROM Production.Product

WHERE ProductID BETWEEN 940 AND 949;

The Color column now contains a value for each row, with no missing values to contend with:

ProductName	Color
HL Road Pedal	Silver/Black
Touring Pedal	Silver/Black
ML Mountain Frame-W – Silver, 38	Silver
LL Mountain Frame – Black, 40	Black
LL Mountain Frame – Silver, 40	Silver
Front Derailleur	Silver
LL Touring Handlebars	neutral
HL Touring Handlebars	neutral
Front Brakes	Silver
LL Crankset	Black

Be aware that, when using ISNULL, the database engine converts the replacement value to the data type of the original value, making its possible for data to get truncated or for the database engine to generate a conversion error. Make sure you read up on ISNULL before adding it to your code.

Failure #5: Treating ISNULL the same as COALESCE

In the last failure, I tried to address any confusion there might be around the ISNULL function, but there’s another area of possible confusion, and that is in understanding the differences between ISNULL and COALESCE. Both functions let us replace NULL with a real value, but there are differences between the two that can be important to understand.

First off, the ISNULL function implemented in SQL Server is specific to that brand of T-SQL, whereas COALESCE is standard ANSI, making it more universal and, consequently, more portable, an important consideration if the possibility exists that you might one day want to point your apps to a different database system.

The COALESCE function also lets us include more than two expressions, as shown in the following SELECT statement:

SELECT Name AS ProductName, Class, Style, ProductLine,

COALESCE(Class, Style, ProductLine, 'n/a') AS FirstNotNull

FROM Production.Product

WHERE ProductID IN(679, 706, 709, 711);

The function evaluates the expressions in the order they’re specified and returns the first non-NULL value. As the following results show, each row returns one of the specified values in the FirstNotNull column:

ProductName	Class	Style	ProductLine	FirstNotNull
Rear Derailleur Cage	NULL	NULL	NULL	n/a
HL Road Frame – Red, 58	H	U	R	H
Mountain Bike Socks, M	NULL	U	M	U
Sport-100 Helmet, Blue	NULL	NULL	S	S

The COALESCE function can actually be thought of as syntactical shorthand for a CASE expression:

SELECT Name AS ProductName, Class, Style, ProductLine,

CASE

WHEN Class IS NOT NULL THEN Class

WHEN Style IS NOT NULL THEN Style

WHEN ProductLine IS NOT NULL THEN ProductLine

ELSE 'n/a'

END AS FirstNotNull

FROM Production.Product

WHERE ProductID IN(679, 706, 709, 711);

The SELECT statement achieves the same results as the preceding example, but its construction is more complex. The main advantage of using a CASE expression is that it makes the logic clearer, for those who might need a little extra help.

Compared to the COALESCE function, the ISNULL function is much more limited:

SELECT Name AS ProductName, Class, Style,

ISNULL(Class, Style) AS FirstNotNull

FROM Production.Product

WHERE ProductID IN(679, 706, 709, 711);

In this case, all we’re saying is that if Class is NULL, then use Style, giving us the following results:

ProductName	Class	Style	FirstNotNull
Rear Derailleur Cage	NULL	NULL	NULL
HL Road Frame – Red, 58	H	U	H
Mountain Bike Socks, M	NULL	U	U
Sport-100 Helmet, Blue	NULL	NULL	NULL

We could have specified n/a as our second argument, similar to what we did with COALESCE, but you get the point. COALESCE is more flexible and it conforms to ANSI standards. In addition, COALESCE handles data types differently. To understand this difference, let’s start with a different example:

SELECT

Name AS ProductName, Class, Style, ProductSubcategoryID

FROM Production.Product

WHERE ProductID = 525;

The Class and Style columns are defined as NCHAR, and the ProductSubcategoryID column as INT. The SELECT statement returns the following results:

ProductName	Class	Style	ProductSubcategoryID
LL Shell	L	NULL	NULL

Now lets modify our code to include the COALESCE function:

SELECT

Name AS ProductName, Class, Style, ProductSubcategoryID,

COALESCE(Class, Style, ProductSubcategoryID) AS FirstNotNull

FROM Production.Product

WHERE ProductID = 525;

The database engine now generates an error when it tries to convert the Class value to an INT:

1 2	Msg 245, Level 16, State 1, Line 415 Conversion failed when converting the nvarchar value 'L ' to data type int.

It turns out that COALESCE uses the type with the highest precedence, based on the submitted values. Because the ProductSubcategoryID column takes an INT and an INT takes precedence over NCHAR (which is treated as NVARCHAR), the database engine tries to convert L to an INT, which of course is not possible.

In such cases, we might try changing the order of the columns we pass to the COALESCE function:

SELECT

Name AS ProductName, Class, Style, ProductSubcategoryID,

COALESCE(ProductSubcategoryID, Class, Style) AS FirstNotNull

FROM Production.Product

WHERE ProductID = 525;

But we will again receive a conversion area. The database engine does not care about the order of the arguments, only which types take precedence over others:

The ISNULL function works differently:

SELECT

Name AS ProductName, Class, ProductSubcategoryID,

ISNULL(Class, ProductSubcategoryID) AS FirstNotNull

FROM Production.Product

WHERE ProductID = 525;

In this case, the Class data type sets the standard, so the database engine has no problem returning the results:

ProductName	Class	ProductSubcategoryID	FirstNotNull
LL Shell	L	NULL	L

Now let’s reverse the order of the columns we pass to ISNULL:

SELECT

Name AS ProductName, Class, ProductSubcategoryID,

ISNULL(ProductSubcategoryID, Class) AS FirstNotNull

FROM Production.Product

WHERE ProductID = 525;

The database engine will once again try to convert L to an INT and generate a conversion error. Be sure that, whenever you’re using these functions, you are well aware of the possibility for conversion errors.

Failure #6: Forgetting about the `NULLIF` function

While we’re on the topic of NULL-related functions, it’s worth noting another one that garners less attention, but can still be useful: the NULLIF function. We don’t seem to come across this one as often as the other functions, in part because there does not seem to be as many use cases for NULLIF as there are the others. Basically, the function returns a NULL value if two expressions are equal. If they are not equal, the function returns the first expression. Here’s an example:

SELECT Name AS ProductName, MakeFlag, FinishedGoodsFlag,

NULLIF(MakeFlag, FinishedGoodsFlag) AS NullIfEqual

FROM Production.Product

WHERE ProductID IN(534, 679, 680, 707);

All we’re doing here is passing the two columns into the function and returning the results, as shown in the following table:

ProductName	MakeFlag	FinishedGoodsFlag	NullIfEqual
Top Tube	1	0	1
Rear Derailleur Cage	0	0	NULL
HL Road Frame – Black, 58	1	1	NULL
Sport-100 Helmet, Red	0	1	0

We can achieve the same results by using a CASE expression instead of the NULLIF function:

SELECT Name AS ProductName, MakeFlag, FinishedGoodsFlag,

CASE

WHEN MakeFlag = FinishedGoodsFlag THEN NULL

ELSE MakeFlag

END AS NullIfEqual

FROM Production.Product

WHERE ProductID IN(534, 679, 680, 707);

As with COALESCE, the NULLIF function helps to simplify our statements, but the CASE expression makes the logic clearer to the rest of the world.

Perhaps one of the most useful places to use the NULLIF function is within a mathematical expression in which an error would be generated if a zero were encountered. Dividing by zero is a good example of this. Let’s start with the following SELECT statement:

SELECT Name AS ProductName, StandardCost, ListPrice,

StandardCost / (ListPrice - StandardCost) AS NetDiff

FROM Production.Product

WHERE ProductID = 680;

The database engine runs the calculation with no problem and returns the following results:

ProductName	StandardCost	ListPrice	NetDiff
HL Road Frame – Black, 58	1059.31	1431.50	2.8461

Now let’s pull the data from a row in which the StandardCost and ListPrice values are zero:

SELECT Name AS ProductName, StandardCost, ListPrice,

StandardCost / (ListPrice - StandardCost) AS NetDiff

FROM Production.Product

WHERE ProductID = 679;

This time, we get a divide-by-zero error:

1 2	Msg 8134, Level 16, State 1, Line 603 Divide by zero error encountered.

This is where NULLIF comes it. By returning a NULL in place of a zero, we can at least prevent the database engine from generating an error:

SELECT Name AS ProductName, StandardCost, ListPrice,

StandardCost / NULLIF((ListPrice - StandardCost), 0) AS NetDiff

FROM Production.Product

WHERE ProductID = 679;

Now we at least get results we can live with.

ProductName	StandardCost	ListPrice	NetDiff
Rear Derailleur Cage	0.00	0.00	NULL

We could have also used the WHERE clause to filter our rows that would have given us zero, but then we would not be able to view any of the data’s row. The NULLIF function at least gives us data to work with and makes it easy to do so.

Failure #7: Assuming that 2 + `NULL` = 2

As great as the NULLIF function is for handling certain situations, such as avoiding divide-by-zero errors, having NULL show up in our numeric calculations is often less than optimal. It’s not that SQL Server does anything surprising, but rather that you need to know what’s coming should you run into issues with your calculations.

This all goes back to the idea that NULL is a non-value. It is nothing. It merely indicates that any real data value is missing. Consequently, when we try to add a value to nothing or subtract from nothing or multiply by nothing or divide by nothing or do anything by nothing, we end up with nothing, or NULL. Take a look at this SELECT statement:

SELECT Name AS ProductName, Weight,

(Weight * 4.22) AS FreightCost

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 718;

We’re simply multiplying the Weight value by 4.22 and getting the following results:

ProductName	Weight	FreightCost
Long-Sleeve Logo Jersey, L	NULL	NULL
Long-Sleeve Logo Jersey, XL	NULL	NULL
HL Road Frame – Red, 62	2.30	9.7060
HL Road Frame – Red, 44	2.12	8.9464

Not surprisingly, whenever we try to multiply by NULL, we get NULL. What choice does the database engine have? Now let’s throw a little addition in there to see what happens:

SELECT Name AS ProductName, Weight,

((Weight + 3) * 4.22) AS FreightCost

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 718;

As before, when Weight contains a real value, we get a real value in return, but when NULL enters the equation, we always end up with NULL:

ProductName	Weight	FreightCost
Long-Sleeve Logo Jersey, L	NULL	NULL
Long-Sleeve Logo Jersey, XL	NULL	NULL
HL Road Frame – Red, 62	2.30	22.3660
HL Road Frame – Red, 44	2.12	21.6064

One approach we can take (when circumstances make it reasonable) is to define a default value that we can return in the event the source data is NULL:

SELECT Name AS ProductName, Weight,

ISNULL((Weight * 4.22), 4.95) AS FreightCost

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 718;

In this case, we’re using ISNULL to return a default value of 4.95 should our calculation result in a NULL value, which gives us the following results:

ProductName	Weight	FreightCost
Long-Sleeve Logo Jersey, L	NULL	4.9500
Long-Sleeve Logo Jersey, XL	NULL	4.9500
HL Road Frame – Red, 62	2.30	9.7060
HL Road Frame – Red, 44	2.12	8.9464

SQL Server does what it should do when it encounters NULL. It returns NULL. We just need to be prepared for this eventuality. In some cases, having our expressions return a NULL value is fine for our circumstances, but at other times, we’ll want to adjust our code to accommodate those wayward missing values. Otherwise a whole lot of customers might end up getting free shipping because the weight was initially not a consideration.

Failure #8: Aggregating data without taking NULL into account

As with numeric calculations, we must be prepared to handle aggregations that could include NULL values. However, unlike mathematic operations, the database engine eliminates NULL values from the calculations before performing the actual aggregations, at least at the most basic level. Let’s start with a simple SELECT statement:

SELECT Name AS ProductName, Weight

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 720;

The statement returns the following results, which include two rows with a Weight value of NULL:

ProductName	Weight
Long-Sleeve Logo Jersey, L	NULL
Long-Sleeve Logo Jersey, XL	NULL
HL Road Frame – Red, 62	2.30
HL Road Frame – Red, 44	2.12
HL Road Frame – Red, 48	2.16
HL Road Frame – Red, 52	2.20

Suppose we now try to find the average weight for these rows:

SELECT AVG(Weight)

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 720;

The database engine will perform the aggregation with no problem by first eliminating the NULL rows from the mix and then returning a result of 2.195000. However, the results also include something else, a warning message about the NULL values:

1	Warning: Null value is eliminated by an aggregate or other SET operation.

We can verify that the NULL rows are being eliminated by instead using the COUNT aggregate function on the Weight column:

SELECT COUNT(weight)

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 720;

This time the database engine returns a value of 4, indicating that the two NULL rows have been eliminated. We will again receive the warning message.

All aggregate functions eliminate the NULL values prior to preforming the actual calculations except for the COUNT(*) function:

SELECT COUNT(*)

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 720;

In this case, the database engine includes all rows, no matter where the NULL values lie, and will generate no warning messages.

The fact that SQL Server handles aggregations this way is not a bad thing. But it’s something you need to understand. If you’re not familiar with the data, your queries might be eliminating rows without understanding the full implications. You can consider using a function such as ISNULL to replace the NULL values, but you still risk skewing your results. As with many situations, handling NULL values can take a special touch. For example, detailed analytics will require, at the very least, knowledge of the number of NULL values in a particular data set.

As a side note, those who don’t appreciate having their queries return warning messages like those we’ve seen here, can use the SET ANSI_WARNINGS statement to turn warnings off when executing an aggregation:

SET ANSI_WARNINGS OFF;

SELECT AVG(Weight)

FROM Production.Product

WHERE ProductID BETWEEN 715 AND 720;

SET ANSI_WARNINGS ON;

Although this will give us the same results without the message, turning off this option can affect other operations. For example, modifying data on indexed views or on tables with indexed computed columns will fail if the option is turned off. So tread carefully if you consider this route.

Also be aware that when you start grouping and aggregating data, you might not get exactly the results you expect when that data includes NULL values. For example, the following SELECT statement groups by the ProductSubcategoryID column in our trusty Product table:

SELECT ProductSubcategoryID AS ProdSubID,

AVG(Weight) AS AvgWeight,

COUNT(*) AS FullCount,

COUNT(Weight) AS ColCount

FROM Production.Product

GROUP BY ProductSubcategoryID;

The following table shows the first 10 rows in our result set:

ProdSubID	AvgWeight	FullCount	ColCount
NULL	434.285714	209	7
1	25.058750	32	32
2	17.448139	43	43
3	27.680909	22	22
4	NULL	8	0
5	187.000000	3	3
6	317.000000	2	2
7	NULL	1	0
8	603.333333	3	3
9	151.500000	2	2

The full result set represents all 504 rows in the Product table, including rows whose ProductSubcategoryID and Weight columns are NULL. If we were to dig into the averages and the individual column counts for each group, we would discover that the NULL values are not included, but the result set as a whole includes all the data. Again, I point this out not to suggest that there is a problem with the way SQL Server groups and aggregates data, but only to demonstrate that we need to be aware of what’s happening with our queries when NULL values are involved. In this case, by including both types of counts in each group, we’re providing a handy way to provide insight into our source data.

Failure #9: Forgetting about `NULL` in your Boolean expressions

Boolean expressions that involve NULL values can be especially tricky because they introduce what is called three-valued logic. Typically, when an expression is evaluated, it returns either TRUE or FALSE. However, when NULL enters the mix, we get a third possibility: UNKNOWN. As we’ve already covered, SQL Server marks a value as NULL if it does not exist. As a result, a Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.

The main thing to keep in mind with Boolean expressions is that your result set will include only those rows in which the expression evaluates to TRUE. If the expression evaluates to FALSE or UNKNOWN, the row is omitted.

The specifics of three-valued logic can seem unwieldy when taking account the various combinations that can result from using the AND, OR, and NOT operators, but you can find a handy reference to help sort all that out be referring to the SQL Server help topic “Null Values.” In the meantime, lets look at a few examples that demonstrate what happens when NULL values enter the Boolean mix, starting with the following temporary table:

SELECT ProductID, Name AS ProductName, Weight, Class

INTO #products

FROM Production.Product

WHERE ProductID BETWEEN 945 AND 954;

SELECT * FROM #products;

The following results show the data that’s now stored in the #products table:

ProductID	ProductName	Weight	Class
945	Front Derailleur	88.00	NULL
946	LL Touring Handlebars	NULL	L
947	HL Touring Handlebars	NULL	H
948	Front Brakes	317.00	NULL
949	LL Crankset	600.00	L
950	ML Crankset	635.00	M
951	HL Crankset	575.00	H
952	Chain	NULL	NULL
953	Touring-2000 Blue, 60	27.90	M
954	Touring-1000 Yellow, 46	25.13	H

Now let’s use the AND operator to create a simple Boolean expression in our WHERE clause:

1 2	SELECT * FROM #products WHERE Weight > 500 AND Class = 'H';

As expected, the statement returns the results shown in the following table:

ProductID	ProductName	Weight	Class
951	HL Crankset	575.00	H

Notice that the query engine excludes the row for product ID 947. Even though the Class value is H, the Weight value is NULL, so the expression evaluates to UNKNOWN. And of course, row 952 is excluded because both the Weight and Class values are NULL.

Now let’s rerun the statement, but this time use the OR operator for our Boolean expression:

1 2	SELECT * FROM #products WHERE Weight > 500 OR Class = 'H';

As expected, our results look quite different:

ProductID	ProductName	Weight	Class
947	HL Touring Handlebars	NULL	H
949	LL Crankset	600.00	L
950	ML Crankset	635.00	M
951	HL Crankset	575.00	H
954	Touring-1000 Yellow, 46	25.13	H

In this case, only one of the two Boolean conditions needs to evaluate to TRUE, resulting in many more returned rows. Even row 947 is included, even through the Weight value is NULL. Because the OR operator is used and the second condition evaluates to TRUE, that row is included.

This time around, let’s use the AND NOT operator in our statement:

1 2	SELECT * FROM #products WHERE Weight > 500 AND NOT Class = 'H';

Once again, we get very different results:

ProductID	ProductName	Weight	Class
949	LL Crankset	600.00	L
950	ML Crankset	635.00	M

Not surprising, the only rows to be included are those in which both Boolean conditions evaluate to TRUE. All the FALSE and UNKNOWN values are left behind.

As with aggregate functions, SQL Server’s treatment of Boolean expressions in itself is not a bad thing. You just need to be aware of what can happen with your data when NULL values are involved. Data can be eliminated that you might expect to be included, leading to unintended consequences in subsequent analytics and decision-making.

Failure #10: Not accounting for NULL in your string expressions

Up to this point, we’ve touched little upon string values and what happens when NULL enters the mix, but at least some coverage in this area is warranted, beyond what we saw when working with Boolean expressions.

Suppose we start with the following SELECT statement:

SELECT ProductID, Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 835 AND 845;

The statement returns the results shown here:

ProductID	ProductName	Color
835	ML Road Frame-W – Yellow, 44	Yellow
836	ML Road Frame-W – Yellow, 48	Yellow
837	HL Road Frame – Black, 62	Black
838	HL Road Frame – Black, 44	Black
839	HL Road Frame – Black, 48	Black
840	HL Road Frame – Black, 52	Black
841	Men’s Sports Shorts, S	Black
842	Touring-Panniers, Large	Grey
843	Cable Lock	NULL
844	Minipump	NULL
845	Mountain Pump	NULL

Now suppose when want to filter our data set even further by returning all rows with a Color value other than black:

SELECT ProductID, Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 835 AND 845

AND Color <> 'black';

Now our SELECT statement returns the following results:

ProductID	ProductName	Color
835	ML Road Frame-W – Yellow, 44	Yellow
836	ML Road Frame-W – Yellow, 48	Yellow
842	Touring-Panniers, Large	Grey

We might have expected the results to include those rows whose Color value is NULL. However, we’re once again faced with the issue of trying to evaluate an expression against what is essentially a missing value. A “not black” value cannot be equated with a “missing” value; consequently, those rows are omitted.

Concatenating string values follows similar logic:

SELECT Name AS ProductName, ProductNumber, Color,

(ProductNumber + '_' + Color) AS NewProductNumber

FROM Production.Product

WHERE ProductID BETWEEN 835 AND 845;

Whenever NULL shows up, we end up with nothing (NULL) in our NewProductNumber column:

ProductName	ProductNumber	Color	NewProductNumber
ML Road Frame-W – Yellow, 44	FR-R72Y-44	Yellow	FR-R72Y-44_Yellow
ML Road Frame-W – Yellow, 48	FR-R72Y-48	Yellow	FR-R72Y-48_Yellow
HL Road Frame – Black, 62	FR-R92B-62	Black	FR-R92B-62_Black
HL Road Frame – Black, 44	FR-R92B-44	Black	FR-R92B-44_Black
HL Road Frame – Black, 48	FR-R92B-48	Black	FR-R92B-48_Black
HL Road Frame – Black, 52	FR-R92B-52	Black	FR-R92B-52_Black
Men’s Sports Shorts, S	SH-M897-S	Black	SH-M897-S_Black
Touring-Panniers, Large	PA-T100	Grey	PA-T100_Grey
Cable Lock	LO-C100	NULL	NULL
Minipump	PU-0452	NULL	NULL
Mountain Pump	PU-M044	NULL	NULL

Once again, the bottom line is that you must be prepared to handle NULL values if they could in any way impact your results. For example, you might try throwing an ISNULL into the mix:

SELECT Name AS ProductName, ProductNumber, Color,

(ProductNumber + '_' + ISNULL(Color, 'neutral')) AS NewProductNumber

FROM Production.Product

WHERE ProductID BETWEEN 835 AND 845;

Now the statement returns the following results:

ProductName	ProductNumber	Color	NewProductNumber
ML Road Frame-W – Yellow, 44	FR-R72Y-44	Yellow	FR-R72Y-44_Yellow
ML Road Frame-W – Yellow, 48	FR-R72Y-48	Yellow	FR-R72Y-48_Yellow
HL Road Frame – Black, 62	FR-R92B-62	Black	FR-R92B-62_Black
HL Road Frame – Black, 44	FR-R92B-44	Black	FR-R92B-44_Black
HL Road Frame – Black, 48	FR-R92B-48	Black	FR-R92B-48_Black
HL Road Frame – Black, 52	FR-R92B-52	Black	FR-R92B-52_Black
Men’s Sports Shorts, S	SH-M897-S	Black	SH-M897-S_Black
Touring-Panniers, Large	PA-T100	Grey	PA-T100_Grey
Cable Lock	LO-C100	NULL	LO-C100_neutral
Minipump	PU-0452	NULL	PU-0452_neutral
Mountain Pump	PU-M044	NULL	PU-M044_neutral

At least this way we have something to work with other than NULL.

Failure #11: Joining tables on nullable columns

Another area where we might get results other than what we expect is when we join data based on nullable columns. Because NULL value are considered non-values and because the database engine cannot match non-values, NULL values will show up in the join column only for outer joins. To demonstrate, let’s start by creating and populating a couple tables:

CREATE TABLE Products

(ProductID INT, ProductName NVARCHAR(50), CategoryAltID INT);

INSERT INTO Products VALUES

(945, 'Front Derailleur', 10),

(946, 'LL Touring Handlebars', 20),

(947, 'HL Touring Handlebars', NULL),

(948, 'Front Brakes', 20),

(949, 'LL Crankset', 10),

(950, 'ML Crankset', NULL);

CREATE TABLE Categories

(CategoryNumber NVARCHAR(25), CategoryAltID INT);

INSERT INTO Categories VALUES

('FR-R72Y-44', 10),

('FR-R72Y-48', 20),

('FR-R92B-62', NULL),

('FR-R92B-52', NULL);

Notice that the CategoryAltID column in both tables contains NULL values. Now let’s create an inner join between those tables, based on that column:

SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID,

c.CategoryAltID AS cCatAltID, c.CategoryNumber

FROM Products p JOIN Categories c

ON p.CategoryAltID = c.CategoryAltID;

Although the CategoryAltID column contains NULL values in both tables, they are not considered equal (for all the reasons we’ve discussed), which means they’re not included in the result set:

ProductID	ProductName	pCatAltID	cCatAltID	CategoryNumber
945	Front Derailleur	10	10	FR-R72Y-44
946	LL Touring Handlebars	20	20	FR-R72Y-48
948	Front Brakes	20	20	FR-R72Y-48
949	LL Crankset	10	10	FR-R72Y-44

Now let’s revise the statement to include a left outer join:

SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID,

c.CategoryAltID AS cCatAltID, c.CategoryNumber

FROM Products p LEFT OUTER JOIN Categories c

ON p.CategoryAltID = c.CategoryAltID;

This time our results include all rows from the Product table, including those whose CategoryAltID column is NULL in both tables:

ProductID	ProductName	pCatAltID	cCatAltID	CategoryNumber
945	Front Derailleur	10	10	FR-R72Y-44
946	LL Touring Handlebars	20	20	FR-R72Y-48
947	HL Touring Handlebars	NULL	NULL	NULL
948	Front Brakes	20	20	FR-R72Y-48
949	LL Crankset	10	10	FR-R72Y-44
950	ML Crankset	NULL	NULL	NULL

The query engine includes NULL rows not because they’re considered matches, but rather because that’s the nature of an outer join and the way it returns all rows from the “outer” table. We can also try our statement with a full outer join:

SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID,

c.CategoryAltID AS cCatAltID, c.CategoryNumber

FROM Products p FULL OUTER JOIN Categories c

ON p.CategoryAltID = c.CategoryAltID;

Now our results include everything, regardless of the NULL values:

ProductID	ProductName	pCatAltID	cCatAltID	CategoryNumber
945	Front Derailleur	10	10	FR-R72Y-44
946	LL Touring Handlebars	20	20	FR-R72Y-48
947	HL Touring Handlebars	NULL	NULL	NULL
948	Front Brakes	20	20	FR-R72Y-48
949	LL Crankset	10	10	FR-R72Y-44
950	ML Crankset	NULL	NULL	NULL
NULL	NULL	NULL	NULL	FR-R92B-62
NULL	NULL	NULL	NULL	FR-R92B-52

If you consider the logic behind the query engine’s approach to NULL values, it’s not surprising we get the results we do. Like any place where NULL shows up, it comes down to understanding the data we could potentially be dealing with and what that means to the bigger picture. Whenever NULL enters into the equation, we must take it into consideration so we don’t end up with query results very different from what we expect. When we join data, a NULL in one column can cause us to miss important data in other columns, all because we did not think through the possible outcomes.

Failure #12: Creating a unique index without taking NULL into account

Up to this point, I’ve tried to emphasize how SQL Server uses NULL only to indicate that a value is missing. We cannot compare or add or multiply NULL values because we cannot compare or add or multiply missing values. When it comes to unique indexes, however, SQL Server seems to have a double standard, treating NULL values as equal values and consequently permitting only one NULL value in a unique index.

To demonstrate, let’s create and populate a simple table:

CREATE TABLE Products

(ProductID INT, ProductName NVARCHAR(50), ProductNumber NVARCHAR(25));

INSERT INTO Products VALUES

(945, 'Front Derailleur', 'FD-2342'),

(946, 'LL Touring Handlebars', 'HB-T721'),

(947, 'HL Touring Handlebars', 'HB-T928'),

(948, 'Front Brakes', 'FB-9873'),

(949, 'LL Crankset', NULL),

(950, 'ML Crankset', NULL);

Now let’s try to create a unique index on the ProductNumber column:

1 2	CREATE UNIQUE INDEX ux_products_productnumber ON Products(ProductNumber);

Because the column contains multiple NULL values, the database engine returns the following error:

Msg 1505, Level 16, State 1, Line 982

The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.Products' and the index name 'ux_products_productnumber'. The duplicate key value is (<NULL>).

The statement has been terminated.

This might seem contradictory behavior on the part of SQL Server, given the non-value status it imposes on NULL everywhere else. After all, how can missing values be considered duplicates if there is no way to compare them?

Because of this limitation, we have to be particularly careful when we create our tables and indexes prior to populating them. We could be faced with similar error messages if we try to insert a row with a duplicate NULL value.

Fortunately, since SQL Server 2008, we’ve been able to create filtered indexes that let us sidestep the duplicate NULL issue:

CREATE UNIQUE INDEX ux_products_productnumber

ON Products(ProductNumber)

WHERE ProductNumber IS NOT NULL;

The index is now created without a hitch. We can even insert additional duplicate NULL values:

1 2	INSERT INTO Products VALUES (951, 'HL Crankset', NULL);

Creating a filtered index is easy enough to do and is a handy way to sidestep the limitations of a unique index. Just be sure to refer to the SQL Server help topic “Create Filtered Indexes” to make certain you’re familiar with the limitations and restrictions on these types of indexes.

Failure #13: Sorting data on a nullable column

A result set that is not in the order you want might not seem such a big deal, but it can still be annoying when you sort your results and end up with a bunch of NULL values at the beginning. For example, the following SELECT statement sorts the data by the Color column:

SELECT ProductID, Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 835 AND 845

ORDER BY Color;

The statement gives us the following results:

ProductID	ProductName	Color
843	Cable Lock	NULL
844	Minipump	NULL
845	Mountain Pump	NULL
837	HL Road Frame – Black, 62	Black
838	HL Road Frame – Black, 44	Black
839	HL Road Frame – Black, 48	Black
840	HL Road Frame – Black, 52	Black
841	Men’s Sports Shorts, S	Black
842	Touring-Panniers, Large	Grey
835	ML Road Frame-W – Yellow, 44	Yellow
836	ML Road Frame-W – Yellow, 48	Yellow

We can instead sort the data in descending order, but that would put the colors themselves in reverse order. Another work-around is to slip a CASE expression into our WHERE clause to trick the database engine:

SELECT ProductID, Name AS ProductName, Color

FROM Production.Product

WHERE ProductID BETWEEN 835 AND 845

ORDER BY

CASE WHEN Color IS NULL THEN 1 ELSE 0 END,

Color;

The extra expression assigns 1 to the NULL rows and 0 to the rest so they’re sorted first. From there, the normal sorting occurs, as shown in the following results:

ProductID	ProductName	Color
837	HL Road Frame – Black, 62	Black
838	HL Road Frame – Black, 44	Black
839	HL Road Frame – Black, 48	Black
840	HL Road Frame – Black, 52	Black
841	Men’s Sports Shorts, S	Black
842	Touring-Panniers, Large	Grey
835	ML Road Frame-W – Yellow, 44	Yellow
836	ML Road Frame-W – Yellow, 48	Yellow
843	Cable Lock	NULL
844	Minipump	NULL
845	Mountain Pump	NULL

This might not be an earth-shattering solution, but it’s an easy enough way to move those NULL values to the bottom, if you happen to prefer them there.

This document contains proprietary information and is protected by copyright law.

Register for Simple Talk

How to Get NULLs Horribly Wrong in SQL Server

Failure #1: Not knowing what `NULL` means

Failure #2: Treating NULL like a real value in comparisons

Failure #3: Not knowing how `ANSI_NULL` works

Failure #4: Treating ISNULL the same as IS NULL

Failure #5: Treating ISNULL the same as COALESCE

Failure #6: Forgetting about the `NULLIF` function

Failure #7: Assuming that 2 + `NULL` = 2

Failure #8: Aggregating data without taking NULL into account

Failure #9: Forgetting about `NULL` in your Boolean expressions

Failure #10: Not accounting for NULL in your string expressions

Failure #11: Joining tables on nullable columns

Failure #12: Creating a unique index without taking NULL into account

Failure #13: Sorting data on a nullable column

Article tags

About the author

How to Use the psql Command-Line Tool with Azure Database for PostgreSQL Flexible Server

How to use pgAdmin with Azure Database for PostgreSQL flexible server

Getting started with Azure Database for PostgreSQL Flexible Server

How to Get NULLs Horribly Wrong in SQL Server

Failure #1: Not knowing what NULL means

Failure #2: Treating NULL like a real value in comparisons

Failure #3: Not knowing how ANSI_NULL works

Failure #4: Treating ISNULL the same as IS NULL

Failure #5: Treating ISNULL the same as COALESCE

Failure #6: Forgetting about the NULLIF function

Failure #7: Assuming that 2 + NULL = 2

Failure #8: Aggregating data without taking NULL into account

Failure #9: Forgetting about NULL in your Boolean expressions

Failure #10: Not accounting for NULL in your string expressions

Failure #11: Joining tables on nullable columns

Failure #12: Creating a unique index without taking NULL into account

Failure #13: Sorting data on a nullable column

Article tags

Recommended

Subscribe for more

About the author

How to Use the psql Command-Line Tool with Azure Database for PostgreSQL Flexible Server

How to use pgAdmin with Azure Database for PostgreSQL flexible server

Getting started with Azure Database for PostgreSQL Flexible Server

Failure #1: Not knowing what `NULL` means

Failure #3: Not knowing how `ANSI_NULL` works

Failure #6: Forgetting about the `NULLIF` function

Failure #7: Assuming that 2 + `NULL` = 2

Failure #9: Forgetting about `NULL` in your Boolean expressions