- Failure #1: Not knowing what NULL means
- Failure #2: Treating NULL like a real value in comparisons
- Failure #3: Not knowing how ANSI_NULL works
- Failure #4: Treating ISNULL the same as IS NULL
- Failure #5: Treating ISNULL the same as COALESCE
- Failure #6: Forgetting about the NULLIF function
- Failure #7: Assuming that 2 + NULL = 2
- Failure #8: Aggregating data without taking NULL into account
- Failure #9: Forgetting about NULL in your Boolean expressions
- Failure #10: Not accounting for NULL in your string expressions
- Failure #11: Joining tables on nullable columns
- Failure #12: Creating a unique index without taking NULL into account
- Failure #13: Sorting data on a nullable column
Failure #1: Not knowing what NULL
means
When referring to NULL
within our data sets, we often use phrases such as “the NULL
value” or “a value of NULL
.” I do it all the time. I see it done all the time. Such phrases are so common that we think little of their use. But inherent in the phrase is the notion of NULL
as an actual value. It is not. NULL
is a non-value, a nonexistent value. It is not zero. It is not an empty string. A value cannot equal NULL
. No two NULL
values are equal.
A NULL
value is often defined as one that is unknown or not applicable, but even these definitions can be open to debate. For example, a record might not include a customer’s birthdate because the salesperson didn’t ask or because the customer would not provide it, but the customer still knows that date, so it is hardly unknown, nor is it any less applicable. There can be many reasons for missing data, and there has been much debate about a precise meaning of NULL
. If you need to attach a specific meaning to NULL
, then missing or absent data is probably your safest bet.
Perhaps a better way to think of NULL
is as a setting or marker that indicates if a data value does not exist. That certainly seems to be how SQL Server treats NULL
. The database engine uses a special bitmap to track which columns in a row are NULL
and which are not. The bitmap contains a bit for each column, with the bit set to 1 if the column is NULL
, that is, if the value is missing.
The confusion for some, particularly beginning T-SQL developers, is the way SQL Server Management Studio indicates that no value exists for a specific column: by displaying what appears to be a value that reads NULL
. For example, the following SELECT
statement returns data from the Product
table in the AdventureWorks2014
sample database:
1 2 3 4 |
SELECT Name AS ProductName, ProductLine, Class, Style, SellEndDate FROM Production.Product WHERE ProductID BETWEEN 680 AND 715; |
As the following results show, every row includes at least one column for which no value exists:
ProductName |
ProductLine |
Class |
Style |
SellEndDate |
HL Road Frame – Black, 58 |
R |
H |
U |
NULL |
HL Road Frame – Red, 58 |
R |
H |
U |
NULL |
Sport-100 Helmet, Red |
S |
NULL |
NULL |
NULL |
Sport-100 Helmet, Black |
S |
NULL |
NULL |
NULL |
Mountain Bike Socks, M |
M |
NULL |
U |
2012-05-29 00:00:00.000 |
Mountain Bike Socks, L |
M |
NULL |
U |
2012-05-29 00:00:00.000 |
Sport-100 Helmet, Blue |
S |
NULL |
NULL |
NULL |
AWC Logo Cap |
S |
NULL |
U |
NULL |
Long-Sleeve Logo Jersey, S |
S |
NULL |
U |
NULL |
Long-Sleeve Logo Jersey, M |
S |
NULL |
U |
NULL |
Long-Sleeve Logo Jersey, L |
S |
NULL |
U |
NULL |
If we look at the columns with the NULL
values, we can see how difficult it is to determine why the values are missing. For example, is a SellEndDate
value missing because the product is still live or because the application failed to properly record the end date? Are the Class
and Style
values missing because they don’t apply to those particular rows or because someone was careless in entering data?
An even bigger problem with viewing the results in this way is that it suggests that the database is actually storing multiple instances of this mysterious placeholder called NULL
and that all those NULL
values are the same. They are not the same. There is nothing there. Nothing cannot be the same as nothing. Nothing cannot be compared to nothing. You can think of these as the fundamental principles behind NULL
. The principles of nothingness. And not understanding these principles can lead to all sorts of problems when working with SQL Server, resulting in many of the failures we cover here.
Failure #2: Treating NULL like a real value in comparisons
Developers who don’t understand how NULL
works will sometimes use comparison operators to compare an expression to a NULL
value, resulting in a statement that looks similar to the following:
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
The assumption here is that NULL
is a valid value and therefore can be compared to other NULL
values. That means, in this case, the statement should return each row whose Title
value is NULL
. A quick check into the source data will show four rows that match the statement’s assumed logic. However, because the database engine cannot compare nothing to nothing, it will not return these rows and will instead provide us with an empty result set.
If we assume from the empty result set that no rows in the source data match our criteria, we might be satisfied with the statement as is and send the code onto production. (Let’s hope no one would really do that.) We might also choose to continue along this logical trajectory and recast the statement to return the non-NULL
data:
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title <> NULL AND BusinessEntityID < 7; |
The statement is no longer trying to compare NULL
with NULL
, but rather compare real non-NULL
values to NULL
, which would seem an equally credible comparison, if not more so. It is not. We cannot compare real values to nothing any more than we can compare nothing to nothing. As a result, the statement still returns no data, even though we know for certain the data exists.
But before we go further, note that there are cases when comparing NULL
values in this way can work, which we cover in the next failure. For now, know that the proper way to perform NULL
comparisons is to use the IS
NULL
and IS
NOT
NULL
operators. For example, the following SELECT
statement uses the IS
NULL
operator to return rows whose Title
value is NULL
:
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title IS NULL AND BusinessEntityID < 7; |
This time, the statement returns the data we expect, with the Title
value missing, along with a couple of MiddleName
values:
Title |
FirstName |
MiddleName |
LastName |
NULL |
Ken |
J |
Sánchez |
NULL |
Terri |
Lee |
Duffy |
NULL |
Roberto |
NULL |
Tamburello |
NULL |
Rob |
NULL |
Walters |
We can also return those rows that contain an actual Title
value by instead using the IS
NOT
NULL
operator:
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title IS NOT NULL AND BusinessEntityID < 7; |
Now the statement returns only the following two rows:
Title |
FirstName |
MiddleName |
LastName |
Ms. |
Gail |
A |
Erickson |
Mr. |
Jossef |
H |
Goldberg |
The IS
NULL
and IS
NOT
NULL
operators provide the logic we need to handle the ambiguous nature of NULL
values. The purpose of these operators is to determine whether or not an expression is NULL
and from that return either TRUE
or FALSE
. Comparison operators, when faced with a NULL
value, instead want to return UNKNOWN
, which can result in unpredictable and costly mistakes if implemented in production.
Failure #3: Not knowing how ANSI_NULL
works
In the last failure, I demonstrated why you should not use comparison operators against NULL
values. Even if there are results to be had, a statement such as the following will likely not return them:
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
The word likely is key here. It turns out that the behavior I described is the default behavior, but not the only behavior. SQL Server supports a feature called ANSI_NULLS
, which determines how the database engine handles NULL
comparisons. The option is a carry-over from the ghost of SQL Server Past and has been deprecated since 2005. But the setting still exists, it still works, and it can still cause all sorts of trouble.
By default, SQL Server installs with the ANSI_NULLS
option turned on, which gives us the behavior I’ve described so far. If we compare values to NULL
, the database engine does not return those rows, even if NULL
exists.
We can verify the current ANSI_NULLS
setting by running a statement similar to the following:
1 2 3 4 |
IF ( (32 & @@OPTIONS) = 32 ) PRINT 'The ANSI_NULLS option turned on.' ELSE PRINT 'The ANSI_NULLS option turned off.'; |
If no one has messed with ANSI_NULLS
, the database engine will return the first option, which will give us the results we’ve seen so far. But suppose we turn the option off:
1 2 3 4 |
SET ANSI_NULLS OFF; SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
Now our statement returns the expected four rows, despite the fact that we’re using a comparison operator against NULL
:
Title |
FirstName |
MiddleName |
LastName |
NULL |
Ken |
J |
Sánchez |
NULL |
Terri |
Lee |
Duffy |
NULL |
Roberto |
NULL |
Tamburello |
NULL |
Rob |
NULL |
Walters |
With the option turned off, we can also use the not equal operator:
1 2 3 4 |
SET ANSI_NULLS OFF; SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title <> NULL AND BusinessEntityID < 7; |
Now the results look like the following:
Title |
FirstName |
MiddleName |
LastName |
Ms. |
Gail |
A |
Erickson |
Mr. |
Jossef |
H |
Goldberg |
If we turn the option back on (it’s default setting), the database engine will again return no rows:
1 2 3 4 |
SET ANSI_NULLS ON; SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
Because the ANSI_NULLS
option has been deprecated, you should not use it and you should update any code that does, assuming you have the access and time. If you don’t, you could be in store for bigger problems.
Imagine an application that connects to two different SQL Server instances, one with the option turned on (the default) and one with it turned off. If the code is using comparison operators with NULL
values, you could end up with inaccurate data or conflicting results. Even if you’re dealing with only one SQL Server instance, you still risk unexpected results, especially if statements are turning the ANSI_NULLS
option on at the session level.
Failure #4: Treating ISNULL the same as IS NULL
The ISNULL
function tends to cause a bit of confusion when first encountered, in part because of its name, which implies a meaning similar to the IS
NULL
operator. Another reason for confusion arises from the fact that products such as Microsoft Access and MySQL also include the ISNULL
function, where it behaves much differently, more like the name suggests. But we’re talking SQL Server and need to understand how ISNULL
is implemented here. A lack of understanding can lead to misuse and failure to take advantage of its real purpose in SQL Server.
Let’s looks at a few examples to demonstrate, starting with a simple query that returns products whose Color
value is NULL
:
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND Color IS NULL; |
ProductName |
Color |
LL Touring Handlebars |
NULL |
HL Touring Handlebars |
NULL |
If we’re not familiar with ISNULL
in SQL Server or are transitioning in from Access, we might attempt to retrieve the same results by using ISNULL
in our WHERE
clause:
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color); |
Fortunately, this construction will generate the following error, which we’ll likely catch immediately, or at least during the Q&A phase:
1 2 |
Msg 174, Level 15, State 1, Line 305 The isnull function requires 2 argument(s). |
Those of us with a background in MySQL might instead take a different approach with the ISNULL
function by comparing it to 1 or 0, the function’s two possible return values:
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color) = 1); |
Once again, we receive an error message:
1 2 |
Msg 174, Level 15, State 1, Line 322 The isnull function requires 2 argument(s). near ';'. |
By now, we should have checked the documentation to verify what this function is supposed to do and have come to the conclusion that, based on the syntax and the fairly explicit error messages, we should be adding a second argument, which we do:
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color, 'neutral'); |
Unfortunately, all we’ve done is make our predicate completely illogical, resulting in yet another (but different) error message:
1 2 |
Msg 4145, Level 15, State 1, Line 339 An expression of non-boolean type specified in a context where a condition is expected, near ';'. |
The point of that this is that, in SQL Server, the purpose of the ISNULL
function is simply to replace an expression’s value with a real value, if the original value is NULL
(nonexistent). If we wanted to make the preceding example right, we would need to define our Boolean condition correctly:
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color, 'neutral') IN ('black', 'neutral'); |
Now the SELECT statement returns the results we want, rather than an error:
ProductName |
Color |
LL Mountain Frame – Black, 40 |
Black |
LL Touring Handlebars |
NULL |
HL Touring Handlebars |
NULL |
LL Crankset |
Black |
That said, we can achieve the same thing by using IS
NULL
, without incurring the extra processing:
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND (Color = 'black' OR Color IS NULL); |
More often than not, the place we’ll want to use the ISNULL
function is within our select list to replace the NULL
values with real values:
1 2 3 |
SELECT Name AS ProductName, ISNULL(Color, 'neutral') FROM Production.Product WHERE ProductID BETWEEN 940 AND 949; |
The Color
column now contains a value for each row, with no missing values to contend with:
ProductName |
Color |
HL Road Pedal |
Silver/Black |
Touring Pedal |
Silver/Black |
ML Mountain Frame-W – Silver, 38 |
Silver |
LL Mountain Frame – Black, 40 |
Black |
LL Mountain Frame – Silver, 40 |
Silver |
Front Derailleur |
Silver |
LL Touring Handlebars |
neutral |
HL Touring Handlebars |
neutral |
Front Brakes |
Silver |
LL Crankset |
Black |
Be aware that, when using ISNULL
, the database engine converts the replacement value to the data type of the original value, making its possible for data to get truncated or for the database engine to generate a conversion error. Make sure you read up on ISNULL
before adding it to your code.
Failure #5: Treating ISNULL the same as COALESCE
In the last failure, I tried to address any confusion there might be around the ISNULL
function, but there’s another area of possible confusion, and that is in understanding the differences between ISNULL
and COALESCE
. Both functions let us replace NULL
with a real value, but there are differences between the two that can be important to understand.
First off, the ISNULL
function implemented in SQL Server is specific to that brand of T-SQL, whereas COALESCE
is standard ANSI, making it more universal and, consequently, more portable, an important consideration if the possibility exists that you might one day want to point your apps to a different database system.
The COALESCE
function also lets us include more than two expressions, as shown in the following SELECT
statement:
1 2 3 4 |
SELECT Name AS ProductName, Class, Style, ProductLine, COALESCE(Class, Style, ProductLine, 'n/a') AS FirstNotNull FROM Production.Product WHERE ProductID IN(679, 706, 709, 711); |
The function evaluates the expressions in the order they’re specified and returns the first non-NULL
value. As the following results show, each row returns one of the specified values in the FirstNotNull
column:
ProductName |
Class |
Style |
ProductLine |
FirstNotNull |
Rear Derailleur Cage |
NULL |
NULL |
NULL |
n/a |
HL Road Frame – Red, 58 |
H |
U |
R |
H |
Mountain Bike Socks, M |
NULL |
U |
M |
U |
Sport-100 Helmet, Blue |
NULL |
NULL |
S |
S |
The COALESCE
function can actually be thought of as syntactical shorthand for a CASE
expression:
1 2 3 4 5 6 7 8 9 |
SELECT Name AS ProductName, Class, Style, ProductLine, CASE WHEN Class IS NOT NULL THEN Class WHEN Style IS NOT NULL THEN Style WHEN ProductLine IS NOT NULL THEN ProductLine ELSE 'n/a' END AS FirstNotNull FROM Production.Product WHERE ProductID IN(679, 706, 709, 711); |
The SELECT
statement achieves the same results as the preceding example, but its construction is more complex. The main advantage of using a CASE
expression is that it makes the logic clearer, for those who might need a little extra help.
Compared to the COALESCE
function, the ISNULL
function is much more limited:
1 2 3 4 |
SELECT Name AS ProductName, Class, Style, ISNULL(Class, Style) AS FirstNotNull FROM Production.Product WHERE ProductID IN(679, 706, 709, 711); |
In this case, all we’re saying is that if Class
is NULL
, then use Style
, giving us the following results:
ProductName |
Class |
Style |
FirstNotNull |
Rear Derailleur Cage |
NULL |
NULL |
NULL |
HL Road Frame – Red, 58 |
H |
U |
H |
Mountain Bike Socks, M |
NULL |
U |
U |
Sport-100 Helmet, Blue |
NULL |
NULL |
NULL |
We could have specified n/a
as our second argument, similar to what we did with COALESCE
, but you get the point. COALESCE
is more flexible and it conforms to ANSI standards. In addition, COALESCE
handles data types differently. To understand this difference, let’s start with a different example:
1 2 3 4 |
SELECT Name AS ProductName, Class, Style, ProductSubcategoryID FROM Production.Product WHERE ProductID = 525; |
The Class
and Style
columns are defined as NCHAR
, and the ProductSubcategoryID
column as INT
. The SELECT
statement returns the following results:
ProductName |
Class |
Style |
ProductSubcategoryID |
LL Shell |
L |
NULL |
NULL |
Now lets modify our code to include the COALESCE
function:
1 2 3 4 5 |
SELECT Name AS ProductName, Class, Style, ProductSubcategoryID, COALESCE(Class, Style, ProductSubcategoryID) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
The database engine now generates an error when it tries to convert the Class
value to an INT
:
1 2 |
Msg 245, Level 16, State 1, Line 415 Conversion failed when converting the nvarchar value 'L ' to data type int. |
It turns out that COALESCE
uses the type with the highest precedence, based on the submitted values. Because the ProductSubcategoryID
column takes an INT
and an INT
takes precedence over NCHAR
(which is treated as NVARCHAR
), the database engine tries to convert L
to an INT
, which of course is not possible.
In such cases, we might try changing the order of the columns we pass to the COALESCE
function:
1 2 3 4 5 |
SELECT Name AS ProductName, Class, Style, ProductSubcategoryID, COALESCE(ProductSubcategoryID, Class, Style) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
But we will again receive a conversion area. The database engine does not care about the order of the arguments, only which types take precedence over others:
The ISNULL
function works differently:
1 2 3 4 5 |
SELECT Name AS ProductName, Class, ProductSubcategoryID, ISNULL(Class, ProductSubcategoryID) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
In this case, the Class
data type sets the standard, so the database engine has no problem returning the results:
ProductName |
Class |
ProductSubcategoryID |
FirstNotNull |
LL Shell |
L |
NULL |
L |
Now let’s reverse the order of the columns we pass to ISNULL
:
1 2 3 4 5 |
SELECT Name AS ProductName, Class, ProductSubcategoryID, ISNULL(ProductSubcategoryID, Class) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
The database engine will once again try to convert L
to an INT
and generate a conversion error. Be sure that, whenever you’re using these functions, you are well aware of the possibility for conversion errors.
Failure #6: Forgetting about the NULLIF
function
While we’re on the topic of NULL
-related functions, it’s worth noting another one that garners less attention, but can still be useful: the NULLIF
function. We don’t seem to come across this one as often as the other functions, in part because there does not seem to be as many use cases for NULLIF
as there are the others. Basically, the function returns a NULL
value if two expressions are equal. If they are not equal, the function returns the first expression. Here’s an example:
1 2 3 4 |
SELECT Name AS ProductName, MakeFlag, FinishedGoodsFlag, NULLIF(MakeFlag, FinishedGoodsFlag) AS NullIfEqual FROM Production.Product WHERE ProductID IN(534, 679, 680, 707); |
All we’re doing here is passing the two columns into the function and returning the results, as shown in the following table:
ProductName |
MakeFlag |
FinishedGoodsFlag |
NullIfEqual |
Top Tube |
1 |
0 |
1 |
Rear Derailleur Cage |
0 |
0 |
NULL |
HL Road Frame – Black, 58 |
1 |
1 |
NULL |
Sport-100 Helmet, Red |
0 |
1 |
0 |
We can achieve the same results by using a CASE
expression instead of the NULLIF
function:
1 2 3 4 5 6 7 |
SELECT Name AS ProductName, MakeFlag, FinishedGoodsFlag, CASE WHEN MakeFlag = FinishedGoodsFlag THEN NULL ELSE MakeFlag END AS NullIfEqual FROM Production.Product WHERE ProductID IN(534, 679, 680, 707); |
As with COALESCE, the NULLIF
function helps to simplify our statements, but the CASE
expression makes the logic clearer to the rest of the world.
Perhaps one of the most useful places to use the NULLIF
function is within a mathematical expression in which an error would be generated if a zero were encountered. Dividing by zero is a good example of this. Let’s start with the following SELECT
statement:
1 2 3 4 |
SELECT Name AS ProductName, StandardCost, ListPrice, StandardCost / (ListPrice - StandardCost) AS NetDiff FROM Production.Product WHERE ProductID = 680; |
The database engine runs the calculation with no problem and returns the following results:
ProductName |
StandardCost |
ListPrice |
NetDiff |
HL Road Frame – Black, 58 |
1059.31 |
1431.50 |
2.8461 |
Now let’s pull the data from a row in which the StandardCost
and ListPrice
values are zero:
1 2 3 4 |
SELECT Name AS ProductName, StandardCost, ListPrice, StandardCost / (ListPrice - StandardCost) AS NetDiff FROM Production.Product WHERE ProductID = 679; |
This time, we get a divide-by-zero error:
1 2 |
Msg 8134, Level 16, State 1, Line 603 Divide by zero error encountered. |
This is where NULLIF
comes it. By returning a NULL
in place of a zero, we can at least prevent the database engine from generating an error:
1 2 3 4 |
SELECT Name AS ProductName, StandardCost, ListPrice, StandardCost / NULLIF((ListPrice - StandardCost), 0) AS NetDiff FROM Production.Product WHERE ProductID = 679; |
Now we at least get results we can live with.
ProductName |
StandardCost |
ListPrice |
NetDiff |
Rear Derailleur Cage |
0.00 |
0.00 |
NULL |
We could have also used the WHERE
clause to filter our rows that would have given us zero, but then we would not be able to view any of the data’s row. The NULLIF
function at least gives us data to work with and makes it easy to do so.
Failure #7: Assuming that 2 + NULL
= 2
As great as the NULLIF
function is for handling certain situations, such as avoiding divide-by-zero errors, having NULL
show up in our numeric calculations is often less than optimal. It’s not that SQL Server does anything surprising, but rather that you need to know what’s coming should you run into issues with your calculations.
This all goes back to the idea that NULL
is a non-value. It is nothing. It merely indicates that any real data value is missing. Consequently, when we try to add a value to nothing or subtract from nothing or multiply by nothing or divide by nothing or do anything by nothing, we end up with nothing, or NULL
. Take a look at this SELECT
statement:
1 2 3 4 |
SELECT Name AS ProductName, Weight, (Weight * 4.22) AS FreightCost FROM Production.Product WHERE ProductID BETWEEN 715 AND 718; |
We’re simply multiplying the Weight
value by 4.22
and getting the following results:
ProductName |
Weight |
FreightCost |
Long-Sleeve Logo Jersey, L |
NULL |
NULL |
Long-Sleeve Logo Jersey, XL |
NULL |
NULL |
HL Road Frame – Red, 62 |
2.30 |
9.7060 |
HL Road Frame – Red, 44 |
2.12 |
8.9464 |
Not surprisingly, whenever we try to multiply by NULL
, we get NULL
. What choice does the database engine have? Now let’s throw a little addition in there to see what happens:
1 2 3 4 |
SELECT Name AS ProductName, Weight, ((Weight + 3) * 4.22) AS FreightCost FROM Production.Product WHERE ProductID BETWEEN 715 AND 718; |
As before, when Weight
contains a real value, we get a real value in return, but when NULL
enters the equation, we always end up with NULL
:
ProductName |
Weight |
FreightCost |
Long-Sleeve Logo Jersey, L |
NULL |
NULL |
Long-Sleeve Logo Jersey, XL |
NULL |
NULL |
HL Road Frame – Red, 62 |
2.30 |
22.3660 |
HL Road Frame – Red, 44 |
2.12 |
21.6064 |
One approach we can take (when circumstances make it reasonable) is to define a default value that we can return in the event the source data is NULL
:
1 2 3 4 |
SELECT Name AS ProductName, Weight, ISNULL((Weight * 4.22), 4.95) AS FreightCost FROM Production.Product WHERE ProductID BETWEEN 715 AND 718; |
In this case, we’re using ISNULL
to return a default value of 4.95 should our calculation result in a NULL
value, which gives us the following results:
ProductName |
Weight |
FreightCost |
Long-Sleeve Logo Jersey, L |
NULL |
4.9500 |
Long-Sleeve Logo Jersey, XL |
NULL |
4.9500 |
HL Road Frame – Red, 62 |
2.30 |
9.7060 |
HL Road Frame – Red, 44 |
2.12 |
8.9464 |
SQL Server does what it should do when it encounters NULL
. It returns NULL
. We just need to be prepared for this eventuality. In some cases, having our expressions return a NULL
value is fine for our circumstances, but at other times, we’ll want to adjust our code to accommodate those wayward missing values. Otherwise a whole lot of customers might end up getting free shipping because the weight was initially not a consideration.
Failure #8: Aggregating data without taking NULL into account
As with numeric calculations, we must be prepared to handle aggregations that could include NULL
values. However, unlike mathematic operations, the database engine eliminates NULL
values from the calculations before performing the actual aggregations, at least at the most basic level. Let’s start with a simple SELECT
statement:
1 2 3 |
SELECT Name AS ProductName, Weight FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
The statement returns the following results, which include two rows with a Weight
value of NULL
:
ProductName |
Weight |
Long-Sleeve Logo Jersey, L |
NULL |
Long-Sleeve Logo Jersey, XL |
NULL |
HL Road Frame – Red, 62 |
2.30 |
HL Road Frame – Red, 44 |
2.12 |
HL Road Frame – Red, 48 |
2.16 |
HL Road Frame – Red, 52 |
2.20 |
Suppose we now try to find the average weight for these rows:
1 2 3 |
SELECT AVG(Weight) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
The database engine will perform the aggregation with no problem by first eliminating the NULL
rows from the mix and then returning a result of 2.195000
. However, the results also include something else, a warning message about the NULL
values:
1 |
Warning: Null value is eliminated by an aggregate or other SET operation. |
We can verify that the NULL
rows are being eliminated by instead using the COUNT
aggregate function on the Weight column:
1 2 3 |
SELECT COUNT(weight) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
This time the database engine returns a value of 4
, indicating that the two NULL
rows have been eliminated. We will again receive the warning message.
All aggregate functions eliminate the NULL
values prior to preforming the actual calculations except for the COUNT(*)
function:
1 2 3 |
SELECT COUNT(*) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
In this case, the database engine includes all rows, no matter where the NULL
values lie, and will generate no warning messages.
The fact that SQL Server handles aggregations this way is not a bad thing. But it’s something you need to understand. If you’re not familiar with the data, your queries might be eliminating rows without understanding the full implications. You can consider using a function such as ISNULL
to replace the NULL
values, but you still risk skewing your results. As with many situations, handling NULL
values can take a special touch. For example, detailed analytics will require, at the very least, knowledge of the number of NULL
values in a particular data set.
As a side note, those who don’t appreciate having their queries return warning messages like those we’ve seen here, can use the SET
ANSI_WARNINGS
statement to turn warnings off when executing an aggregation:
1 2 3 4 5 |
SET ANSI_WARNINGS OFF; SELECT AVG(Weight) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; SET ANSI_WARNINGS ON; |
Although this will give us the same results without the message, turning off this option can affect other operations. For example, modifying data on indexed views or on tables with indexed computed columns will fail if the option is turned off. So tread carefully if you consider this route.
Also be aware that when you start grouping and aggregating data, you might not get exactly the results you expect when that data includes NULL
values. For example, the following SELECT
statement groups by the ProductSubcategoryID
column in our trusty Product
table:
1 2 3 4 5 6 |
SELECT ProductSubcategoryID AS ProdSubID, AVG(Weight) AS AvgWeight, COUNT(*) AS FullCount, COUNT(Weight) AS ColCount FROM Production.Product GROUP BY ProductSubcategoryID; |
The following table shows the first 10 rows in our result set:
ProdSubID |
AvgWeight |
FullCount |
ColCount |
NULL |
434.285714 |
209 |
7 |
1 |
25.058750 |
32 |
32 |
2 |
17.448139 |
43 |
43 |
3 |
27.680909 |
22 |
22 |
4 |
NULL |
8 |
0 |
5 |
187.000000 |
3 |
3 |
6 |
317.000000 |
2 |
2 |
7 |
NULL |
1 |
0 |
8 |
603.333333 |
3 |
3 |
9 |
151.500000 |
2 |
2 |
The full result set represents all 504 rows in the Product
table, including rows whose ProductSubcategoryID
and Weight
columns are NULL
. If we were to dig into the averages and the individual column counts for each group, we would discover that the NULL
values are not included, but the result set as a whole includes all the data. Again, I point this out not to suggest that there is a problem with the way SQL Server groups and aggregates data, but only to demonstrate that we need to be aware of what’s happening with our queries when NULL
values are involved. In this case, by including both types of counts in each group, we’re providing a handy way to provide insight into our source data.
Failure #9: Forgetting about NULL
in your Boolean expressions
Boolean expressions that involve NULL
values can be especially tricky because they introduce what is called three-valued logic. Typically, when an expression is evaluated, it returns either TRUE
or FALSE
. However, when NULL
enters the mix, we get a third possibility: UNKNOWN
. As we’ve already covered, SQL Server marks a value as NULL
if it does not exist. As a result, a Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN
.
The main thing to keep in mind with Boolean expressions is that your result set will include only those rows in which the expression evaluates to TRUE
. If the expression evaluates to FALSE
or UNKNOWN
, the row is omitted.
The specifics of three-valued logic can seem unwieldy when taking account the various combinations that can result from using the AND
, OR
, and NOT
operators, but you can find a handy reference to help sort all that out be referring to the SQL Server help topic “Null Values.” In the meantime, lets look at a few examples that demonstrate what happens when NULL values enter the Boolean mix, starting with the following temporary table:
1 2 3 4 5 |
SELECT ProductID, Name AS ProductName, Weight, Class INTO #products FROM Production.Product WHERE ProductID BETWEEN 945 AND 954; SELECT * FROM #products; |
The following results show the data that’s now stored in the #products
table:
ProductID |
ProductName |
Weight |
Class |
945 |
Front Derailleur |
88.00 |
NULL |
946 |
LL Touring Handlebars |
NULL |
L |
947 |
HL Touring Handlebars |
NULL |
H |
948 |
Front Brakes |
317.00 |
NULL |
949 |
LL Crankset |
600.00 |
L |
950 |
ML Crankset |
635.00 |
M |
951 |
HL Crankset |
575.00 |
H |
952 |
Chain |
NULL |
NULL |
953 |
Touring-2000 Blue, 60 |
27.90 |
M |
954 |
Touring-1000 Yellow, 46 |
25.13 |
H |
Now let’s use the AND
operator to create a simple Boolean expression in our WHERE
clause:
1 2 |
SELECT * FROM #products WHERE Weight > 500 AND Class = 'H'; |
As expected, the statement returns the results shown in the following table:
ProductID |
ProductName |
Weight |
Class |
951 |
HL Crankset |
575.00 |
H |
Notice that the query engine excludes the row for product ID 947. Even though the Class
value is H
, the Weight
value is NULL
, so the expression evaluates to UNKNOWN. And of course, row 952 is excluded because both the Weight
and Class
values are NULL
.
Now let’s rerun the statement, but this time use the OR
operator for our Boolean expression:
1 2 |
SELECT * FROM #products WHERE Weight > 500 OR Class = 'H'; |
As expected, our results look quite different:
ProductID |
ProductName |
Weight |
Class |
947 |
HL Touring Handlebars |
NULL |
H |
949 |
LL Crankset |
600.00 |
L |
950 |
ML Crankset |
635.00 |
M |
951 |
HL Crankset |
575.00 |
H |
954 |
Touring-1000 Yellow, 46 |
25.13 |
H |
In this case, only one of the two Boolean conditions needs to evaluate to TRUE
, resulting in many more returned rows. Even row 947 is included, even through the Weight
value is NULL
. Because the OR
operator is used and the second condition evaluates to TRUE
, that row is included.
This time around, let’s use the AND
NOT
operator in our statement:
1 2 |
SELECT * FROM #products WHERE Weight > 500 AND NOT Class = 'H'; |
Once again, we get very different results:
ProductID |
ProductName |
Weight |
Class |
949 |
LL Crankset |
600.00 |
L |
950 |
ML Crankset |
635.00 |
M |
Not surprising, the only rows to be included are those in which both Boolean conditions evaluate to TRUE
. All the FALSE
and UNKNOWN
values are left behind.
As with aggregate functions, SQL Server’s treatment of Boolean expressions in itself is not a bad thing. You just need to be aware of what can happen with your data when NULL
values are involved. Data can be eliminated that you might expect to be included, leading to unintended consequences in subsequent analytics and decision-making.
Failure #10: Not accounting for NULL in your string expressions
Up to this point, we’ve touched little upon string values and what happens when NULL enters the mix, but at least some coverage in this area is warranted, beyond what we saw when working with Boolean expressions.
Suppose we start with the following SELECT
statement:
1 2 3 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845; |
The statement returns the results shown here:
ProductID |
ProductName |
Color |
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
837 |
HL Road Frame – Black, 62 |
Black |
838 |
HL Road Frame – Black, 44 |
Black |
839 |
HL Road Frame – Black, 48 |
Black |
840 |
HL Road Frame – Black, 52 |
Black |
841 |
Men’s Sports Shorts, S |
Black |
842 |
Touring-Panniers, Large |
Grey |
843 |
Cable Lock |
NULL |
844 |
Minipump |
NULL |
845 |
Mountain Pump |
NULL |
Now suppose when want to filter our data set even further by returning all rows with a Color value other than black
:
1 2 3 4 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845 AND Color <> 'black'; |
Now our SELECT statement returns the following results:
ProductID |
ProductName |
Color |
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
842 |
Touring-Panniers, Large |
Grey |
We might have expected the results to include those rows whose Color
value is NULL
. However, we’re once again faced with the issue of trying to evaluate an expression against what is essentially a missing value. A “not black” value cannot be equated with a “missing” value; consequently, those rows are omitted.
Concatenating string values follows similar logic:
1 2 3 4 |
SELECT Name AS ProductName, ProductNumber, Color, (ProductNumber + '_' + Color) AS NewProductNumber FROM Production.Product WHERE ProductID BETWEEN 835 AND 845; |
Whenever NULL shows up, we end up with nothing (NULL
) in our NewProductNumber
column:
ProductName |
ProductNumber |
Color |
NewProductNumber |
ML Road Frame-W – Yellow, 44 |
FR-R72Y-44 |
Yellow |
FR-R72Y-44_Yellow |
ML Road Frame-W – Yellow, 48 |
FR-R72Y-48 |
Yellow |
FR-R72Y-48_Yellow |
HL Road Frame – Black, 62 |
FR-R92B-62 |
Black |
FR-R92B-62_Black |
HL Road Frame – Black, 44 |
FR-R92B-44 |
Black |
FR-R92B-44_Black |
HL Road Frame – Black, 48 |
FR-R92B-48 |
Black |
FR-R92B-48_Black |
HL Road Frame – Black, 52 |
FR-R92B-52 |
Black |
FR-R92B-52_Black |
Men’s Sports Shorts, S |
SH-M897-S |
Black |
SH-M897-S_Black |
Touring-Panniers, Large |
PA-T100 |
Grey |
PA-T100_Grey |
Cable Lock |
LO-C100 |
NULL |
NULL |
Minipump |
PU-0452 |
NULL |
NULL |
Mountain Pump |
PU-M044 |
NULL |
NULL |
Once again, the bottom line is that you must be prepared to handle NULL
values if they could in any way impact your results. For example, you might try throwing an ISNULL
into the mix:
1 2 3 4 |
SELECT Name AS ProductName, ProductNumber, Color, (ProductNumber + '_' + ISNULL(Color, 'neutral')) AS NewProductNumber FROM Production.Product WHERE ProductID BETWEEN 835 AND 845; |
Now the statement returns the following results:
ProductName |
ProductNumber |
Color |
NewProductNumber |
ML Road Frame-W – Yellow, 44 |
FR-R72Y-44 |
Yellow |
FR-R72Y-44_Yellow |
ML Road Frame-W – Yellow, 48 |
FR-R72Y-48 |
Yellow |
FR-R72Y-48_Yellow |
HL Road Frame – Black, 62 |
FR-R92B-62 |
Black |
FR-R92B-62_Black |
HL Road Frame – Black, 44 |
FR-R92B-44 |
Black |
FR-R92B-44_Black |
HL Road Frame – Black, 48 |
FR-R92B-48 |
Black |
FR-R92B-48_Black |
HL Road Frame – Black, 52 |
FR-R92B-52 |
Black |
FR-R92B-52_Black |
Men’s Sports Shorts, S |
SH-M897-S |
Black |
SH-M897-S_Black |
Touring-Panniers, Large |
PA-T100 |
Grey |
PA-T100_Grey |
Cable Lock |
LO-C100 |
NULL |
LO-C100_neutral |
Minipump |
PU-0452 |
NULL |
PU-0452_neutral |
Mountain Pump |
PU-M044 |
NULL |
PU-M044_neutral |
At least this way we have something to work with other than NULL
.
Failure #11: Joining tables on nullable columns
Another area where we might get results other than what we expect is when we join data based on nullable columns. Because NULL
value are considered non-values and because the database engine cannot match non-values, NULL
values will show up in the join column only for outer joins. To demonstrate, let’s start by creating and populating a couple tables:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
CREATE TABLE Products (ProductID INT, ProductName NVARCHAR(50), CategoryAltID INT); INSERT INTO Products VALUES (945, 'Front Derailleur', 10), (946, 'LL Touring Handlebars', 20), (947, 'HL Touring Handlebars', NULL), (948, 'Front Brakes', 20), (949, 'LL Crankset', 10), (950, 'ML Crankset', NULL); CREATE TABLE Categories (CategoryNumber NVARCHAR(25), CategoryAltID INT); INSERT INTO Categories VALUES ('FR-R72Y-44', 10), ('FR-R72Y-48', 20), ('FR-R92B-62', NULL), ('FR-R92B-52', NULL); |
Notice that the CategoryAltID
column in both tables contains NULL
values. Now let’s create an inner join between those tables, based on that column:
1 2 3 4 |
SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID, c.CategoryAltID AS cCatAltID, c.CategoryNumber FROM Products p JOIN Categories c ON p.CategoryAltID = c.CategoryAltID; |
Although the CategoryAltID
column contains NULL
values in both tables, they are not considered equal (for all the reasons we’ve discussed), which means they’re not included in the result set:
ProductID |
ProductName |
pCatAltID |
cCatAltID |
CategoryNumber |
945 |
Front Derailleur |
10 |
10 |
FR-R72Y-44 |
946 |
LL Touring Handlebars |
20 |
20 |
FR-R72Y-48 |
948 |
Front Brakes |
20 |
20 |
FR-R72Y-48 |
949 |
LL Crankset |
10 |
10 |
FR-R72Y-44 |
Now let’s revise the statement to include a left outer join:
1 2 3 4 |
SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID, c.CategoryAltID AS cCatAltID, c.CategoryNumber FROM Products p LEFT OUTER JOIN Categories c ON p.CategoryAltID = c.CategoryAltID; |
This time our results include all rows from the Product
table, including those whose CategoryAltID
column is NULL
in both tables:
ProductID |
ProductName |
pCatAltID |
cCatAltID |
CategoryNumber |
945 |
Front Derailleur |
10 |
10 |
FR-R72Y-44 |
946 |
LL Touring Handlebars |
20 |
20 |
FR-R72Y-48 |
947 |
HL Touring Handlebars |
NULL |
NULL |
NULL |
948 |
Front Brakes |
20 |
20 |
FR-R72Y-48 |
949 |
LL Crankset |
10 |
10 |
FR-R72Y-44 |
950 |
ML Crankset |
NULL |
NULL |
NULL |
The query engine includes NULL
rows not because they’re considered matches, but rather because that’s the nature of an outer join and the way it returns all rows from the “outer” table. We can also try our statement with a full outer join:
1 2 3 4 |
SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID, c.CategoryAltID AS cCatAltID, c.CategoryNumber FROM Products p FULL OUTER JOIN Categories c ON p.CategoryAltID = c.CategoryAltID; |
Now our results include everything, regardless of the NULL
values:
ProductID |
ProductName |
pCatAltID |
cCatAltID |
CategoryNumber |
945 |
Front Derailleur |
10 |
10 |
FR-R72Y-44 |
946 |
LL Touring Handlebars |
20 |
20 |
FR-R72Y-48 |
947 |
HL Touring Handlebars |
NULL |
NULL |
NULL |
948 |
Front Brakes |
20 |
20 |
FR-R72Y-48 |
949 |
LL Crankset |
10 |
10 |
FR-R72Y-44 |
950 |
ML Crankset |
NULL |
NULL |
NULL |
NULL |
NULL |
NULL |
NULL |
FR-R92B-62 |
NULL |
NULL |
NULL |
NULL |
FR-R92B-52 |
If you consider the logic behind the query engine’s approach to NULL
values, it’s not surprising we get the results we do. Like any place where NULL shows up, it comes down to understanding the data we could potentially be dealing with and what that means to the bigger picture. Whenever NULL
enters into the equation, we must take it into consideration so we don’t end up with query results very different from what we expect. When we join data, a NULL
in one column can cause us to miss important data in other columns, all because we did not think through the possible outcomes.
Failure #12: Creating a unique index without taking NULL into account
Up to this point, I’ve tried to emphasize how SQL Server uses NULL
only to indicate that a value is missing. We cannot compare or add or multiply NULL
values because we cannot compare or add or multiply missing values. When it comes to unique indexes, however, SQL Server seems to have a double standard, treating NULL
values as equal values and consequently permitting only one NULL
value in a unique index.
To demonstrate, let’s create and populate a simple table:
1 2 3 4 5 6 7 8 9 |
CREATE TABLE Products (ProductID INT, ProductName NVARCHAR(50), ProductNumber NVARCHAR(25)); INSERT INTO Products VALUES (945, 'Front Derailleur', 'FD-2342'), (946, 'LL Touring Handlebars', 'HB-T721'), (947, 'HL Touring Handlebars', 'HB-T928'), (948, 'Front Brakes', 'FB-9873'), (949, 'LL Crankset', NULL), (950, 'ML Crankset', NULL); |
Now let’s try to create a unique index on the ProductNumber
column:
1 2 |
CREATE UNIQUE INDEX ux_products_productnumber ON Products(ProductNumber); |
Because the column contains multiple NULL
values, the database engine returns the following error:
1 2 3 |
Msg 1505, Level 16, State 1, Line 982 The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.Products' and the index name 'ux_products_productnumber'. The duplicate key value is (<NULL>). The statement has been terminated. |
This might seem contradictory behavior on the part of SQL Server, given the non-value status it imposes on NULL
everywhere else. After all, how can missing values be considered duplicates if there is no way to compare them?
Because of this limitation, we have to be particularly careful when we create our tables and indexes prior to populating them. We could be faced with similar error messages if we try to insert a row with a duplicate NULL
value.
Fortunately, since SQL Server 2008, we’ve been able to create filtered indexes that let us sidestep the duplicate NULL
issue:
1 2 3 |
CREATE UNIQUE INDEX ux_products_productnumber ON Products(ProductNumber) WHERE ProductNumber IS NOT NULL; |
The index is now created without a hitch. We can even insert additional duplicate NULL
values:
1 2 |
INSERT INTO Products VALUES (951, 'HL Crankset', NULL); |
Creating a filtered index is easy enough to do and is a handy way to sidestep the limitations of a unique index. Just be sure to refer to the SQL Server help topic “Create Filtered Indexes” to make certain you’re familiar with the limitations and restrictions on these types of indexes.
Failure #13: Sorting data on a nullable column
A result set that is not in the order you want might not seem such a big deal, but it can still be annoying when you sort your results and end up with a bunch of NULL values at the beginning. For example, the following SELECT
statement sorts the data by the Color column:
1 2 3 4 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845 ORDER BY Color; |
The statement gives us the following results:
ProductID |
ProductName |
Color |
843 |
Cable Lock |
NULL |
844 |
Minipump |
NULL |
845 |
Mountain Pump |
NULL |
837 |
HL Road Frame – Black, 62 |
Black |
838 |
HL Road Frame – Black, 44 |
Black |
839 |
HL Road Frame – Black, 48 |
Black |
840 |
HL Road Frame – Black, 52 |
Black |
841 |
Men’s Sports Shorts, S |
Black |
842 |
Touring-Panniers, Large |
Grey |
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
We can instead sort the data in descending order, but that would put the colors themselves in reverse order. Another work-around is to slip a CASE
expression into our WHERE
clause to trick the database engine:
1 2 3 4 5 6 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845 ORDER BY CASE WHEN Color IS NULL THEN 1 ELSE 0 END, Color; |
The extra expression assigns 1
to the NULL
rows and 0
to the rest so they’re sorted first. From there, the normal sorting occurs, as shown in the following results:
ProductID |
ProductName |
Color |
837 |
HL Road Frame – Black, 62 |
Black |
838 |
HL Road Frame – Black, 44 |
Black |
839 |
HL Road Frame – Black, 48 |
Black |
840 |
HL Road Frame – Black, 52 |
Black |
841 |
Men’s Sports Shorts, S |
Black |
842 |
Touring-Panniers, Large |
Grey |
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
843 |
Cable Lock |
NULL |
844 |
Minipump |
NULL |
845 |
Mountain Pump |
NULL |
This might not be an earth-shattering solution, but it’s an easy enough way to move those NULL
values to the bottom, if you happen to prefer them there.
Load comments