- Failure #1: Not knowing what NULL means
- Failure #2: Treating NULL like a real value in comparisons
- Failure #3: Not knowing how ANSI_NULL works
- Failure #4: Treating ISNULL the same as IS NULL
- Failure #5: Treating ISNULL the same as COALESCE
- Failure #6: Forgetting about the NULLIF function
- Failure #7: Assuming that 2 + NULL = 2
- Failure #8: Aggregating data without taking NULL into account
- Failure #9: Forgetting about NULL in your Boolean expressions
- Failure #10: Not accounting for NULL in your string expressions
- Failure #11: Joining tables on nullable columns
- Failure #12: Creating a unique index without taking NULL into account
- Failure #13: Sorting data on a nullable column
Failure #1: Not knowing what NULL means
When referring to NULL within our data sets, we often use phrases such as “the NULL value” or “a value of NULL.” I do it all the time. I see it done all the time. Such phrases are so common that we think little of their use. But inherent in the phrase is the notion of NULL as an actual value. It is not. NULL is a non-value, a nonexistent value. It is not zero. It is not an empty string. A value cannot equal NULL. No two NULL values are equal.
A NULL value is often defined as one that is unknown or not applicable, but even these definitions can be open to debate. For example, a record might not include a customer’s birthdate because the salesperson didn’t ask or because the customer would not provide it, but the customer still knows that date, so it is hardly unknown, nor is it any less applicable. There can be many reasons for missing data, and there has been much debate about a precise meaning of NULL. If you need to attach a specific meaning to NULL, then missing or absent data is probably your safest bet.
Perhaps a better way to think of NULL is as a setting or marker that indicates if a data value does not exist. That certainly seems to be how SQL Server treats NULL. The database engine uses a special bitmap to track which columns in a row are NULL and which are not. The bitmap contains a bit for each column, with the bit set to 1 if the column is NULL, that is, if the value is missing.
The confusion for some, particularly beginning T-SQL developers, is the way SQL Server Management Studio indicates that no value exists for a specific column: by displaying what appears to be a value that reads NULL. For example, the following SELECT statement returns data from the Product table in the AdventureWorks2014 sample database:
|
1 2 3 4 |
SELECT Name AS ProductName, ProductLine, Class, Style, SellEndDate FROM Production.Product WHERE ProductID BETWEEN 680 AND 715; |
As the following results show, every row includes at least one column for which no value exists:
|
ProductName |
ProductLine |
Class |
Style |
SellEndDate |
|
HL Road Frame – Black, 58 |
R |
H |
U |
NULL |
|
HL Road Frame – Red, 58 |
R |
H |
U |
NULL |
|
Sport-100 Helmet, Red |
S |
NULL |
NULL |
NULL |
|
Sport-100 Helmet, Black |
S |
NULL |
NULL |
NULL |
|
Mountain Bike Socks, M |
M |
NULL |
U |
2012-05-29 00:00:00.000 |
|
Mountain Bike Socks, L |
M |
NULL |
U |
2012-05-29 00:00:00.000 |
|
Sport-100 Helmet, Blue |
S |
NULL |
NULL |
NULL |
|
AWC Logo Cap |
S |
NULL |
U |
NULL |
|
Long-Sleeve Logo Jersey, S |
S |
NULL |
U |
NULL |
|
Long-Sleeve Logo Jersey, M |
S |
NULL |
U |
NULL |
|
Long-Sleeve Logo Jersey, L |
S |
NULL |
U |
NULL |
If we look at the columns with the NULL values, we can see how difficult it is to determine why the values are missing. For example, is a SellEndDate value missing because the product is still live or because the application failed to properly record the end date? Are the Class and Style values missing because they don’t apply to those particular rows or because someone was careless in entering data?
An even bigger problem with viewing the results in this way is that it suggests that the database is actually storing multiple instances of this mysterious placeholder called NULL and that all those NULL values are the same. They are not the same. There is nothing there. Nothing cannot be the same as nothing. Nothing cannot be compared to nothing. You can think of these as the fundamental principles behind NULL. The principles of nothingness. And not understanding these principles can lead to all sorts of problems when working with SQL Server, resulting in many of the failures we cover here.
Failure #2: Treating NULL like a real value in comparisons
Developers who don’t understand how NULL works will sometimes use comparison operators to compare an expression to a NULL value, resulting in a statement that looks similar to the following:
|
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
The assumption here is that NULL is a valid value and therefore can be compared to other NULL values. That means, in this case, the statement should return each row whose Title value is NULL. A quick check into the source data will show four rows that match the statement’s assumed logic. However, because the database engine cannot compare nothing to nothing, it will not return these rows and will instead provide us with an empty result set.
If we assume from the empty result set that no rows in the source data match our criteria, we might be satisfied with the statement as is and send the code onto production. (Let’s hope no one would really do that.) We might also choose to continue along this logical trajectory and recast the statement to return the non-NULL data:
|
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title <> NULL AND BusinessEntityID < 7; |
The statement is no longer trying to compare NULL with NULL, but rather compare real non-NULL values to NULL, which would seem an equally credible comparison, if not more so. It is not. We cannot compare real values to nothing any more than we can compare nothing to nothing. As a result, the statement still returns no data, even though we know for certain the data exists.
But before we go further, note that there are cases when comparing NULL values in this way can work, which we cover in the next failure. For now, know that the proper way to perform NULL comparisons is to use the IS NULL and IS NOT NULL operators. For example, the following SELECT statement uses the IS NULL operator to return rows whose Title value is NULL:
|
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title IS NULL AND BusinessEntityID < 7; |
This time, the statement returns the data we expect, with the Title value missing, along with a couple of MiddleName values:
|
Title |
FirstName |
MiddleName |
LastName |
|
NULL |
Ken |
J |
Sánchez |
|
NULL |
Terri |
Lee |
Duffy |
|
NULL |
Roberto |
NULL |
Tamburello |
|
NULL |
Rob |
NULL |
Walters |
We can also return those rows that contain an actual Title value by instead using the IS NOT NULL operator:
|
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title IS NOT NULL AND BusinessEntityID < 7; |
Now the statement returns only the following two rows:
|
Title |
FirstName |
MiddleName |
LastName |
|
Ms. |
Gail |
A |
Erickson |
|
Mr. |
Jossef |
H |
Goldberg |
The IS NULL and IS NOT NULL operators provide the logic we need to handle the ambiguous nature of NULL values. The purpose of these operators is to determine whether or not an expression is NULL and from that return either TRUE or FALSE. Comparison operators, when faced with a NULL value, instead want to return UNKNOWN, which can result in unpredictable and costly mistakes if implemented in production.
Failure #3: Not knowing how ANSI_NULL works
In the last failure, I demonstrated why you should not use comparison operators against NULL values. Even if there are results to be had, a statement such as the following will likely not return them:
|
1 2 3 |
SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
The word likely is key here. It turns out that the behavior I described is the default behavior, but not the only behavior. SQL Server supports a feature called ANSI_NULLS, which determines how the database engine handles NULL comparisons. The option is a carry-over from the ghost of SQL Server Past and has been deprecated since 2005. But the setting still exists, it still works, and it can still cause all sorts of trouble.
By default, SQL Server installs with the ANSI_NULLS option turned on, which gives us the behavior I’ve described so far. If we compare values to NULL, the database engine does not return those rows, even if NULL exists.
We can verify the current ANSI_NULLS setting by running a statement similar to the following:
|
1 2 3 4 |
IF ( (32 & @@OPTIONS) = 32 ) PRINT 'The ANSI_NULLS option turned on.' ELSE PRINT 'The ANSI_NULLS option turned off.'; |
If no one has messed with ANSI_NULLS, the database engine will return the first option, which will give us the results we’ve seen so far. But suppose we turn the option off:
|
1 2 3 4 |
SET ANSI_NULLS OFF; SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
Now our statement returns the expected four rows, despite the fact that we’re using a comparison operator against NULL:
|
Title |
FirstName |
MiddleName |
LastName |
|
NULL |
Ken |
J |
Sánchez |
|
NULL |
Terri |
Lee |
Duffy |
|
NULL |
Roberto |
NULL |
Tamburello |
|
NULL |
Rob |
NULL |
Walters |
With the option turned off, we can also use the not equal operator:
|
1 2 3 4 |
SET ANSI_NULLS OFF; SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title <> NULL AND BusinessEntityID < 7; |
Now the results look like the following:
|
Title |
FirstName |
MiddleName |
LastName |
|
Ms. |
Gail |
A |
Erickson |
|
Mr. |
Jossef |
H |
Goldberg |
If we turn the option back on (it’s default setting), the database engine will again return no rows:
|
1 2 3 4 |
SET ANSI_NULLS ON; SELECT Title, FirstName, MiddleName, LastName FROM Person.Person WHERE Title = NULL AND BusinessEntityID < 7; |
Because the ANSI_NULLS option has been deprecated, you should not use it and you should update any code that does, assuming you have the access and time. If you don’t, you could be in store for bigger problems.
Imagine an application that connects to two different SQL Server instances, one with the option turned on (the default) and one with it turned off. If the code is using comparison operators with NULL values, you could end up with inaccurate data or conflicting results. Even if you’re dealing with only one SQL Server instance, you still risk unexpected results, especially if statements are turning the ANSI_NULLS option on at the session level.
Failure #4: Treating ISNULL the same as IS NULL
The ISNULL function tends to cause a bit of confusion when first encountered, in part because of its name, which implies a meaning similar to the IS NULL operator. Another reason for confusion arises from the fact that products such as Microsoft Access and MySQL also include the ISNULL function, where it behaves much differently, more like the name suggests. But we’re talking SQL Server and need to understand how ISNULL is implemented here. A lack of understanding can lead to misuse and failure to take advantage of its real purpose in SQL Server.
Let’s looks at a few examples to demonstrate, starting with a simple query that returns products whose Color value is NULL:
|
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND Color IS NULL; |
|
ProductName |
Color |
|
LL Touring Handlebars |
NULL |
|
HL Touring Handlebars |
NULL |
If we’re not familiar with ISNULL in SQL Server or are transitioning in from Access, we might attempt to retrieve the same results by using ISNULL in our WHERE clause:
|
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color); |
Fortunately, this construction will generate the following error, which we’ll likely catch immediately, or at least during the Q&A phase:
|
1 2 |
Msg 174, Level 15, State 1, Line 305 The isnull function requires 2 argument(s). |
Those of us with a background in MySQL might instead take a different approach with the ISNULL function by comparing it to 1 or 0, the function’s two possible return values:
|
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color) = 1); |
Once again, we receive an error message:
|
1 2 |
Msg 174, Level 15, State 1, Line 322 The isnull function requires 2 argument(s). near ';'. |
By now, we should have checked the documentation to verify what this function is supposed to do and have come to the conclusion that, based on the syntax and the fairly explicit error messages, we should be adding a second argument, which we do:
|
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color, 'neutral'); |
Unfortunately, all we’ve done is make our predicate completely illogical, resulting in yet another (but different) error message:
|
1 2 |
Msg 4145, Level 15, State 1, Line 339 An expression of non-boolean type specified in a context where a condition is expected, near ';'. |
The point of that this is that, in SQL Server, the purpose of the ISNULL function is simply to replace an expression’s value with a real value, if the original value is NULL (nonexistent). If we wanted to make the preceding example right, we would need to define our Boolean condition correctly:
|
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND ISNULL(Color, 'neutral') IN ('black', 'neutral'); |
Now the SELECT statement returns the results we want, rather than an error:
|
ProductName |
Color |
|
LL Mountain Frame – Black, 40 |
Black |
|
LL Touring Handlebars |
NULL |
|
HL Touring Handlebars |
NULL |
|
LL Crankset |
Black |
That said, we can achieve the same thing by using IS NULL, without incurring the extra processing:
|
1 2 3 4 |
SELECT Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 940 AND 949 AND (Color = 'black' OR Color IS NULL); |
More often than not, the place we’ll want to use the ISNULL function is within our select list to replace the NULL values with real values:
|
1 2 3 |
SELECT Name AS ProductName, ISNULL(Color, 'neutral') FROM Production.Product WHERE ProductID BETWEEN 940 AND 949; |
The Color column now contains a value for each row, with no missing values to contend with:
|
ProductName |
Color |
|
HL Road Pedal |
Silver/Black |
|
Touring Pedal |
Silver/Black |
|
ML Mountain Frame-W – Silver, 38 |
Silver |
|
LL Mountain Frame – Black, 40 |
Black |
|
LL Mountain Frame – Silver, 40 |
Silver |
|
Front Derailleur |
Silver |
|
LL Touring Handlebars |
neutral |
|
HL Touring Handlebars |
neutral |
|
Front Brakes |
Silver |
|
LL Crankset |
Black |
Be aware that, when using ISNULL, the database engine converts the replacement value to the data type of the original value, making its possible for data to get truncated or for the database engine to generate a conversion error. Make sure you read up on ISNULL before adding it to your code.
Failure #5: Treating ISNULL the same as COALESCE
In the last failure, I tried to address any confusion there might be around the ISNULL function, but there’s another area of possible confusion, and that is in understanding the differences between ISNULL and COALESCE. Both functions let us replace NULL with a real value, but there are differences between the two that can be important to understand.
First off, the ISNULL function implemented in SQL Server is specific to that brand of T-SQL, whereas COALESCE is standard ANSI, making it more universal and, consequently, more portable, an important consideration if the possibility exists that you might one day want to point your apps to a different database system.
The COALESCE function also lets us include more than two expressions, as shown in the following SELECT statement:
|
1 2 3 4 |
SELECT Name AS ProductName, Class, Style, ProductLine, COALESCE(Class, Style, ProductLine, 'n/a') AS FirstNotNull FROM Production.Product WHERE ProductID IN(679, 706, 709, 711); |
The function evaluates the expressions in the order they’re specified and returns the first non-NULL value. As the following results show, each row returns one of the specified values in the FirstNotNull column:
|
ProductName |
Class |
Style |
ProductLine |
FirstNotNull |
|
Rear Derailleur Cage |
NULL |
NULL |
NULL |
n/a |
|
HL Road Frame – Red, 58 |
H |
U |
R |
H |
|
Mountain Bike Socks, M |
NULL |
U |
M |
U |
|
Sport-100 Helmet, Blue |
NULL |
NULL |
S |
S |
The COALESCE function can actually be thought of as syntactical shorthand for a CASE expression:
|
1 2 3 4 5 6 7 8 9 |
SELECT Name AS ProductName, Class, Style, ProductLine, CASE WHEN Class IS NOT NULL THEN Class WHEN Style IS NOT NULL THEN Style WHEN ProductLine IS NOT NULL THEN ProductLine ELSE 'n/a' END AS FirstNotNull FROM Production.Product WHERE ProductID IN(679, 706, 709, 711); |
The SELECT statement achieves the same results as the preceding example, but its construction is more complex. The main advantage of using a CASE expression is that it makes the logic clearer, for those who might need a little extra help.
Compared to the COALESCE function, the ISNULL function is much more limited:
|
1 2 3 4 |
SELECT Name AS ProductName, Class, Style, ISNULL(Class, Style) AS FirstNotNull FROM Production.Product WHERE ProductID IN(679, 706, 709, 711); |
In this case, all we’re saying is that if Class is NULL, then use Style, giving us the following results:
|
ProductName |
Class |
Style |
FirstNotNull |
|
Rear Derailleur Cage |
NULL |
NULL |
NULL |
|
HL Road Frame – Red, 58 |
H |
U |
H |
|
Mountain Bike Socks, M |
NULL |
U |
U |
|
Sport-100 Helmet, Blue |
NULL |
NULL |
NULL |
We could have specified n/a as our second argument, similar to what we did with COALESCE, but you get the point. COALESCE is more flexible and it conforms to ANSI standards. In addition, COALESCE handles data types differently. To understand this difference, let’s start with a different example:
|
1 2 3 4 |
SELECT Name AS ProductName, Class, Style, ProductSubcategoryID FROM Production.Product WHERE ProductID = 525; |
The Class and Style columns are defined as NCHAR, and the ProductSubcategoryID column as INT. The SELECT statement returns the following results:
|
ProductName |
Class |
Style |
ProductSubcategoryID |
|
LL Shell |
L |
NULL |
NULL |
Now lets modify our code to include the COALESCE function:
|
1 2 3 4 5 |
SELECT Name AS ProductName, Class, Style, ProductSubcategoryID, COALESCE(Class, Style, ProductSubcategoryID) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
The database engine now generates an error when it tries to convert the Class value to an INT:
|
1 2 |
Msg 245, Level 16, State 1, Line 415 Conversion failed when converting the nvarchar value 'L ' to data type int. |
It turns out that COALESCE uses the type with the highest precedence, based on the submitted values. Because the ProductSubcategoryID column takes an INT and an INT takes precedence over NCHAR (which is treated as NVARCHAR), the database engine tries to convert L to an INT, which of course is not possible.
In such cases, we might try changing the order of the columns we pass to the COALESCE function:
|
1 2 3 4 5 |
SELECT Name AS ProductName, Class, Style, ProductSubcategoryID, COALESCE(ProductSubcategoryID, Class, Style) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
But we will again receive a conversion area. The database engine does not care about the order of the arguments, only which types take precedence over others:
The ISNULL function works differently:
|
1 2 3 4 5 |
SELECT Name AS ProductName, Class, ProductSubcategoryID, ISNULL(Class, ProductSubcategoryID) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
In this case, the Class data type sets the standard, so the database engine has no problem returning the results:
|
ProductName |
Class |
ProductSubcategoryID |
FirstNotNull |
|
LL Shell |
L |
NULL |
L |
Now let’s reverse the order of the columns we pass to ISNULL:
|
1 2 3 4 5 |
SELECT Name AS ProductName, Class, ProductSubcategoryID, ISNULL(ProductSubcategoryID, Class) AS FirstNotNull FROM Production.Product WHERE ProductID = 525; |
The database engine will once again try to convert L to an INT and generate a conversion error. Be sure that, whenever you’re using these functions, you are well aware of the possibility for conversion errors.
Failure #6: Forgetting about the NULLIF function
While we’re on the topic of NULL-related functions, it’s worth noting another one that garners less attention, but can still be useful: the NULLIF function. We don’t seem to come across this one as often as the other functions, in part because there does not seem to be as many use cases for NULLIF as there are the others. Basically, the function returns a NULL value if two expressions are equal. If they are not equal, the function returns the first expression. Here’s an example:
|
1 2 3 4 |
SELECT Name AS ProductName, MakeFlag, FinishedGoodsFlag, NULLIF(MakeFlag, FinishedGoodsFlag) AS NullIfEqual FROM Production.Product WHERE ProductID IN(534, 679, 680, 707); |
All we’re doing here is passing the two columns into the function and returning the results, as shown in the following table:
|
ProductName |
MakeFlag |
FinishedGoodsFlag |
NullIfEqual |
|
Top Tube |
1 |
0 |
1 |
|
Rear Derailleur Cage |
0 |
0 |
NULL |
|
HL Road Frame – Black, 58 |
1 |
1 |
NULL |
|
Sport-100 Helmet, Red |
0 |
1 |
0 |
We can achieve the same results by using a CASE expression instead of the NULLIF function:
|
1 2 3 4 5 6 7 |
SELECT Name AS ProductName, MakeFlag, FinishedGoodsFlag, CASE WHEN MakeFlag = FinishedGoodsFlag THEN NULL ELSE MakeFlag END AS NullIfEqual FROM Production.Product WHERE ProductID IN(534, 679, 680, 707); |
As with COALESCE, the NULLIF function helps to simplify our statements, but the CASE expression makes the logic clearer to the rest of the world.
Perhaps one of the most useful places to use the NULLIF function is within a mathematical expression in which an error would be generated if a zero were encountered. Dividing by zero is a good example of this. Let’s start with the following SELECT statement:
|
1 2 3 4 |
SELECT Name AS ProductName, StandardCost, ListPrice, StandardCost / (ListPrice - StandardCost) AS NetDiff FROM Production.Product WHERE ProductID = 680; |
The database engine runs the calculation with no problem and returns the following results:
|
ProductName |
StandardCost |
ListPrice |
NetDiff |
|
HL Road Frame – Black, 58 |
1059.31 |
1431.50 |
2.8461 |
Now let’s pull the data from a row in which the StandardCost and ListPrice values are zero:
|
1 2 3 4 |
SELECT Name AS ProductName, StandardCost, ListPrice, StandardCost / (ListPrice - StandardCost) AS NetDiff FROM Production.Product WHERE ProductID = 679; |
This time, we get a divide-by-zero error:
|
1 2 |
Msg 8134, Level 16, State 1, Line 603 Divide by zero error encountered. |
This is where NULLIF comes it. By returning a NULL in place of a zero, we can at least prevent the database engine from generating an error:
|
1 2 3 4 |
SELECT Name AS ProductName, StandardCost, ListPrice, StandardCost / NULLIF((ListPrice - StandardCost), 0) AS NetDiff FROM Production.Product WHERE ProductID = 679; |
Now we at least get results we can live with.
|
ProductName |
StandardCost |
ListPrice |
NetDiff |
|
Rear Derailleur Cage |
0.00 |
0.00 |
NULL |
We could have also used the WHERE clause to filter our rows that would have given us zero, but then we would not be able to view any of the data’s row. The NULLIF function at least gives us data to work with and makes it easy to do so.
Failure #7: Assuming that 2 + NULL = 2
As great as the NULLIF function is for handling certain situations, such as avoiding divide-by-zero errors, having NULL show up in our numeric calculations is often less than optimal. It’s not that SQL Server does anything surprising, but rather that you need to know what’s coming should you run into issues with your calculations.
This all goes back to the idea that NULL is a non-value. It is nothing. It merely indicates that any real data value is missing. Consequently, when we try to add a value to nothing or subtract from nothing or multiply by nothing or divide by nothing or do anything by nothing, we end up with nothing, or NULL. Take a look at this SELECT statement:
|
1 2 3 4 |
SELECT Name AS ProductName, Weight, (Weight * 4.22) AS FreightCost FROM Production.Product WHERE ProductID BETWEEN 715 AND 718; |
We’re simply multiplying the Weight value by 4.22 and getting the following results:
|
ProductName |
Weight |
FreightCost |
|
Long-Sleeve Logo Jersey, L |
NULL |
NULL |
|
Long-Sleeve Logo Jersey, XL |
NULL |
NULL |
|
HL Road Frame – Red, 62 |
2.30 |
9.7060 |
|
HL Road Frame – Red, 44 |
2.12 |
8.9464 |
Not surprisingly, whenever we try to multiply by NULL, we get NULL. What choice does the database engine have? Now let’s throw a little addition in there to see what happens:
|
1 2 3 4 |
SELECT Name AS ProductName, Weight, ((Weight + 3) * 4.22) AS FreightCost FROM Production.Product WHERE ProductID BETWEEN 715 AND 718; |
As before, when Weight contains a real value, we get a real value in return, but when NULL enters the equation, we always end up with NULL:
|
ProductName |
Weight |
FreightCost |
|
Long-Sleeve Logo Jersey, L |
NULL |
NULL |
|
Long-Sleeve Logo Jersey, XL |
NULL |
NULL |
|
HL Road Frame – Red, 62 |
2.30 |
22.3660 |
|
HL Road Frame – Red, 44 |
2.12 |
21.6064 |
One approach we can take (when circumstances make it reasonable) is to define a default value that we can return in the event the source data is NULL:
|
1 2 3 4 |
SELECT Name AS ProductName, Weight, ISNULL((Weight * 4.22), 4.95) AS FreightCost FROM Production.Product WHERE ProductID BETWEEN 715 AND 718; |
In this case, we’re using ISNULL to return a default value of 4.95 should our calculation result in a NULL value, which gives us the following results:
|
ProductName |
Weight |
FreightCost |
|
Long-Sleeve Logo Jersey, L |
NULL |
4.9500 |
|
Long-Sleeve Logo Jersey, XL |
NULL |
4.9500 |
|
HL Road Frame – Red, 62 |
2.30 |
9.7060 |
|
HL Road Frame – Red, 44 |
2.12 |
8.9464 |
SQL Server does what it should do when it encounters NULL. It returns NULL. We just need to be prepared for this eventuality. In some cases, having our expressions return a NULL value is fine for our circumstances, but at other times, we’ll want to adjust our code to accommodate those wayward missing values. Otherwise a whole lot of customers might end up getting free shipping because the weight was initially not a consideration.
Failure #8: Aggregating data without taking NULL into account
As with numeric calculations, we must be prepared to handle aggregations that could include NULL values. However, unlike mathematic operations, the database engine eliminates NULL values from the calculations before performing the actual aggregations, at least at the most basic level. Let’s start with a simple SELECT statement:
|
1 2 3 |
SELECT Name AS ProductName, Weight FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
The statement returns the following results, which include two rows with a Weight value of NULL:
|
ProductName |
Weight |
|
Long-Sleeve Logo Jersey, L |
NULL |
|
Long-Sleeve Logo Jersey, XL |
NULL |
|
HL Road Frame – Red, 62 |
2.30 |
|
HL Road Frame – Red, 44 |
2.12 |
|
HL Road Frame – Red, 48 |
2.16 |
|
HL Road Frame – Red, 52 |
2.20 |
Suppose we now try to find the average weight for these rows:
|
1 2 3 |
SELECT AVG(Weight) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
The database engine will perform the aggregation with no problem by first eliminating the NULL rows from the mix and then returning a result of 2.195000. However, the results also include something else, a warning message about the NULL values:
|
1 |
Warning: Null value is eliminated by an aggregate or other SET operation. |
We can verify that the NULL rows are being eliminated by instead using the COUNT aggregate function on the Weight column:
|
1 2 3 |
SELECT COUNT(weight) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
This time the database engine returns a value of 4, indicating that the two NULL rows have been eliminated. We will again receive the warning message.
All aggregate functions eliminate the NULL values prior to preforming the actual calculations except for the COUNT(*) function:
|
1 2 3 |
SELECT COUNT(*) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; |
In this case, the database engine includes all rows, no matter where the NULL values lie, and will generate no warning messages.
The fact that SQL Server handles aggregations this way is not a bad thing. But it’s something you need to understand. If you’re not familiar with the data, your queries might be eliminating rows without understanding the full implications. You can consider using a function such as ISNULL to replace the NULL values, but you still risk skewing your results. As with many situations, handling NULL values can take a special touch. For example, detailed analytics will require, at the very least, knowledge of the number of NULL values in a particular data set.
As a side note, those who don’t appreciate having their queries return warning messages like those we’ve seen here, can use the SET ANSI_WARNINGS statement to turn warnings off when executing an aggregation:
|
1 2 3 4 5 |
SET ANSI_WARNINGS OFF; SELECT AVG(Weight) FROM Production.Product WHERE ProductID BETWEEN 715 AND 720; SET ANSI_WARNINGS ON; |
Although this will give us the same results without the message, turning off this option can affect other operations. For example, modifying data on indexed views or on tables with indexed computed columns will fail if the option is turned off. So tread carefully if you consider this route.
Also be aware that when you start grouping and aggregating data, you might not get exactly the results you expect when that data includes NULL values. For example, the following SELECT statement groups by the ProductSubcategoryID column in our trusty Product table:
|
1 2 3 4 5 6 |
SELECT ProductSubcategoryID AS ProdSubID, AVG(Weight) AS AvgWeight, COUNT(*) AS FullCount, COUNT(Weight) AS ColCount FROM Production.Product GROUP BY ProductSubcategoryID; |
The following table shows the first 10 rows in our result set:
|
ProdSubID |
AvgWeight |
FullCount |
ColCount |
|
NULL |
434.285714 |
209 |
7 |
|
1 |
25.058750 |
32 |
32 |
|
2 |
17.448139 |
43 |
43 |
|
3 |
27.680909 |
22 |
22 |
|
4 |
NULL |
8 |
0 |
|
5 |
187.000000 |
3 |
3 |
|
6 |
317.000000 |
2 |
2 |
|
7 |
NULL |
1 |
0 |
|
8 |
603.333333 |
3 |
3 |
|
9 |
151.500000 |
2 |
2 |
The full result set represents all 504 rows in the Product table, including rows whose ProductSubcategoryID and Weight columns are NULL. If we were to dig into the averages and the individual column counts for each group, we would discover that the NULL values are not included, but the result set as a whole includes all the data. Again, I point this out not to suggest that there is a problem with the way SQL Server groups and aggregates data, but only to demonstrate that we need to be aware of what’s happening with our queries when NULL values are involved. In this case, by including both types of counts in each group, we’re providing a handy way to provide insight into our source data.
Failure #9: Forgetting about NULL in your Boolean expressions
Boolean expressions that involve NULL values can be especially tricky because they introduce what is called three-valued logic. Typically, when an expression is evaluated, it returns either TRUE or FALSE. However, when NULL enters the mix, we get a third possibility: UNKNOWN. As we’ve already covered, SQL Server marks a value as NULL if it does not exist. As a result, a Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.
The main thing to keep in mind with Boolean expressions is that your result set will include only those rows in which the expression evaluates to TRUE. If the expression evaluates to FALSE or UNKNOWN, the row is omitted.
The specifics of three-valued logic can seem unwieldy when taking account the various combinations that can result from using the AND, OR, and NOT operators, but you can find a handy reference to help sort all that out be referring to the SQL Server help topic “Null Values.” In the meantime, lets look at a few examples that demonstrate what happens when NULL values enter the Boolean mix, starting with the following temporary table:
|
1 2 3 4 5 |
SELECT ProductID, Name AS ProductName, Weight, Class INTO #products FROM Production.Product WHERE ProductID BETWEEN 945 AND 954; SELECT * FROM #products; |
The following results show the data that’s now stored in the #products table:
|
ProductID |
ProductName |
Weight |
Class |
|
945 |
Front Derailleur |
88.00 |
NULL |
|
946 |
LL Touring Handlebars |
NULL |
L |
|
947 |
HL Touring Handlebars |
NULL |
H |
|
948 |
Front Brakes |
317.00 |
NULL |
|
949 |
LL Crankset |
600.00 |
L |
|
950 |
ML Crankset |
635.00 |
M |
|
951 |
HL Crankset |
575.00 |
H |
|
952 |
Chain |
NULL |
NULL |
|
953 |
Touring-2000 Blue, 60 |
27.90 |
M |
|
954 |
Touring-1000 Yellow, 46 |
25.13 |
H |
Now let’s use the AND operator to create a simple Boolean expression in our WHERE clause:
|
1 2 |
SELECT * FROM #products WHERE Weight > 500 AND Class = 'H'; |
As expected, the statement returns the results shown in the following table:
|
ProductID |
ProductName |
Weight |
Class |
|
951 |
HL Crankset |
575.00 |
H |
Notice that the query engine excludes the row for product ID 947. Even though the Class value is H, the Weight value is NULL, so the expression evaluates to UNKNOWN. And of course, row 952 is excluded because both the Weight and Class values are NULL.
Now let’s rerun the statement, but this time use the OR operator for our Boolean expression:
|
1 2 |
SELECT * FROM #products WHERE Weight > 500 OR Class = 'H'; |
As expected, our results look quite different:
|
ProductID |
ProductName |
Weight |
Class |
|
947 |
HL Touring Handlebars |
NULL |
H |
|
949 |
LL Crankset |
600.00 |
L |
|
950 |
ML Crankset |
635.00 |
M |
|
951 |
HL Crankset |
575.00 |
H |
|
954 |
Touring-1000 Yellow, 46 |
25.13 |
H |
In this case, only one of the two Boolean conditions needs to evaluate to TRUE, resulting in many more returned rows. Even row 947 is included, even through the Weight value is NULL. Because the OR operator is used and the second condition evaluates to TRUE, that row is included.
This time around, let’s use the AND NOT operator in our statement:
|
1 2 |
SELECT * FROM #products WHERE Weight > 500 AND NOT Class = 'H'; |
Once again, we get very different results:
|
ProductID |
ProductName |
Weight |
Class |
|
949 |
LL Crankset |
600.00 |
L |
|
950 |
ML Crankset |
635.00 |
M |
Not surprising, the only rows to be included are those in which both Boolean conditions evaluate to TRUE. All the FALSE and UNKNOWN values are left behind.
As with aggregate functions, SQL Server’s treatment of Boolean expressions in itself is not a bad thing. You just need to be aware of what can happen with your data when NULL values are involved. Data can be eliminated that you might expect to be included, leading to unintended consequences in subsequent analytics and decision-making.
Failure #10: Not accounting for NULL in your string expressions
Up to this point, we’ve touched little upon string values and what happens when NULL enters the mix, but at least some coverage in this area is warranted, beyond what we saw when working with Boolean expressions.
Suppose we start with the following SELECT statement:
|
1 2 3 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845; |
The statement returns the results shown here:
|
ProductID |
ProductName |
Color |
|
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
|
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
|
837 |
HL Road Frame – Black, 62 |
Black |
|
838 |
HL Road Frame – Black, 44 |
Black |
|
839 |
HL Road Frame – Black, 48 |
Black |
|
840 |
HL Road Frame – Black, 52 |
Black |
|
841 |
Men’s Sports Shorts, S |
Black |
|
842 |
Touring-Panniers, Large |
Grey |
|
843 |
Cable Lock |
NULL |
|
844 |
Minipump |
NULL |
|
845 |
Mountain Pump |
NULL |
Now suppose when want to filter our data set even further by returning all rows with a Color value other than black:
|
1 2 3 4 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845 AND Color <> 'black'; |
Now our SELECT statement returns the following results:
|
ProductID |
ProductName |
Color |
|
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
|
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
|
842 |
Touring-Panniers, Large |
Grey |
We might have expected the results to include those rows whose Color value is NULL. However, we’re once again faced with the issue of trying to evaluate an expression against what is essentially a missing value. A “not black” value cannot be equated with a “missing” value; consequently, those rows are omitted.
Concatenating string values follows similar logic:
|
1 2 3 4 |
SELECT Name AS ProductName, ProductNumber, Color, (ProductNumber + '_' + Color) AS NewProductNumber FROM Production.Product WHERE ProductID BETWEEN 835 AND 845; |
Whenever NULL shows up, we end up with nothing (NULL) in our NewProductNumber column:
|
ProductName |
ProductNumber |
Color |
NewProductNumber |
|
ML Road Frame-W – Yellow, 44 |
FR-R72Y-44 |
Yellow |
FR-R72Y-44_Yellow |
|
ML Road Frame-W – Yellow, 48 |
FR-R72Y-48 |
Yellow |
FR-R72Y-48_Yellow |
|
HL Road Frame – Black, 62 |
FR-R92B-62 |
Black |
FR-R92B-62_Black |
|
HL Road Frame – Black, 44 |
FR-R92B-44 |
Black |
FR-R92B-44_Black |
|
HL Road Frame – Black, 48 |
FR-R92B-48 |
Black |
FR-R92B-48_Black |
|
HL Road Frame – Black, 52 |
FR-R92B-52 |
Black |
FR-R92B-52_Black |
|
Men’s Sports Shorts, S |
SH-M897-S |
Black |
SH-M897-S_Black |
|
Touring-Panniers, Large |
PA-T100 |
Grey |
PA-T100_Grey |
|
Cable Lock |
LO-C100 |
NULL |
NULL |
|
Minipump |
PU-0452 |
NULL |
NULL |
|
Mountain Pump |
PU-M044 |
NULL |
NULL |
Once again, the bottom line is that you must be prepared to handle NULL values if they could in any way impact your results. For example, you might try throwing an ISNULL into the mix:
|
1 2 3 4 |
SELECT Name AS ProductName, ProductNumber, Color, (ProductNumber + '_' + ISNULL(Color, 'neutral')) AS NewProductNumber FROM Production.Product WHERE ProductID BETWEEN 835 AND 845; |
Now the statement returns the following results:
|
ProductName |
ProductNumber |
Color |
NewProductNumber |
|
ML Road Frame-W – Yellow, 44 |
FR-R72Y-44 |
Yellow |
FR-R72Y-44_Yellow |
|
ML Road Frame-W – Yellow, 48 |
FR-R72Y-48 |
Yellow |
FR-R72Y-48_Yellow |
|
HL Road Frame – Black, 62 |
FR-R92B-62 |
Black |
FR-R92B-62_Black |
|
HL Road Frame – Black, 44 |
FR-R92B-44 |
Black |
FR-R92B-44_Black |
|
HL Road Frame – Black, 48 |
FR-R92B-48 |
Black |
FR-R92B-48_Black |
|
HL Road Frame – Black, 52 |
FR-R92B-52 |
Black |
FR-R92B-52_Black |
|
Men’s Sports Shorts, S |
SH-M897-S |
Black |
SH-M897-S_Black |
|
Touring-Panniers, Large |
PA-T100 |
Grey |
PA-T100_Grey |
|
Cable Lock |
LO-C100 |
NULL |
LO-C100_neutral |
|
Minipump |
PU-0452 |
NULL |
PU-0452_neutral |
|
Mountain Pump |
PU-M044 |
NULL |
PU-M044_neutral |
At least this way we have something to work with other than NULL.
Failure #11: Joining tables on nullable columns
Another area where we might get results other than what we expect is when we join data based on nullable columns. Because NULL value are considered non-values and because the database engine cannot match non-values, NULL values will show up in the join column only for outer joins. To demonstrate, let’s start by creating and populating a couple tables:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
CREATE TABLE Products (ProductID INT, ProductName NVARCHAR(50), CategoryAltID INT); INSERT INTO Products VALUES (945, 'Front Derailleur', 10), (946, 'LL Touring Handlebars', 20), (947, 'HL Touring Handlebars', NULL), (948, 'Front Brakes', 20), (949, 'LL Crankset', 10), (950, 'ML Crankset', NULL); CREATE TABLE Categories (CategoryNumber NVARCHAR(25), CategoryAltID INT); INSERT INTO Categories VALUES ('FR-R72Y-44', 10), ('FR-R72Y-48', 20), ('FR-R92B-62', NULL), ('FR-R92B-52', NULL); |
Notice that the CategoryAltID column in both tables contains NULL values. Now let’s create an inner join between those tables, based on that column:
|
1 2 3 4 |
SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID, c.CategoryAltID AS cCatAltID, c.CategoryNumber FROM Products p JOIN Categories c ON p.CategoryAltID = c.CategoryAltID; |
Although the CategoryAltID column contains NULL values in both tables, they are not considered equal (for all the reasons we’ve discussed), which means they’re not included in the result set:
|
ProductID |
ProductName |
pCatAltID |
cCatAltID |
CategoryNumber |
|
945 |
Front Derailleur |
10 |
10 |
FR-R72Y-44 |
|
946 |
LL Touring Handlebars |
20 |
20 |
FR-R72Y-48 |
|
948 |
Front Brakes |
20 |
20 |
FR-R72Y-48 |
|
949 |
LL Crankset |
10 |
10 |
FR-R72Y-44 |
Now let’s revise the statement to include a left outer join:
|
1 2 3 4 |
SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID, c.CategoryAltID AS cCatAltID, c.CategoryNumber FROM Products p LEFT OUTER JOIN Categories c ON p.CategoryAltID = c.CategoryAltID; |
This time our results include all rows from the Product table, including those whose CategoryAltID column is NULL in both tables:
|
ProductID |
ProductName |
pCatAltID |
cCatAltID |
CategoryNumber |
|
945 |
Front Derailleur |
10 |
10 |
FR-R72Y-44 |
|
946 |
LL Touring Handlebars |
20 |
20 |
FR-R72Y-48 |
|
947 |
HL Touring Handlebars |
NULL |
NULL |
NULL |
|
948 |
Front Brakes |
20 |
20 |
FR-R72Y-48 |
|
949 |
LL Crankset |
10 |
10 |
FR-R72Y-44 |
|
950 |
ML Crankset |
NULL |
NULL |
NULL |
The query engine includes NULL rows not because they’re considered matches, but rather because that’s the nature of an outer join and the way it returns all rows from the “outer” table. We can also try our statement with a full outer join:
|
1 2 3 4 |
SELECT p.ProductID, p.ProductName, p.CategoryAltID AS pCatAltID, c.CategoryAltID AS cCatAltID, c.CategoryNumber FROM Products p FULL OUTER JOIN Categories c ON p.CategoryAltID = c.CategoryAltID; |
Now our results include everything, regardless of the NULL values:
|
ProductID |
ProductName |
pCatAltID |
cCatAltID |
CategoryNumber |
|
945 |
Front Derailleur |
10 |
10 |
FR-R72Y-44 |
|
946 |
LL Touring Handlebars |
20 |
20 |
FR-R72Y-48 |
|
947 |
HL Touring Handlebars |
NULL |
NULL |
NULL |
|
948 |
Front Brakes |
20 |
20 |
FR-R72Y-48 |
|
949 |
LL Crankset |
10 |
10 |
FR-R72Y-44 |
|
950 |
ML Crankset |
NULL |
NULL |
NULL |
|
NULL |
NULL |
NULL |
NULL |
FR-R92B-62 |
|
NULL |
NULL |
NULL |
NULL |
FR-R92B-52 |
If you consider the logic behind the query engine’s approach to NULL values, it’s not surprising we get the results we do. Like any place where NULL shows up, it comes down to understanding the data we could potentially be dealing with and what that means to the bigger picture. Whenever NULL enters into the equation, we must take it into consideration so we don’t end up with query results very different from what we expect. When we join data, a NULL in one column can cause us to miss important data in other columns, all because we did not think through the possible outcomes.
Failure #12: Creating a unique index without taking NULL into account
Up to this point, I’ve tried to emphasize how SQL Server uses NULL only to indicate that a value is missing. We cannot compare or add or multiply NULL values because we cannot compare or add or multiply missing values. When it comes to unique indexes, however, SQL Server seems to have a double standard, treating NULL values as equal values and consequently permitting only one NULL value in a unique index.
To demonstrate, let’s create and populate a simple table:
|
1 2 3 4 5 6 7 8 9 |
CREATE TABLE Products (ProductID INT, ProductName NVARCHAR(50), ProductNumber NVARCHAR(25)); INSERT INTO Products VALUES (945, 'Front Derailleur', 'FD-2342'), (946, 'LL Touring Handlebars', 'HB-T721'), (947, 'HL Touring Handlebars', 'HB-T928'), (948, 'Front Brakes', 'FB-9873'), (949, 'LL Crankset', NULL), (950, 'ML Crankset', NULL); |
Now let’s try to create a unique index on the ProductNumber column:
|
1 2 |
CREATE UNIQUE INDEX ux_products_productnumber ON Products(ProductNumber); |
Because the column contains multiple NULL values, the database engine returns the following error:
|
1 2 3 |
Msg 1505, Level 16, State 1, Line 982 The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.Products' and the index name 'ux_products_productnumber'. The duplicate key value is (<NULL>). The statement has been terminated. |
This might seem contradictory behavior on the part of SQL Server, given the non-value status it imposes on NULL everywhere else. After all, how can missing values be considered duplicates if there is no way to compare them?
Because of this limitation, we have to be particularly careful when we create our tables and indexes prior to populating them. We could be faced with similar error messages if we try to insert a row with a duplicate NULL value.
Fortunately, since SQL Server 2008, we’ve been able to create filtered indexes that let us sidestep the duplicate NULL issue:
|
1 2 3 |
CREATE UNIQUE INDEX ux_products_productnumber ON Products(ProductNumber) WHERE ProductNumber IS NOT NULL; |
The index is now created without a hitch. We can even insert additional duplicate NULL values:
|
1 2 |
INSERT INTO Products VALUES (951, 'HL Crankset', NULL); |
Creating a filtered index is easy enough to do and is a handy way to sidestep the limitations of a unique index. Just be sure to refer to the SQL Server help topic “Create Filtered Indexes” to make certain you’re familiar with the limitations and restrictions on these types of indexes.
Failure #13: Sorting data on a nullable column
A result set that is not in the order you want might not seem such a big deal, but it can still be annoying when you sort your results and end up with a bunch of NULL values at the beginning. For example, the following SELECT statement sorts the data by the Color column:
|
1 2 3 4 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845 ORDER BY Color; |
The statement gives us the following results:
|
ProductID |
ProductName |
Color |
|
843 |
Cable Lock |
NULL |
|
844 |
Minipump |
NULL |
|
845 |
Mountain Pump |
NULL |
|
837 |
HL Road Frame – Black, 62 |
Black |
|
838 |
HL Road Frame – Black, 44 |
Black |
|
839 |
HL Road Frame – Black, 48 |
Black |
|
840 |
HL Road Frame – Black, 52 |
Black |
|
841 |
Men’s Sports Shorts, S |
Black |
|
842 |
Touring-Panniers, Large |
Grey |
|
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
|
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
We can instead sort the data in descending order, but that would put the colors themselves in reverse order. Another work-around is to slip a CASE expression into our WHERE clause to trick the database engine:
|
1 2 3 4 5 6 |
SELECT ProductID, Name AS ProductName, Color FROM Production.Product WHERE ProductID BETWEEN 835 AND 845 ORDER BY CASE WHEN Color IS NULL THEN 1 ELSE 0 END, Color; |
The extra expression assigns 1 to the NULL rows and 0 to the rest so they’re sorted first. From there, the normal sorting occurs, as shown in the following results:
|
ProductID |
ProductName |
Color |
|
837 |
HL Road Frame – Black, 62 |
Black |
|
838 |
HL Road Frame – Black, 44 |
Black |
|
839 |
HL Road Frame – Black, 48 |
Black |
|
840 |
HL Road Frame – Black, 52 |
Black |
|
841 |
Men’s Sports Shorts, S |
Black |
|
842 |
Touring-Panniers, Large |
Grey |
|
835 |
ML Road Frame-W – Yellow, 44 |
Yellow |
|
836 |
ML Road Frame-W – Yellow, 48 |
Yellow |
|
843 |
Cable Lock |
NULL |
|
844 |
Minipump |
NULL |
|
845 |
Mountain Pump |
NULL |
This might not be an earth-shattering solution, but it’s an easy enough way to move those NULL values to the bottom, if you happen to prefer them there.
Load comments