How to Get NULLs Horribly Wrong in SQL Server

NULLs in SQL are a mixed blessing. The Three-Valued Logic of SQL has its uses but can cause difficulties to anyone who assumes that NULL has a 'value'. It can make reports go horribly wrong in a number of alarming ways, as Robert Sheldon explains.

Failure #1: Not knowing what NULL means

When referring to NULL within our data sets, we often use phrases such as “the NULL value” or “a value of NULL.” I do it all the time. I see it done all the time. Such phrases are so common that we think little of their use. But inherent in the phrase is the notion of NULL as an actual value. It is not. NULL is a non-value, a nonexistent value. It is not zero. It is not an empty string. A value cannot equal NULL. No two NULL values are equal.

A NULL value is often defined as one that is unknown or not applicable, but even these definitions can be open to debate. For example, a record might not include a customer’s birthdate because the salesperson didn’t ask or because the customer would not provide it, but the customer still knows that date, so it is hardly unknown, nor is it any less applicable. There can be many reasons for missing data, and there has been much debate about a precise meaning of NULL. If you need to attach a specific meaning to NULL, then missing or absent data is probably your safest bet.

Perhaps a better way to think of NULL is as a setting or marker that indicates if a data value does not exist. That certainly seems to be how SQL Server treats NULL. The database engine uses a special bitmap to track which columns in a row are NULL and which are not. The bitmap contains a bit for each column, with the bit set to 1 if the column is NULL, that is, if the value is missing.

The confusion for some, particularly beginning T-SQL developers, is the way SQL Server Management Studio indicates that no value exists for a specific column: by displaying what appears to be a value that reads NULL. For example, the following SELECT statement returns data from the Product table in the AdventureWorks2014 sample database:

As the following results show, every row includes at least one column for which no value exists:

ProductName

ProductLine

Class

Style

SellEndDate

HL Road Frame – Black, 58

R

H

U

NULL

HL Road Frame – Red, 58

R

H

U

NULL

Sport-100 Helmet, Red

S

NULL

NULL

NULL

Sport-100 Helmet, Black

S

NULL

NULL

NULL

Mountain Bike Socks, M

M

NULL

U

2012-05-29 00:00:00.000

Mountain Bike Socks, L

M

NULL

U

2012-05-29 00:00:00.000

Sport-100 Helmet, Blue

S

NULL

NULL

NULL

AWC Logo Cap

S

NULL

U

NULL

Long-Sleeve Logo Jersey, S

S

NULL

U

NULL

Long-Sleeve Logo Jersey, M

S

NULL

U

NULL

Long-Sleeve Logo Jersey, L

S

NULL

U

NULL

If we look at the columns with the NULL values, we can see how difficult it is to determine why the values are missing. For example, is a SellEndDate value missing because the product is still live or because the application failed to properly record the end date? Are the Class and Style values missing because they don’t apply to those particular rows or because someone was careless in entering data?

An even bigger problem with viewing the results in this way is that it suggests that the database is actually storing multiple instances of this mysterious placeholder called NULL and that all those NULL values are the same. They are not the same. There is nothing there. Nothing cannot be the same as nothing. Nothing cannot be compared to nothing. You can think of these as the fundamental principles behind NULL. The principles of nothingness. And not understanding these principles can lead to all sorts of problems when working with SQL Server, resulting in many of the failures we cover here.

Failure #2: Treating NULL like a real value in comparisons

Developers who don’t understand how NULL works will sometimes use comparison operators to compare an expression to a NULL value, resulting in a statement that looks similar to the following:

The assumption here is that NULL is a valid value and therefore can be compared to other NULL values. That means, in this case, the statement should return each row whose Title value is NULL. A quick check into the source data will show four rows that match the statement’s assumed logic. However, because the database engine cannot compare nothing to nothing, it will not return these rows and will instead provide us with an empty result set.

If we assume from the empty result set that no rows in the source data match our criteria, we might be satisfied with the statement as is and send the code onto production. (Let’s hope no one would really do that.) We might also choose to continue along this logical trajectory and recast the statement to return the non-NULL data:

The statement is no longer trying to compare NULL with NULL, but rather compare real non-NULL values to NULL, which would seem an equally credible comparison, if not more so. It is not. We cannot compare real values to nothing any more than we can compare nothing to nothing. As a result, the statement still returns no data, even though we know for certain the data exists.

But before we go further, note that there are cases when comparing NULL values in this way can work, which we cover in the next failure. For now, know that the proper way to perform NULL comparisons is to use the IS NULL and IS NOT NULL operators. For example, the following SELECT statement uses the IS NULL operator to return rows whose Title value is NULL:

This time, the statement returns the data we expect, with the Title value missing, along with a couple of MiddleName values:

Title

FirstName

MiddleName

LastName

NULL

Ken

J

Sánchez

NULL

Terri

Lee

Duffy

NULL

Roberto

NULL

Tamburello

NULL

Rob

NULL

Walters

We can also return those rows that contain an actual Title value by instead using the IS NOT NULL operator:

Now the statement returns only the following two rows:

Title

FirstName

MiddleName

LastName

Ms.

Gail

A

Erickson

Mr.

Jossef

H

Goldberg

The IS NULL and IS NOT NULL operators provide the logic we need to handle the ambiguous nature of NULL values. The purpose of these operators is to determine whether or not an expression is NULL and from that return either TRUE or FALSE. Comparison operators, when faced with a NULL value, instead want to return UNKNOWN, which can result in unpredictable and costly mistakes if implemented in production.

Failure #3: Not knowing how ANSI_NULL works

In the last failure, I demonstrated why you should not use comparison operators against NULL values. Even if there are results to be had, a statement such as the following will likely not return them:

The word likely is key here. It turns out that the behavior I described is the default behavior, but not the only behavior. SQL Server supports a feature called ANSI_NULLS, which determines how the database engine handles NULL comparisons. The option is a carry-over from the ghost of SQL Server Past and has been deprecated since 2005. But the setting still exists, it still works, and it can still cause all sorts of trouble.

By default, SQL Server installs with the ANSI_NULLS option turned on, which gives us the behavior I’ve described so far. If we compare values to NULL, the database engine does not return those rows, even if NULL exists.

We can verify the current ANSI_NULLS setting by running a statement similar to the following:

If no one has messed with ANSI_NULLS, the database engine will return the first option, which will give us the results we’ve seen so far. But suppose we turn the option off:

Now our statement returns the expected four rows, despite the fact that we’re using a comparison operator against NULL:

Title

FirstName

MiddleName

LastName

NULL

Ken

J

Sánchez

NULL

Terri

Lee

Duffy

NULL

Roberto

NULL

Tamburello

NULL

Rob

NULL

Walters

With the option turned off, we can also use the not equal operator:

Now the results look like the following:

Title

FirstName

MiddleName

LastName

Ms.

Gail

A

Erickson

Mr.

Jossef

H

Goldberg

If we turn the option back on (it’s default setting), the database engine will again return no rows:

Because the ANSI_NULLS option has been deprecated, you should not use it and you should update any code that does, assuming you have the access and time. If you don’t, you could be in store for bigger problems.

Imagine an application that connects to two different SQL Server instances, one with the option turned on (the default) and one with it turned off. If the code is using comparison operators with NULL values, you could end up with inaccurate data or conflicting results. Even if you’re dealing with only one SQL Server instance, you still risk unexpected results, especially if statements are turning the ANSI_NULLS option on at the session level.

Failure #4: Treating ISNULL the same as IS NULL

The ISNULL function tends to cause a bit of confusion when first encountered, in part because of its name, which implies a meaning similar to the IS NULL operator. Another reason for confusion arises from the fact that products such as Microsoft Access and MySQL also include the ISNULL function, where it behaves much differently, more like the name suggests. But we’re talking SQL Server and need to understand how ISNULL is implemented here. A lack of understanding can lead to misuse and failure to take advantage of its real purpose in SQL Server.

Let’s looks at a few examples to demonstrate, starting with a simple query that returns products whose Color value is NULL:

ProductName

Color

LL Touring Handlebars

NULL

HL Touring Handlebars

NULL

If we’re not familiar with ISNULL in SQL Server or are transitioning in from Access, we might attempt to retrieve the same results by using ISNULL in our WHERE clause:

Fortunately, this construction will generate the following error, which we’ll likely catch immediately, or at least during the Q&A phase:

Those of us with a background in MySQL might instead take a different approach with the ISNULL function by comparing it to 1 or 0, the function’s two possible return values:

Once again, we receive an error message:

By now, we should have checked the documentation to verify what this function is supposed to do and have come to the conclusion that, based on the syntax and the fairly explicit error messages, we should be adding a second argument, which we do:

Unfortunately, all we’ve done is make our predicate completely illogical, resulting in yet another (but different) error message:

The point of that this is that, in SQL Server, the purpose of the ISNULL function is simply to replace an expression’s value with a real value, if the original value is NULL (nonexistent). If we wanted to make the preceding example right, we would need to define our Boolean condition correctly:

Now the SELECT statement returns the results we want, rather than an error:

ProductName

Color

LL Mountain Frame – Black, 40

Black

LL Touring Handlebars

NULL

HL Touring Handlebars

NULL

LL Crankset

Black

That said, we can achieve the same thing by using IS NULL, without incurring the extra processing:

More often than not, the place we’ll want to use the ISNULL function is within our select list to replace the NULL values with real values:

The Color column now contains a value for each row, with no missing values to contend with:

ProductName

Color

HL Road Pedal

Silver/Black

Touring Pedal

Silver/Black

ML Mountain Frame-W – Silver, 38

Silver

LL Mountain Frame – Black, 40

Black

LL Mountain Frame – Silver, 40

Silver

Front Derailleur

Silver

LL Touring Handlebars

neutral

HL Touring Handlebars

neutral

Front Brakes

Silver

LL Crankset

Black

Be aware that, when using ISNULL, the database engine converts the replacement value to the data type of the original value, making its possible for data to get truncated or for the database engine to generate a conversion error. Make sure you read up on ISNULL before adding it to your code.

Failure #5: Treating ISNULL the same as COALESCE

In the last failure, I tried to address any confusion there might be around the ISNULL function, but there’s another area of possible confusion, and that is in understanding the differences between ISNULL and COALESCE. Both functions let us replace NULL with a real value, but there are differences between the two that can be important to understand.

First off, the ISNULL function implemented in SQL Server is specific to that brand of T-SQL, whereas COALESCE is standard ANSI, making it more universal and, consequently, more portable, an important consideration if the possibility exists that you might one day want to point your apps to a different database system.

The COALESCE function also lets us include more than two expressions, as shown in the following SELECT statement:

The function evaluates the expressions in the order they’re specified and returns the first non-NULL value. As the following results show, each row returns one of the specified values in the FirstNotNull column:

ProductName

Class

Style

ProductLine

FirstNotNull

Rear Derailleur Cage

NULL

NULL

NULL

n/a

HL Road Frame – Red, 58

H

U

R

H

Mountain Bike Socks, M

NULL

U

M

U

Sport-100 Helmet, Blue

NULL

NULL

S

S

The COALESCE function can actually be thought of as syntactical shorthand for a CASE expression:

The SELECT statement achieves the same results as the preceding example, but its construction is more complex. The main advantage of using a CASE expression is that it makes the logic clearer, for those who might need a little extra help.

Compared to the COALESCE function, the ISNULL function is much more limited:

In this case, all we’re saying is that if Class is NULL, then use Style, giving us the following results:

ProductName

Class

Style

FirstNotNull

Rear Derailleur Cage

NULL

NULL

NULL

HL Road Frame – Red, 58

H

U

H

Mountain Bike Socks, M

NULL

U

U

Sport-100 Helmet, Blue

NULL

NULL

NULL

We could have specified n/a as our second argument, similar to what we did with COALESCE, but you get the point. COALESCE is more flexible and it conforms to ANSI standards. In addition, COALESCE handles data types differently. To understand this difference, let’s start with a different example:

The Class and Style columns are defined as NCHAR, and the ProductSubcategoryID column as INT. The SELECT statement returns the following results:

ProductName

Class

Style

ProductSubcategoryID

LL Shell

L

NULL

NULL

Now lets modify our code to include the COALESCE function:

The database engine now generates an error when it tries to convert the Class value to an INT:

It turns out that COALESCE uses the type with the highest precedence, based on the submitted values. Because the ProductSubcategoryID column takes an INT and an INT takes precedence over NCHAR (which is treated as NVARCHAR), the database engine tries to convert L to an INT, which of course is not possible.

In such cases, we might try changing the order of the columns we pass to the COALESCE function:

But we will again receive a conversion area. The database engine does not care about the order of the arguments, only which types take precedence over others:

The ISNULL function works differently:

In this case, the Class data type sets the standard, so the database engine has no problem returning the results:

ProductName

Class

ProductSubcategoryID

FirstNotNull

LL Shell

L

NULL

L

Now let’s reverse the order of the columns we pass to ISNULL:

The database engine will once again try to convert L to an INT and generate a conversion error. Be sure that, whenever you’re using these functions, you are well aware of the possibility for conversion errors.

Failure #6: Forgetting about the NULLIF function

While we’re on the topic of NULL-related functions, it’s worth noting another one that garners less attention, but can still be useful: the NULLIF function. We don’t seem to come across this one as often as the other functions, in part because there does not seem to be as many use cases for NULLIF as there are the others. Basically, the function returns a NULL value if two expressions are equal. If they are not equal, the function returns the first expression. Here’s an example:

All we’re doing here is passing the two columns into the function and returning the results, as shown in the following table:

ProductName

MakeFlag

FinishedGoodsFlag

NullIfEqual

Top Tube

1

0

1

Rear Derailleur Cage

0

0

NULL

HL Road Frame – Black, 58

1

1

NULL

Sport-100 Helmet, Red

0

1

0

We can achieve the same results by using a CASE expression instead of the NULLIF function:

As with COALESCE, the NULLIF function helps to simplify our statements, but the CASE expression makes the logic clearer to the rest of the world.

Perhaps one of the most useful places to use the NULLIF function is within a mathematical expression in which an error would be generated if a zero were encountered. Dividing by zero is a good example of this. Let’s start with the following SELECT statement:

The database engine runs the calculation with no problem and returns the following results:

ProductName

StandardCost

ListPrice

NetDiff

HL Road Frame – Black, 58

1059.31

1431.50

2.8461

Now let’s pull the data from a row in which the StandardCost and ListPrice values are zero:

This time, we get a divide-by-zero error:

This is where NULLIF comes it. By returning a NULL in place of a zero, we can at least prevent the database engine from generating an error:

Now we at least get results we can live with.

ProductName

StandardCost

ListPrice

NetDiff

Rear Derailleur Cage

0.00

0.00

NULL

We could have also used the WHERE clause to filter our rows that would have given us zero, but then we would not be able to view any of the data’s row. The NULLIF function at least gives us data to work with and makes it easy to do so.

Failure #7: Assuming that 2 + NULL = 2

As great as the NULLIF function is for handling certain situations, such as avoiding divide-by-zero errors, having NULL show up in our numeric calculations is often less than optimal. It’s not that SQL Server does anything surprising, but rather that you need to know what’s coming should you run into issues with your calculations.

This all goes back to the idea that NULL is a non-value. It is nothing. It merely indicates that any real data value is missing. Consequently, when we try to add a value to nothing or subtract from nothing or multiply by nothing or divide by nothing or do anything by nothing, we end up with nothing, or NULL. Take a look at this SELECT statement:

We’re simply multiplying the Weight value by 4.22 and getting the following results:

ProductName

Weight

FreightCost

Long-Sleeve Logo Jersey, L

NULL

NULL

Long-Sleeve Logo Jersey, XL

NULL

NULL

HL Road Frame – Red, 62

2.30

9.7060

HL Road Frame – Red, 44

2.12

8.9464

Not surprisingly, whenever we try to multiply by NULL, we get NULL. What choice does the database engine have? Now let’s throw a little addition in there to see what happens:

As before, when Weight contains a real value, we get a real value in return, but when NULL enters the equation, we always end up with NULL:

ProductName

Weight

FreightCost

Long-Sleeve Logo Jersey, L

NULL

NULL

Long-Sleeve Logo Jersey, XL

NULL

NULL

HL Road Frame – Red, 62

2.30

22.3660

HL Road Frame – Red, 44

2.12

21.6064

One approach we can take (when circumstances make it reasonable) is to define a default value that we can return in the event the source data is NULL:

In this case, we’re using ISNULL to return a default value of 4.95 should our calculation result in a NULL value, which gives us the following results:

ProductName

Weight

FreightCost

Long-Sleeve Logo Jersey, L

NULL

4.9500

Long-Sleeve Logo Jersey, XL

NULL

4.9500

HL Road Frame – Red, 62

2.30

9.7060

HL Road Frame – Red, 44

2.12

8.9464

SQL Server does what it should do when it encounters NULL. It returns NULL. We just need to be prepared for this eventuality. In some cases, having our expressions return a NULL value is fine for our circumstances, but at other times, we’ll want to adjust our code to accommodate those wayward missing values. Otherwise a whole lot of customers might end up getting free shipping because the weight was initially not a consideration.

Failure #8: Aggregating data without taking NULL into account

As with numeric calculations, we must be prepared to handle aggregations that could include NULL values. However, unlike mathematic operations, the database engine eliminates NULL values from the calculations before performing the actual aggregations, at least at the most basic level. Let’s start with a simple SELECT statement:

The statement returns the following results, which include two rows with a Weight value of NULL:

ProductName

Weight

Long-Sleeve Logo Jersey, L

NULL

Long-Sleeve Logo Jersey, XL

NULL

HL Road Frame – Red, 62

2.30

HL Road Frame – Red, 44

2.12

HL Road Frame – Red, 48

2.16

HL Road Frame – Red, 52

2.20

Suppose we now try to find the average weight for these rows:

The database engine will perform the aggregation with no problem by first eliminating the NULL rows from the mix and then returning a result of 2.195000. However, the results also include something else, a warning message about the NULL values:

We can verify that the NULL rows are being eliminated by instead using the COUNT aggregate function on the Weight column:

This time the database engine returns a value of 4, indicating that the two NULL rows have been eliminated. We will again receive the warning message.

All aggregate functions eliminate the NULL values prior to preforming the actual calculations except for the COUNT(*) function:

In this case, the database engine includes all rows, no matter where the NULL values lie, and will generate no warning messages.

The fact that SQL Server handles aggregations this way is not a bad thing. But it’s something you need to understand. If you’re not familiar with the data, your queries might be eliminating rows without understanding the full implications. You can consider using a function such as ISNULL to replace the NULL values, but you still risk skewing your results. As with many situations, handling NULL values can take a special touch. For example, detailed analytics will require, at the very least, knowledge of the number of NULL values in a particular data set.

As a side note, those who don’t appreciate having their queries return warning messages like those we’ve seen here, can use the SET ANSI_WARNINGS statement to turn warnings off when executing an aggregation:

Although this will give us the same results without the message, turning off this option can affect other operations. For example, modifying data on indexed views or on tables with indexed computed columns will fail if the option is turned off. So tread carefully if you consider this route.

Also be aware that when you start grouping and aggregating data, you might not get exactly the results you expect when that data includes NULL values. For example, the following SELECT statement groups by the ProductSubcategoryID column in our trusty Product table:

The following table shows the first 10 rows in our result set:

ProdSubID

AvgWeight

FullCount

ColCount

NULL

434.285714

209

7

1

25.058750

32

32

2

17.448139

43

43

3

27.680909

22

22

4

NULL

8

0

5

187.000000

3

3

6

317.000000

2

2

7

NULL

1

0

8

603.333333

3

3

9

151.500000

2

2

The full result set represents all 504 rows in the Product table, including rows whose ProductSubcategoryID and Weight columns are NULL. If we were to dig into the averages and the individual column counts for each group, we would discover that the NULL values are not included, but the result set as a whole includes all the data. Again, I point this out not to suggest that there is a problem with the way SQL Server groups and aggregates data, but only to demonstrate that we need to be aware of what’s happening with our queries when NULL values are involved. In this case, by including both types of counts in each group, we’re providing a handy way to provide insight into our source data.

Failure #9: Forgetting about NULL in your Boolean expressions

Boolean expressions that involve NULL values can be especially tricky because they introduce what is called three-valued logic. Typically, when an expression is evaluated, it returns either TRUE or FALSE. However, when NULL enters the mix, we get a third possibility: UNKNOWN. As we’ve already covered, SQL Server marks a value as NULL if it does not exist. As a result, a Boolean expression that tries to evaluate a condition based on a missing value results in UNKNOWN.

The main thing to keep in mind with Boolean expressions is that your result set will include only those rows in which the expression evaluates to TRUE. If the expression evaluates to FALSE or UNKNOWN, the row is omitted.

The specifics of three-valued logic can seem unwieldy when taking account the various combinations that can result from using the AND, OR, and NOT operators, but you can find a handy reference to help sort all that out be referring to the SQL Server help topic “Null Values.” In the meantime, lets look at a few examples that demonstrate what happens when NULL values enter the Boolean mix, starting with the following temporary table:

The following results show the data that’s now stored in the #products table:

ProductID

ProductName

Weight

Class

945

Front Derailleur

88.00

NULL

946

LL Touring Handlebars

NULL

L

947

HL Touring Handlebars

NULL

H

948

Front Brakes

317.00

NULL

949

LL Crankset

600.00

L

950

ML Crankset

635.00

M

951

HL Crankset

575.00

H

952

Chain

NULL

NULL

953

Touring-2000 Blue, 60

27.90

M

954

Touring-1000 Yellow, 46

25.13

H

Now let’s use the AND operator to create a simple Boolean expression in our WHERE clause:

As expected, the statement returns the results shown in the following table:

ProductID

ProductName

Weight

Class

951

HL Crankset

575.00

H

Notice that the query engine excludes the row for product ID 947. Even though the Class value is H, the Weight value is NULL, so the expression evaluates to UNKNOWN. And of course, row 952 is excluded because both the Weight and Class values are NULL.

Now let’s rerun the statement, but this time use the OR operator for our Boolean expression:

As expected, our results look quite different:

ProductID

ProductName

Weight

Class

947

HL Touring Handlebars

NULL

H

949

LL Crankset

600.00

L

950

ML Crankset

635.00

M

951

HL Crankset

575.00

H

954

Touring-1000 Yellow, 46

25.13

H

In this case, only one of the two Boolean conditions needs to evaluate to TRUE, resulting in many more returned rows. Even row 947 is included, even through the Weight value is NULL. Because the OR operator is used and the second condition evaluates to TRUE, that row is included.

This time around, let’s use the AND NOT operator in our statement:

Once again, we get very different results:

ProductID

ProductName

Weight

Class

949

LL Crankset

600.00

L

950

ML Crankset

635.00

M

Not surprising, the only rows to be included are those in which both Boolean conditions evaluate to TRUE. All the FALSE and UNKNOWN values are left behind.

As with aggregate functions, SQL Server’s treatment of Boolean expressions in itself is not a bad thing. You just need to be aware of what can happen with your data when NULL values are involved. Data can be eliminated that you might expect to be included, leading to unintended consequences in subsequent analytics and decision-making.

Failure #10: Not accounting for NULL in your string expressions

Up to this point, we’ve touched little upon string values and what happens when NULL enters the mix, but at least some coverage in this area is warranted, beyond what we saw when working with Boolean expressions.

Suppose we start with the following SELECT statement:

The statement returns the results shown here:

ProductID

ProductName

Color

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

837

HL Road Frame – Black, 62

Black

838

HL Road Frame – Black, 44

Black

839

HL Road Frame – Black, 48

Black

840

HL Road Frame – Black, 52

Black

841

Men’s Sports Shorts, S

Black

842

Touring-Panniers, Large

Grey

843

Cable Lock

NULL

844

Minipump

NULL

845

Mountain Pump

NULL

Now suppose when want to filter our data set even further by returning all rows with a Color value other than black:

Now our SELECT statement returns the following results:

ProductID

ProductName

Color

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

842

Touring-Panniers, Large

Grey

We might have expected the results to include those rows whose Color value is NULL. However, we’re once again faced with the issue of trying to evaluate an expression against what is essentially a missing value. A “not black” value cannot be equated with a “missing” value; consequently, those rows are omitted.

Concatenating string values follows similar logic:

Whenever NULL shows up, we end up with nothing (NULL) in our NewProductNumber column:

ProductName

ProductNumber

Color

NewProductNumber

ML Road Frame-W – Yellow, 44

FR-R72Y-44

Yellow

FR-R72Y-44_Yellow

ML Road Frame-W – Yellow, 48

FR-R72Y-48

Yellow

FR-R72Y-48_Yellow

HL Road Frame – Black, 62

FR-R92B-62

Black

FR-R92B-62_Black

HL Road Frame – Black, 44

FR-R92B-44

Black

FR-R92B-44_Black

HL Road Frame – Black, 48

FR-R92B-48

Black

FR-R92B-48_Black

HL Road Frame – Black, 52

FR-R92B-52

Black

FR-R92B-52_Black

Men’s Sports Shorts, S

SH-M897-S

Black

SH-M897-S_Black

Touring-Panniers, Large

PA-T100

Grey

PA-T100_Grey

Cable Lock

LO-C100

NULL

NULL

Minipump

PU-0452

NULL

NULL

Mountain Pump

PU-M044

NULL

NULL

Once again, the bottom line is that you must be prepared to handle NULL values if they could in any way impact your results. For example, you might try throwing an ISNULL into the mix:

Now the statement returns the following results:

ProductName

ProductNumber

Color

NewProductNumber

ML Road Frame-W – Yellow, 44

FR-R72Y-44

Yellow

FR-R72Y-44_Yellow

ML Road Frame-W – Yellow, 48

FR-R72Y-48

Yellow

FR-R72Y-48_Yellow

HL Road Frame – Black, 62

FR-R92B-62

Black

FR-R92B-62_Black

HL Road Frame – Black, 44

FR-R92B-44

Black

FR-R92B-44_Black

HL Road Frame – Black, 48

FR-R92B-48

Black

FR-R92B-48_Black

HL Road Frame – Black, 52

FR-R92B-52

Black

FR-R92B-52_Black

Men’s Sports Shorts, S

SH-M897-S

Black

SH-M897-S_Black

Touring-Panniers, Large

PA-T100

Grey

PA-T100_Grey

Cable Lock

LO-C100

NULL

LO-C100_neutral

Minipump

PU-0452

NULL

PU-0452_neutral

Mountain Pump

PU-M044

NULL

PU-M044_neutral

At least this way we have something to work with other than NULL.

Failure #11: Joining tables on nullable columns

Another area where we might get results other than what we expect is when we join data based on nullable columns. Because NULL value are considered non-values and because the database engine cannot match non-values, NULL values will show up in the join column only for outer joins. To demonstrate, let’s start by creating and populating a couple tables:

Notice that the CategoryAltID column in both tables contains NULL values. Now let’s create an inner join between those tables, based on that column:

Although the CategoryAltID column contains NULL values in both tables, they are not considered equal (for all the reasons we’ve discussed), which means they’re not included in the result set:

ProductID

ProductName

pCatAltID

cCatAltID

CategoryNumber

945

Front Derailleur

10

10

FR-R72Y-44

946

LL Touring Handlebars

20

20

FR-R72Y-48

948

Front Brakes

20

20

FR-R72Y-48

949

LL Crankset

10

10

FR-R72Y-44

Now let’s revise the statement to include a left outer join:

This time our results include all rows from the Product table, including those whose CategoryAltID column is NULL in both tables:

ProductID

ProductName

pCatAltID

cCatAltID

CategoryNumber

945

Front Derailleur

10

10

FR-R72Y-44

946

LL Touring Handlebars

20

20

FR-R72Y-48

947

HL Touring Handlebars

NULL

NULL

NULL

948

Front Brakes

20

20

FR-R72Y-48

949

LL Crankset

10

10

FR-R72Y-44

950

ML Crankset

NULL

NULL

NULL

The query engine includes NULL rows not because they’re considered matches, but rather because that’s the nature of an outer join and the way it returns all rows from the “outer” table. We can also try our statement with a full outer join:

Now our results include everything, regardless of the NULL values:

ProductID

ProductName

pCatAltID

cCatAltID

CategoryNumber

945

Front Derailleur

10

10

FR-R72Y-44

946

LL Touring Handlebars

20

20

FR-R72Y-48

947

HL Touring Handlebars

NULL

NULL

NULL

948

Front Brakes

20

20

FR-R72Y-48

949

LL Crankset

10

10

FR-R72Y-44

950

ML Crankset

NULL

NULL

NULL

NULL

NULL

NULL

NULL

FR-R92B-62

NULL

NULL

NULL

NULL

FR-R92B-52

If you consider the logic behind the query engine’s approach to NULL values, it’s not surprising we get the results we do. Like any place where NULL shows up, it comes down to understanding the data we could potentially be dealing with and what that means to the bigger picture. Whenever NULL enters into the equation, we must take it into consideration so we don’t end up with query results very different from what we expect. When we join data, a NULL in one column can cause us to miss important data in other columns, all because we did not think through the possible outcomes.

Failure #12: Creating a unique index without taking NULL into account

Up to this point, I’ve tried to emphasize how SQL Server uses NULL only to indicate that a value is missing. We cannot compare or add or multiply NULL values because we cannot compare or add or multiply missing values. When it comes to unique indexes, however, SQL Server seems to have a double standard, treating NULL values as equal values and consequently permitting only one NULL value in a unique index.

To demonstrate, let’s create and populate a simple table:

Now let’s try to create a unique index on the ProductNumber column:

Because the column contains multiple NULL values, the database engine returns the following error:

This might seem contradictory behavior on the part of SQL Server, given the non-value status it imposes on NULL everywhere else. After all, how can missing values be considered duplicates if there is no way to compare them?

Because of this limitation, we have to be particularly careful when we create our tables and indexes prior to populating them. We could be faced with similar error messages if we try to insert a row with a duplicate NULL value.

Fortunately, since SQL Server 2008, we’ve been able to create filtered indexes that let us sidestep the duplicate NULL issue:

The index is now created without a hitch. We can even insert additional duplicate NULL values:

Creating a filtered index is easy enough to do and is a handy way to sidestep the limitations of a unique index. Just be sure to refer to the SQL Server help topic “Create Filtered Indexes” to make certain you’re familiar with the limitations and restrictions on these types of indexes.

Failure #13: Sorting data on a nullable column

A result set that is not in the order you want might not seem such a big deal, but it can still be annoying when you sort your results and end up with a bunch of NULL values at the beginning. For example, the following SELECT statement sorts the data by the Color column:

The statement gives us the following results:

ProductID

ProductName

Color

843

Cable Lock

NULL

844

Minipump

NULL

845

Mountain Pump

NULL

837

HL Road Frame – Black, 62

Black

838

HL Road Frame – Black, 44

Black

839

HL Road Frame – Black, 48

Black

840

HL Road Frame – Black, 52

Black

841

Men’s Sports Shorts, S

Black

842

Touring-Panniers, Large

Grey

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

We can instead sort the data in descending order, but that would put the colors themselves in reverse order. Another work-around is to slip a CASE expression into our WHERE clause to trick the database engine:

The extra expression assigns 1 to the NULL rows and 0 to the rest so they’re sorted first. From there, the normal sorting occurs, as shown in the following results:

ProductID

ProductName

Color

837

HL Road Frame – Black, 62

Black

838

HL Road Frame – Black, 44

Black

839

HL Road Frame – Black, 48

Black

840

HL Road Frame – Black, 52

Black

841

Men’s Sports Shorts, S

Black

842

Touring-Panniers, Large

Grey

835

ML Road Frame-W – Yellow, 44

Yellow

836

ML Road Frame-W – Yellow, 48

Yellow

843

Cable Lock

NULL

844

Minipump

NULL

845

Mountain Pump

NULL

This might not be an earth-shattering solution, but it’s an easy enough way to move those NULL values to the bottom, if you happen to prefer them there.