T-SQL Tuesday #154 – SQL Server 2022, IS DISTINCT FROM

Comments 0

Share to social media

Despite my recent job change from full time T-SQL code jockey/architect to website editor for Simple-Talk, I will always be at heart at T-SQL programmer. While I will mostly be writing professionally in the foreseeable future will in support of an article, I will also continue writing code for several databases that I use for multiple personal reason housed on a SQL Server Express server on the desk beside me.

The Invitation: T-SQL Tuesday #154 Invitation – SQL Server 2022 (glennsqlperformance.com)

When I saw the topic of this T-SQL Tuesday was to write about what I have been doing with SQL Server 2022, I figured I would note the two things I have been doing recently. First, I wrote a chapter about security changes in SQL Server 2002 for a forthcoming book entitled “SQL Server 2022 Administration Inside Out” for Pearson with a great group of people that should be out later this year, early next at the latest. There are a few things I found out writing that chapter that I am keen to use, one of them being the more granular UNMASK permissions for the Dynamic Data Masking feature, but since I won’t be writing multi-user production code in the future, I probably won’t be masking any data, much less need granular masking capabilities.

The other thing I have been doing is trying out some of the new features coming in SQL Server 2022 that I will absolutely be using even in my hobby databases. There are tons of new features in 2022, as there always is. But the ones that excite me are the T-SQL improvements. In this article I am going to highlight 1 feature that is immediately a standout.

IS (NOT) DISTINCT FROM

The MOST exciting change from a T-SQL standpoint is: IS NOT DISTINCT FROM. This feature solves an age-old issue for T-SQL programmers and is worth its weight in gold. It is basically an equals comparison operator like =, but treats NULL as an individual value. Unlike =, this new operator returns only TRUE or FALSE, but not UNKNOWN. Writing queries that compare to values that can contain NULL is tedious, mostly because of code like the following:

The fact that both of these comparisons return False is confusing at times even to me, and I have written on the whole NULL comparison and negating NULL values things about as many times as I have dealt with it in production code. But using IS DISTINCT FROM, this is no longer the case:

Where this is really important is doing a query where you are looking for differences between two sets of data (often for a merge type operation). So consider the following table (from WideWorldImporters, which you can get here) :

This returns 37940 total rows and 11048 rows with a NULL population value. Now, let’s join the table to itself on the PK and the population value

This returns 26892 rows, which you can do the math, is 37940-11048. Looking at this, without thinking about NULL values (who does initially?), this has to return every row in the table. But clearly not. Usually this becomes obvious when a few customers living in one of those cities isn’t showing up on a report (or maybe even not getting their shipments.)

The pre-SQL Server 2022 way of handling this properly this was to do something like this:

Now we have checked the either they are the same value, or they both have a value of NULL. This query returns every row in the table, but it is kind of tricky code. And looking for differences is even more difficult, because you have to check to see if the values are different, if column 1 is null and column2 is not, and again vice versa. Another way this is often done is to change the population comparison to

Which is safe from a correctness standpoint (assuming you can coalesce your values to something that is 100% not possible), but not from a performance one. This eliminates index seek utilization for these columns and makes it slower. That isn’t always an issue, but for larger data sets, you may end up with more scans than you hoped.

Using the new syntax, we can simply write this as:

The name of the operator might be a little bit confusing because of the words FROM and DISTINCT, it really makes sense. DISTINCT has a seemingly different usage here, but really it is the same meaning. If the value is the same, it is not distinct from one another, and if it is different, it is distinct. And the DISTINCT operator in the SELECT clause honors NULL values as a single bucket too. Now, go back and change the previous query to IS DISTINCT FROM and 0 rows will be returned.

This feature would have saved me many many hours over the years! Is this alone a reason to upgrade to SQL Server 2022 alone? Since I have never been the one to write those checks, and I use the free Express edition for my hobby databases… I can say an unqualified “Yes” to that!

 

Load comments

About the author

Louis Davidson

Simple Talk Editor

See Profile

Louis is the editor of this Simple-Talk website. Prior to that, has was a corporate database developer and data architect for a non-profit organization for 25 years! Louis has been a Microsoft MVP since 2004, and is the author of a series of SQL Server Database Design books, most recently Pro SQL Server Relational Database Design and Implementation.