Despite my recent job change from full time T-SQL code jockey/architect to website editor for Simple-Talk, I will always be at heart at T-SQL programmer. While I will mostly be writing professionally in the foreseeable future will in support of an article, I will also continue writing code for several databases that I use for multiple personal reason housed on a SQL Server Express server on the desk beside me.
The Invitation: T-SQL Tuesday #154 Invitation – SQL Server 2022 (glennsqlperformance.com)
When I saw the topic of this T-SQL Tuesday was to write about what I have been doing with SQL Server 2022, I figured I would note the two things I have been doing recently. First, I wrote a chapter about security changes in SQL Server 2002 for a forthcoming book entitled “SQL Server 2022 Administration Inside Out” for Pearson with a great group of people that should be out later this year, early next at the latest. There are a few things I found out writing that chapter that I am keen to use, one of them being the more granular UNMASK
permissions for the Dynamic Data Masking feature, but since I won’t be writing multi-user production code in the future, I probably won’t be masking any data, much less need granular masking capabilities.
The other thing I have been doing is trying out some of the new features coming in SQL Server 2022 that I will absolutely be using even in my hobby databases. There are tons of new features in 2022, as there always is. But the ones that excite me are the T-SQL improvements. In this article I am going to highlight 1 feature that is immediately a standout.
IS (NOT) DISTINCT FROM
The MOST exciting change from a T-SQL standpoint is: IS NOT DISTINCT FROM
. This feature solves an age-old issue for T-SQL programmers and is worth its weight in gold. It is basically an equals comparison operator like =, but treats NULL
as an individual value. Unlike =, this new operator returns only TRUE
or FALSE
, but not UNKNOWN
. Writing queries that compare to values that can contain NULL
is tedious, mostly because of code like the following:
1 2 |
SELECT CASE WHEN 1 = NULL THEN 'True' Else 'False' end, CASE WHEN NOT(1 = NULL) THEN 'True' Else 'False' end |
The fact that both of these comparisons return False is confusing at times even to me, and I have written on the whole NULL
comparison and negating NULL
values things about as many times as I have dealt with it in production code. But using IS DISTINCT FROM
, this is no longer the case:
1 2 3 4 |
SELECT CASE WHEN 1 IS DISTINCT FROM NULL THEN 'True' Else 'False' end, CASE WHEN NOT 1 IS DISTINCT FROM NULL THEN 'True' Else 'False' end |
Where this is really important is doing a query where you are looking for differences between two sets of data (often for a merge type operation). So consider the following table (from WideWorldImporters
, which you can get here) :
1 2 3 4 |
SELECT COUNT(*), SUM(CASE WHEN LatestRecordedPopulation IS NULL THEN 1 ELSE 0 END) FROM Application.Cities; |
This returns 37940 total rows and 11048 rows with a NULL population value. Now, let’s join the table to itself on the PK and the population value
1 2 3 4 5 6 |
SELECT * FROM Application.Cities JOIN Application.Cities AS C2 ON C2.CityID = Cities.CityID WHERE.LatestRecordedPopulation = Cities.LatestRecordedPopulation; |
This returns 26892 rows, which you can do the math, is 37940-11048. Looking at this, without thinking about NULL
values (who does initially?), this has to return every row in the table. But clearly not. Usually this becomes obvious when a few customers living in one of those cities isn’t showing up on a report (or maybe even not getting their shipments.)
The pre-SQL Server 2022 way of handling this properly this was to do something like this:
1 2 3 4 5 6 7 8 |
SELECT * FROM Application.Cities JOIN Application.Cities AS C2 ON C2.CityID = Cities.CityID AND C2.LatestRecordedPopulation = Cities.LatestRecordedPopulation OR (C2.LatestRecordedPopulation IS NULL AND Cities.LatestRecordedPopulation IS NULL); |
Now we have checked the either they are the same value, or they both have a value of NULL
. This query returns every row in the table, but it is kind of tricky code. And looking for differences is even more difficult, because you have to check to see if the values are different, if column 1 is null and column2 is not, and again vice versa. Another way this is often done is to change the population comparison to
1 2 |
AND COALESCE(C2.LatestRecordedPopulation.-100) = COALESCE(Cities.LatestRecordedPopulation,-100) |
Which is safe from a correctness standpoint (assuming you can coalesce your values to something that is 100% not possible), but not from a performance one. This eliminates index seek utilization for these columns and makes it slower. That isn’t always an issue, but for larger data sets, you may end up with more scans than you hoped.
Using the new syntax, we can simply write this as:
1 2 3 4 5 6 7 |
SELECT * FROM Application.Cities JOIN Application.Cities AS C2 ON C2.CityID = Cities.CityID AND C2.LatestRecordedPopulation IS NOT DISTINCT FROM Cities.LatestRecordedPopulation; |
The name of the operator might be a little bit confusing because of the words FROM
and DISTINCT
, it really makes sense. DISTINCT
has a seemingly different usage here, but really it is the same meaning. If the value is the same, it is not distinct from one another, and if it is different, it is distinct. And the DISTINCT
operator in the SELECT
clause honors NULL
values as a single bucket too. Now, go back and change the previous query to IS DISTINCT FROM
and 0 rows will be returned.
This feature would have saved me many many hours over the years! Is this alone a reason to upgrade to SQL Server 2022 alone? Since I have never been the one to write those checks, and I use the free Express edition for my hobby databases… I can say an unqualified “Yes” to that!
Load comments