{"id":98661,"date":"2023-11-20T16:13:08","date_gmt":"2023-11-20T16:13:08","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=98661"},"modified":"2023-11-20T13:23:09","modified_gmt":"2023-11-20T13:23:09","slug":"strategies-for-queries-against-bit-columns","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/strategies-for-queries-against-bit-columns\/","title":{"rendered":"Strategies for queries against bit columns"},"content":{"rendered":"<p>Recently someone posted a question where they couldn&#8217;t quite figure out how to construct a predicate based on a <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> parameter. They tried to write a procedure like this, which wouldn&#8217;t parse, of course:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE PROCEDURE dbo.whatever\r\n   @flag bit = 0\r\n AS\r\n   SELECT * FROM dbo.tablename\r\n   WHERE\r\n     IF @flag = 1\r\n         flag_column = 1\r\n     IF @flag &lt;&gt; 1\r\n         flag_column = 0;<\/pre>\n<p>I explained that you can&#8217;t have control-of-flow inside a SQL statement like that, at least not in T-SQL. And that the way you should do it is as follows, if the table is sensible and the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> column doesn&#8217;t allow <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NULL<\/code>:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> ... FROM dbo.tablename\r\n    WHERE flag_column = @flag;<\/pre>\n<p>And then &#8211; because the user didn&#8217;t include the table definition &#8211; I added that if the column <em>does<\/em> allow <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NULL<\/code>, one way would be:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> ... FROM dbo.tablename\r\n    WHERE COALESCE(flag_column, 0) = @flag;<\/pre>\n<p>Someone immediately mentioned that the latter option was not sargable. Yes, that&#8217;s absolutely true. For <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> columns, I generally assume there isn&#8217;t an index. But on the other hand, I have always been an advocate for writing queries as if a supporting index were there; even though it might not exist yet, someone could create it tomorrow.<\/p>\n<h3>An example<\/h3>\n<p>Let&#8217;s consider a table like this, with 10 rows where <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">flag_column = 1<\/code>, 85 rows where it is <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">0<\/code>, and 5 rows where it is <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NULL<\/code>:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\">CREATE TABLE dbo.tablename\r\n (\r\n   id          int IDENTITY,\r\n   flag_column bit,\r\n   filler      char(4000) NOT NULL DEFAULT '',\r\n   CONSTRAINT  PK_tn PRIMARY KEY(id)\r\n );\r\n\r\n INSERT dbo.tablename(flag_column) \r\n   SELECT TOP (10) 1    FROM sys.all_columns\r\n   UNION ALL \r\n   SELECT TOP (85) 0    FROM sys.all_columns\r\n   UNION ALL\r\n   SELECT TOP (5)  NULL FROM sys.all_columns;<\/pre>\n<p>The plan for the query with <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">COALESCE<\/code> looks like this (<code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 0<\/code> on the left, <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 1<\/code> on the right):<\/p>\n<p><img decoding=\"async\" style=\"width: 66%; min-width: 570px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/bit-plan-01.png\" alt=\"Scans all around\" \/><\/p>\n<p>In the unlikely event you have an index that leads on <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">flag_column<\/code>:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE INDEX IX_flag ON dbo.tablename(flag_column);<\/pre>\n<p>The optimizer might still ignore the index and still use a clustered index scan, depending on what parameter value was passed in on first compile. So, it may be beneficial to write the query this way (assuming that <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NULL<\/code> and <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">0<\/code> are equivalent):<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> ... FROM dbo.tablename\r\n    WHERE flag_column = @flag\r\n      OR (@flag = 0 AND flag_column IS NULL)\r\n      OPTION (RECOMPILE);<\/pre>\n<p>Now, not everyone likes query hints, but in this case we get a much more favorable plan &#8211; however, only for the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 1<\/code> case. With <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 0<\/code>, we still get a clustered index scan:<\/p>\n<p><img decoding=\"async\" style=\"width: 50%; min-width: 420px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/bit-plan-02.png\" alt=\"Seek for @flag = 1\" \/><\/p>\n<p>And depending on several other factors, this plan would only be chosen when scanning the narrow index <em>and<\/em> looking up the additional data nets less work than just scanning the whole table.<\/p>\n<p>And again, even if that index doesn&#8217;t exist yet, having the query formulated <em>as if it were there<\/em>, since it can do no worse than the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">COALESCE<\/code> approach, is safer and more forward-compatible.<\/p>\n<h3>So then I started thinking&#8230;<\/h3>\n<p>This sent me in a spiral thinking about how we rarely create indexes where the key leads with a <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> column, and why that is.<\/p>\n<p>Let&#8217;s recap: sargability is only a concern in this case if there is a valid index to use (and one that has a chance at covering the rest of the query) and that the index is useful enough to be considered even for a scan, never mind a seek, depending on how much of the table matches the (probably sniffed and cached) parameter value and how well the index covers the query. Given <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">SELECT *<\/code>, not very likely, unless this was a very narrow table.<\/p>\n<p>Rarely is a <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> column a good candidate for a leading index key because the selectivity just isn&#8217;t there. If SQL Server is going to have to scan an estimated 50% of the index anyway, and then perform lookups for every row for all the non-covered columns, it&#8217;s just not going to pick the index. The exception is when the data is skewed much more heavily toward <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">0<\/code> or <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">1<\/code>.<\/p>\n<p>And in that case, a <em>filtered<\/em> index is potentially better. But a filtered index wouldn&#8217;t be considered using the above query because the plan generated for the parameterized query has to be able to satisfy parameter values of both <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">0<\/code> <em>and<\/em> <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">1<\/code>. It&#8217;s not necessarily beneficial to create both filtered indexes, because only one of them will be desirable depending on data skew, and they won&#8217;t be useful unless you also use <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">OPTION (RECOMPILE)<\/code>. I&#8217;m not afraid of that hint but, in situations like this where we want the filtered index to be chosen, but we don&#8217;t want to add query hints, we&#8217;ve resorted to interpolating the parameter value into the query text (either in the application code, or using dynamic SQL) or using branching.<\/p>\n<h3>&#8230;let&#8217;s try a filtered index<\/h3>\n<p>Let&#8217;s drop the original index, and create a new filtered index catering to the case where we know it will be most useful (<code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 1<\/code>):<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DROP INDEX IX_Flag ON dbo.tablename;\r\n\r\n CREATE INDEX IX_Flag_1 ON dbo.tablename(flag_column)\r\n   WHERE flag_column = 1;<\/pre>\n<p>If we run the two original queries again (<code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">WHERE flag_column = @flag;<\/code> and <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">WHERE COALESCE(flag_column, 0) = @flag;<\/code>), at least without <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">OPTION (RECOMPILE)<\/code>, we get a clustered index scan; neither query considers the filtered index. To do that without the query hint, you&#8217;d need to build the query text <em>without parameters<\/em> in the application, or use one of the following constructs inside the procedure:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> IF @flag = 1\r\n BEGIN\r\n   SELECT * FROM dbo.tablename\r\n     WHERE flag_column = 1;\r\n END\r\n ELSE\r\n BEGIN\r\n   SELECT * FROM dbo.tablename\r\n     WHERE flag_column = 0\r\n        OR flag_column IS NULL;\r\n END\r\n\r\n \/* or - sql injection jokes aside please *\/\r\n\r\n DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.tablename\r\n   WHERE flag_column = ' + CASE @flag \r\n      WHEN 1 THEN N'1;' ELSE\r\n      N'0 OR flag_column IS NULL;' END;\r\n    \r\n EXEC sys.sp_executesql @sql;<\/pre>\n<p>For the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 0<\/code> case, we still get a clustered index scan, as expected. For the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 1<\/code> case, we get a slightly more pleasing seek on the filtered index, accompanied by a key lookup to get the remainder of the data:<\/p>\n<p><img decoding=\"async\" style=\"width: 66%; min-width: 570px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/bit-plan-03a.png\" alt=\"Index seek with a lookup\" \/><br \/>\n<em>(The warning on the left is a misguided missing index recommendation for a non-filtered index; on the right, we have a benign unmatched index warning, which you can <a href=\"https:\/\/sqlperformance.com\/2019\/12\/t-sql-queries\/filtered-indexes-forced-parameterization\" target=\"_blank\" rel=\"noopener\">read more about here<\/a>.)<\/em><\/p>\n<h3>But I don&#8217;t like key lookups, either&#8230;<\/h3>\n<p>That key lookup will be less and less attractive to the optimizer the more the table grows and the wider that lookup becomes. At a certain point, SQL Server will deem it too expensive, and go back to a scan. If we want to try to eliminate the lookup, we can stop using <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">SELECT *<\/code> and only select covered columns, or we can re-create the index with the additional column(s) in the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">INCLUDE<\/code>:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE INDEX IX_Flag_1 ON dbo.tablename(flag_column)\r\n    INCLUDE(filler)\r\n    WHERE flag_column = 1\r\n    WITH (DROP_EXISTING = ON);<\/pre>\n<p>Now the execution plan for the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">@flag = 1<\/code> case looks like this (still with the benign unmatched index warning):<\/p>\n<p><img decoding=\"async\" style=\"width: 50%; min-width: 350px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/bit-plan-04.png\" alt=\"Index seek and no lookup\" \/><\/p>\n<h3>Final thoughts<\/h3>\n<p>None of these approaches is wrong; index tuning is always a balance of art and science and often involves subjective trade-offs. This just highlights that we need to be careful about using bit columns that are going to be involved in a significant portion of our workload, particularly if the distribution is not even.<\/p>\n<p>For <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> columns specifically, think about whether the column should allow <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NULL<\/code>, and why. The query above would be a lot simpler to optimize if it could only be <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">0<\/code> or <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">1<\/code>.<\/p>\n<p>You should also consider that in cases where you use a filtered index but the key involves other columns, you can really help the optimizer out by adding the filtering column(s) to the end of the key list (this is <a href=\"https:\/\/learn.microsoft.com\/en-us\/sql\/relational-databases\/indexes\/create-filtered-indexes#Restrictions\" target=\"_blank\" rel=\"noopener\">mentioned in the documentation<\/a>). Without it, it can be harder to use tricks to persuade the optimizer to choose your filtered index, something I&#8217;ll address in a future post.<\/p>\n<p>Oh, and we never use <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">SELECT *<\/code> in production code, right?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently someone posted a question where they couldn&#8217;t quite figure out how to construct a predicate based on a bit parameter. They tried to write a procedure like this, which wouldn&#8217;t parse, of course: CREATE PROCEDURE dbo.whatever @flag bit = 0 AS SELECT * FROM dbo.tablename WHERE IF @flag = 1 flag_column = 1 IF&#8230;&hellip;<\/p>\n","protected":false},"author":341115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143531],"tags":[5332,4183],"coauthors":[158980],"class_list":["post-98661","post","type-post","status-publish","format-standard","hentry","category-featured","category-t-sql-programming-sql-server","tag-bit","tag-t-sql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98661","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=98661"}],"version-history":[{"count":22,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98661\/revisions"}],"predecessor-version":[{"id":100565,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98661\/revisions\/100565"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=98661"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=98661"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=98661"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=98661"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}