{"id":98602,"date":"2023-12-20T20:25:31","date_gmt":"2023-12-20T20:25:31","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=98602"},"modified":"2023-10-26T20:40:52","modified_gmt":"2023-10-26T20:40:52","slug":"counting-more-efficiently","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/counting-more-efficiently\/","title":{"rendered":"Counting more efficiently"},"content":{"rendered":"<p>Nearly a decade ago, I wrote a post called &#8220;<a href=\"https:\/\/sqlperformance.com\/2014\/10\/t-sql-queries\/bad-habits-count-the-hard-way\" target=\"_blank\" rel=\"noopener\">Bad habits : Counting rows the hard way<\/a>.&#8221; In that post, I talked about how we can use SQL Server&#8217;s metadata to instantly retrieve the row count for a table. Typically, people do the following, which has to read the entire table or index:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @c int = (SELECT COUNT(*) FROM dbo.TableName);<\/pre>\n<p>To largely avoid size-of-data constraints, we can instead use <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">sys.partitions<\/code>.<\/p>\n<div style=\"padding-left: 24px; margin: auto 0 24px 8px; border-left: 3px solid #aaa; color: #888;\">Note: We could use <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">OBJECT_ID()<\/code> instead of a join, but that function <a href=\"https:\/\/sqlblog.org\/2014\/09\/04\/s1-bad-habits-metadata-helper-functions\" target=\"_blank\" rel=\"noopener\">doesn&#8217;t observe isolation semantics<\/a>, so can cause blocking &#8211; or be a victim. A potential compromise is to create <a href=\"https:\/\/www.mssqltips.com\/sqlservertip\/3449\/making-sql-server-metadata-queries-easier-with-these-new-views\/?utm_source=AaronBertrand\" target=\"_blank\" rel=\"noopener\">standardized metadata views<\/a>, but I&#8217;ll leave that as an exercise for the reader.<\/div>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @object int = (SELECT o.object_id \r\n    FROM sys.objects AS o\r\n      INNER JOIN sys.schemas AS s\r\n        ON o.[schema_id] = s.[schema_id]\r\n      WHERE o.name = N'TableName'\r\n        AND s.name = N'dbo');\r\n\r\n DECLARE @c int = (SELECT SUM([rows]) \r\n    FROM sys.partitions\r\n      WHERE index_id IN (0,1)\r\n        AND object_id = @object);<\/pre>\n<p>That&#8217;s great when you want to count the whole table without size-of-entire-table reads. It gets more complicated if you need to retrieve the count of rows that meet &#8211; or don&#8217;t meet &#8211; some criteria. Sometimes an index can help, but not always, depending on how complex the criteria might be.<\/p>\n<h4>An example<\/h4>\n<p>One example is the soft delete, where we have a <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> column like <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">IsActive<\/code> that defaults to <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">1<\/code>, and only gets updated to <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">0<\/code> when a row gets &#8220;soft deleted.&#8221; Consider a table like this:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE TABLE dbo.Users\r\n (\r\n   UserID   int        NOT NULL,\r\n   Filler   char(2000) NOT NULL DEFAULT '',\r\n   IsActive bit        NOT NULL DEFAULT 1,\r\n   CONSTRAINT PK_Users PRIMARY KEY (UserID)\r\n );\r\n\r\n -- insert 100 rows:\r\n INSERT dbo.Users(UserID) \r\n   SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY object_id)\r\n   FROM sys.all_columns\r\n   ORDER BY object_id; \r\n\r\n -- soft delete 3 rows:\r\n UPDATE dbo.Users SET IsActive = 0 WHERE UserID IN (66, 99, 4);<\/pre>\n<p>The metadata alone can&#8217;t tell us how many <em>active<\/em> users we have; only the total. But sometimes we want to know how many are active and how many are inactive. I have seen users implement such a query like this:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> SELECT [status] = 'Active', [count] = COUNT(*) \r\n   FROM dbo.Users WHERE IsActive = 1\r\n UNION ALL\r\n SELECT [status] = 'Inactive', [count] = COUNT(*) \r\n   FROM dbo.Users WHERE IsActive = 0;<\/pre>\n<p>Results:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:rexx decode:true whitespace-before:1 whitespace-after:1\"> status       count\r\n -----------|-------|\r\n Active     |    97 |\r\n Inactive   |     3 |<\/pre>\n<p>Without any other indexes, this is achieved with two full scans:<\/p>\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_1.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_1.png\" alt=\"A union produces two full scans\" width=\"1910\" height=\"706\" \/><\/a><br \/>\n<em style=\"color: #777; font-size: 0.875rem;\">A union produces two full scans<\/em><\/p>\n<p>A slightly more efficient query, that only scans the entire table once, is a conditional aggregation:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> SELECT Active   = COUNT(CASE IsActive WHEN 1 THEN 1 END),\r\n        InActive = COUNT(CASE IsActive WHEN 0 THEN 1 END)\r\n FROM dbo.Users;<\/pre>\n<p>Results:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:rexx decode:true whitespace-before:1 whitespace-after:1\" style=\"width: 75%; min-width: 320px;\"> Active    Inactive\r\n --------|-----------|\r\n      97 |         3 |<\/pre>\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_2.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"alignnone size-full\" style=\"width: 75%; min-width: 320px;\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_2.png\" alt=\"Conditional aggregation: just one scan\" \/><\/a><br \/>\n<em style=\"color: #777; font-size: 0.875rem;\">Conditional aggregation: just one scan<\/em><\/p>\n<h4>Why not create an index?<\/h4>\n<p>Conventional wisdom has told us to not create indexes involving <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> columns, because they&#8217;re not very selective. If you&#8217;re using such a column to highlight the minority (e.g. only a tiny fraction of users ever get soft deleted), then an index on just the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">IsActive<\/code> column wouldn&#8217;t be extremely useful:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> CREATE INDEX ix_IsActive ON dbo.Users(IsActive);<\/pre>\n<p>The above queries would still have to look at all the rows (the conditional aggregate would still perform a full scan, and the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">UNION<\/code> would seek twice, at this low row count, but that could get worse at scale). Both plans:<\/p>\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_3a.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_3a.png\" alt=\"An index might not be overly helpful\" width=\"1812\" height=\"1027\" \/><\/a><br \/>\n<em style=\"color: #777; font-size: 0.875rem;\">An index might not be overly helpful<\/em><\/p>\n<p>But what if we had a <em>filtered<\/em> index on just the minority case (where <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">IsActive = 0<\/code>)?<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DROP INDEX ix_IsActive ON dbo.Users;\r\n\r\n CREATE INDEX ix_IsInactive ON dbo.Users(IsActive) WHERE IsActive = 0;<\/pre>\n<p>This doesn&#8217;t help our existing queries much at all. The <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">UNION<\/code> can scan this smaller index to get the inactive count, but still has to scan the entire table for the active count. And the conditional aggregate still has to perform a full scan:<\/p>\n<p><a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_4a.png\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/10\/ab_count_plan_4a.png\" alt=\"Even a filtered index might not be overly helpful\" width=\"1812\" height=\"1027\" \/><\/a><br \/>\n<em style=\"color: #777; font-size: 0.875rem;\">Even a filtered index might not be overly helpful<\/em><\/p>\n<p>The filtered index brings the two queries closer, but both still have performance tied to the size of the table.<\/p>\n<p>That said&#8230;<\/p>\n<h3>The filtered index buys us something else<\/h3>\n<p>Here&#8217;s a secret: with the filtered index, we can now satisfy both counts without ever touching the table at all. Assuming <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">IsActive<\/code> can&#8217;t be <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">NULL<\/code>, then <em>active<\/em> user count is simply the <em>total<\/em> subtract the <em>inactive<\/em> user count. So this query gets the same results without size-of-entire-index reads:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> WITH idx AS\r\n  (\r\n    SELECT i.[object_id], i.index_id\r\n    FROM sys.indexes AS i\r\n      INNER JOIN sys.objects AS o\r\n      ON i.[object_id] = o.[object_id]\r\n      INNER JOIN sys.schemas AS s\r\n      ON o.[schema_id] = s.[schema_id]\r\n      WHERE o.name = N'Users'\r\n        AND s.name = N'dbo'\r\n        AND \r\n        (\r\n          i.index_id IN (0,1)\r\n          OR i.filter_definition = N'([IsActive]=(0))'\r\n        )\r\n  ), agg AS\r\n  (\r\n    SELECT Total    = SUM(CASE WHEN p.index_id IN (0,1) THEN p.[rows] END), \r\n           Inactive = SUM(CASE WHEN p.index_id &gt; 1      THEN p.[Rows] END)\r\n    FROM sys.partitions AS p\r\n    INNER JOIN idx \r\n      ON p.[object_id] = idx.[object_id]\r\n      AND p.index_id = idx.index_id\r\n  )\r\n  SELECT Active = Total - Inactive, \r\n         Inactive\r\n  FROM agg;<\/pre>\n<p>Now, the query is a lot more complicated than it absolutely has to be, and produces a plan that is far more elaborate than you might expect (so elaborate I&#8217;m not going to show it). <strong>But you can simplify.<\/strong> If you know the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">index_id<\/code> for the filtered index (in my case, <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">3<\/code>), for example, and are confident in <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">OBJECT_ID<\/code>:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @filter tinyint = 3;\r\n\r\n WITH agg AS\r\n (\r\n   SELECT Total    = SUM(CASE WHEN index_id &lt;&gt; @filter THEN [rows] END), \r\n          Inactive = SUM(CASE WHEN index_id =  @filter THEN [Rows] END)\r\n   FROM sys.partitions\r\n   WHERE [object_id] = OBJECT_ID(N'dbo.Users')\r\n     AND index_id IN (0, 1, @filter)\r\n )\r\n SELECT Active = Total - Inactive, \r\n        Inactive\r\n FROM agg;<\/pre>\n<p>You might not want to memorize the <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">index_id<\/code>, or hard-code that in production, since indexes can be dropped and re-created in any order. But if you can trust the index <em>name<\/em> to be stable, and for the filter definition to be accurate, you could replace the first line with the following:<\/p>\n<pre class=\"theme:tomorrow-night font:consolas font-size:14 line-height:16 decode-attributes:false tab-convert:true lang:tsql decode:true whitespace-before:1 whitespace-after:1\"> DECLARE @filter tinyint;\r\n\r\n SELECT @filter = index_id \r\n   FROM sys.indexes\r\n   WHERE name = N'ix_IsInactive'\r\n     AND has_filter = 1\r\n     AND [object_id] = OBJECT_ID(N'dbo.Users');<\/pre>\n<p><em>In either case, you may want error handling to indicate if the index wasn&#8217;t found.<\/em><\/p>\n<h3>Other examples<\/h3>\n<p>I used <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">bit<\/code> columns here to dovetail off an earlier post entitled &#8220;<a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/strategies-for-queries-against-bit-columns\/\">Strategies for queries against bit columns<\/a>.&#8221; But this technique isn&#8217;t limited to one of two values with significant skew; it can be applied in many more scenarios.<\/p>\n<p>For example, I&#8217;ve had dashboards that pull subsets of tables to give counts of things such as some related set of error status codes, and then a filtered index like <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">WHERE error_code IN (1, 12, 18456)<\/code>. Now the dashboard can go right to the metadata and pull a single count directly.<\/p>\n<p>You can do similar things with date ranges, e.g. have a filtered index re-created at the beginning of each month (or any other period), with a hard-coded start range, to simulate partitioning (at least for the current &#8220;partition&#8221;) without all the work. The metadata for that index will always show the total count of rows for the current month.<\/p>\n<p>And you could combine these as well; for example, a filtered index against some set of error codes this month. Or a filtered index for <em>each<\/em> error code this month, if they&#8217;re important independently, too.<\/p>\n<h3>Conclusion<\/h3>\n<p>In spite of ugly-looking underlying execution plans against system objects, this technique should be much faster than querying the table &#8211; particularly at scale. The only exception might be when the table is small or otherwise always in memory, <em>and<\/em> <code style=\"font-family: consolas; background: #e4e4e4; padding: 3px 4px 1px 4px; border-radius: 3px;\">sys.partitions<\/code> is massive (e.g. you have a ridiculous number of tables, partitions, and indexes). Going after the metadata is a technique you may want to consider if you&#8217;re performing a lot of queries for counts, filtered or not.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Nearly a decade ago, I wrote a post called &#8220;Bad habits : Counting rows the hard way.&#8221; In that post, I talked about how we can use SQL Server&#8217;s metadata to instantly retrieve the row count for a table. Typically, people do the following, which has to read the entire table or index: DECLARE @c&#8230;&hellip;<\/p>\n","protected":false},"author":341115,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143531],"tags":[4296,147174],"coauthors":[158980],"class_list":["post-98602","post","type-post","status-publish","format-standard","hentry","category-featured","category-t-sql-programming-sql-server","tag-metadata","tag-t-sql-coding"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98602","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/341115"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=98602"}],"version-history":[{"count":37,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98602\/revisions"}],"predecessor-version":[{"id":98866,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/98602\/revisions\/98866"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=98602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=98602"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=98602"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=98602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}