{"id":2181,"date":"2016-03-07T00:00:00","date_gmt":"2016-03-07T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/when-auto_update_statistics-doesnt-happen\/"},"modified":"2021-09-29T16:21:22","modified_gmt":"2021-09-29T16:21:22","slug":"when-auto_update_statistics-doesnt-happen","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/when-auto_update_statistics-doesnt-happen\/","title":{"rendered":"When AUTO_UPDATE_STATISTICS Doesn&#8217;t Happen"},"content":{"rendered":"<div class=\"article-content\">\n<p class=\"start\">If you are a  conscientious database developer, you will be checking the execution plan of  your queries to see if any optimization improvements are possible. You might &#160;expect an update of outdated statistics  after loading a bunch of data into a table, but discover that with the &#160;query you are executing, the cached plan  does not then update the statistics.<\/p>\n<p>When you look into  the Online Help of Microsoft SQL Server about statistics update, you will find  the following statement: &#8220;The query optimizer checks for out-of-date statistics <b>before compiling a query<\/b> and <b>before executing a cached query plan<\/b>.  Before compiling a query, the query optimizer uses the columns, tables, and  indexed views in the query predicate to determine which statistics might be  out-of-date. Before executing a cached query plan, the Database Engine verifies  that the query plan references up-to-date statistics.&#8221; (<a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/bb522682.aspx\">https:\/\/msdn.microsoft.com\/en-us\/library\/bb522682.aspx<\/a>).<\/p>\n<p>This article will  describe the &#8220;intelligence&#8221; of the SQL Server query optimizer when it comes to  the decision that it doesn&#8217;t actually need updated statistics to determine an  appropriate plan for a query and so doesn&#8217;t update an out-of-date statistics  object.<\/p>\n<h1>Why does SQL Server use statistics?<\/h1>\n<p>The query optimizer  of Microsoft SQL Server is a cost-based optimizer. The calculation of costs  depend on several facets of information. One of the most important of these is  the information about the number of rows that are expected to be returned when  the query is executed. This information is gleaned from what is called  &#8216;statistics&#8217;. If statistics are not accurate enough, the query optimizer of  Microsoft SQL Server may choose an inappropriate execution plan and if it  overestimates the number of records, it will allocate too much memory for the  execution.<\/p>\n<p>The  AUTO_UPDATE_STATISTICS option is a database setting. When this option is set,  the query optimizer checks whether the statistics are outdated in a statistics  object before basing a plan on it. Statistics are judged to be outdated when,  roughly, the following condition is true:<\/p>\n<ul>\n<li>The index  \/ statistic has more than 500 unique entries<\/li>\n<li>&gt;= 500 +  20% of cardinality values have been changed<br \/> (<a href=\"https:\/\/support.microsoft.com\/en-us\/kb\/195565\">https:\/\/support.microsoft.com\/en-us\/kb\/195565<\/a>)<\/li>\n<\/ul>\n<h3>Note<\/h3>\n<p>The demo scripts in  this article will use undocumented trace flags. If you want to replay the demos  please remember not to do the replay in a production environment. The following  trace flags will be used within the scripts:<\/p>\n<ul>\n<li>TF 3604:&#160;&#160;&#160;&#160;&#160;&#160;&#160; The output of information will be rerouted to the client instead of the  error log<\/li>\n<\/ul>\n<p><strong>SQL Server &lt;= 2012<\/strong><\/p>\n<ul>\n<li>TF 9204:&#160;&#160;&#160;&#160;&#160;&#160;&#160; statistics which end up being fully loaded and used<a id=\"ftnref1\" href=\"http:\/\/sqlblog.com\/blogs\/paul_white\/archive\/2011\/09\/21\/how-to-find-the-statistics-used-to-compile-an-execution-plan.aspx\"><span class=\"MsoFootnoteReference\">[1]<\/span><\/a><\/li>\n<li>TF 9292:&#160;&#160;&#160;&#160;&#160;&#160;&#160; statistics objects which are considered &#8216;interesting&#8217; by the query  optimizer<a id=\"ftnref2\" href=\"http:\/\/sqlblog.com\/blogs\/paul_white\/archive\/2011\/09\/21\/how-to-find-the-statistics-used-to-compile-an-execution-plan.aspx\"><span class=\"MsoFootnoteReference\">[2]<\/span><\/a><\/li>\n<\/ul>\n<p> \t<strong>SQL Server 2014<\/strong><\/p>\n<ul>\n<li>TF 2363&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; statistics being loaded<\/li>\n<\/ul>\n<h1>Test environment<\/h1>\n<p>The test environment  is a Microsoft SQL Server 2012 EE (11.0.6020) and uses a table called <b>dbo.Customer <\/b>with app. 10,500  records. If you run the example with SQL&#160;Server&#160;2014 make sure to use the  appropriate TF in this case. The table uses two indexes. A unique clustered  index is on the <b>ID <\/b>attribute and an  additional non-unique non-clustered index is used for the attribute <b>ZIP.<\/b><\/p>\n<pre>-- Create the demo table \nIF OBJECT_ID(N'dbo.Customer', N'U') IS NOT NULL\n&#160;&#160; DROP TABLE dbo.Customer;\n&#160;&#160; GO\n&#160;\nCREATE TABLE dbo.Customer&#160; \n(\n&#160;&#160; Id&#160;&#160;&#160;&#160; INT&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; NOT NULL IDENTITY (1, 1),&#160; \n&#160;&#160; Name&#160;&#160; VARCHAR(100) NOT NULL,&#160; \n&#160;&#160; Street VARCHAR(100) NOT NULL,&#160; \n&#160;&#160; ZIP&#160;&#160;&#160; CHAR(5)&#160;&#160;&#160;&#160;&#160; NOT NULL,&#160; \n&#160;&#160; City&#160;&#160; VARCHAR(100) NOT NULL \n);\nGO&#160; \n&#160;\n-- and fill it with ~10,000 records \nINSERT INTO dbo.Customer WITH (TABLOCK)\n(Name, Street, ZIP, CIty)\nSELECT 'Customer ' + CAST(message_id AS VARCHAR(10)),&#160; \n&#160;&#160;&#160;&#160;&#160;&#160; 'Street ' + CAST(severity AS VARCHAR(10)),&#160; \n&#160;&#160;&#160;&#160;&#160;&#160; severity * 1000,&#160; \n&#160;&#160;&#160;&#160;&#160;&#160; LEFT(text, 100)&#160; \nFROM&#160;&#160; sys.messages&#160; \nWHERE&#160; language_id = 1033;&#160; \nGO&#160; \n&#160;\n-- than we create two indexes for accurate statistics \nCREATE UNIQUE INDEX ix_Customer_ID ON dbo.Customer (Id);\nCREATE NONCLUSTERED INDEX ix_Customer_ZIP ON dbo.Customer (ZIP);\nGO \n&#160;\n-- what statistics will be used by different queries\n-- result of implemented statistics \nSELECT S.object_id, \n&#160;&#160;&#160;&#160;&#160;&#160; S.name, \n&#160;&#160;&#160;&#160;&#160;&#160; DDSP.last_updated, \n&#160;&#160;&#160;&#160;&#160;&#160; DDSP.rows, \n&#160;&#160;&#160;&#160;&#160;&#160; DDSP.modification_counter \nFROM&#160;&#160; sys.stats AS S \n&#160;&#160;&#160;&#160;&#160;&#160; CROSS APPLY sys.dm_db_stats_properties(S.object_id, S.stats_id) AS DDSP \nWHERE&#160; S.object_id = OBJECT_ID(N'dbo.Customer', N'U'); \nGO\n<\/pre>\n<p> <img loading=\"lazy\" decoding=\"async\" width=\"528\" height=\"86\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2381-clip_image001.png\" alt=\"2381-clip_image001.png\" \/><\/p>\n<p>The image shows the  two statistics objects with accurate values. The [modification_counter]  indicates no actual changes to the cardinality values. The clustered index is a  unique index but the nonclustered index is not.  &#160;You&#8217;d then probably want to investigate the distribution of data within  this latter non-clustered non-unique index. The following command must be  executed in order to see the data in the histogram of the statistics:<\/p>\n<pre>-- show the distribution of data in the statistics \nDBCC SHOW_STATISTICS ('dbo.Customer', 'ix_Customer_ZIP') WITH HISTOGRAM; \nGO\n<\/pre>\n<p> <img loading=\"lazy\" decoding=\"async\" width=\"583\" height=\"311\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2381-clip_image002.png\" alt=\"2381-clip_image002.png\" \/><\/p>\n<p>The distribution of  the cardinal values is irregular. While the ZIP 12000 is represented with only  one record the ZIP 16000 has a distribution of more than 7,500 records in the  index. The number of rows returned by a query that uses this index will vary  wildly depending on the value(s) of the ZIP requested in the <code>WHERE<\/code> clause. A  parameterized query against the ZIP could therefore easily cause &#8216;parameter  sniffing&#8217; problems because but the &#8216;parameter sniffing&#8217; issue might be better as  the topic for another article.<\/p>\n<h1>Queries<\/h1>\n<p>When the table has  been created and filled with data, the test queries can start. Two parameterized  queries will be fired against the table using sp_executesql. Both queries will  address different indexes. Both queries will use the &#8220;=&#8221; operator for the  predicate for a highly selective result.<\/p>\n<pre>-- TF for SQL Server 2012!\nDBCC TRACEON (3604, 9204, 9292);\nGO\n&#160;\nDECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.Customer WHERE Id = @Id;';\nDECLARE @parm NVARCHAR(100) = N'@Id INT';\nEXEC sp_executesql @stmt, @parm, 10;\nGO\n&#160;\nDECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.Customer WHERE ZIP = @ZIP;';\nDECLARE @parm NVARCHAR(100) = N'@ZIP CHAR(5)';\nEXEC sp_executesql @stmt, @parm, '18000';\nGO\n&#160;\nDBCC TRACEOFF (3604, 9204, 9292);\nGO\n<\/pre>\n<p>The first query used  the index [ix_customer_id] for an<code> INDEX SEEK<\/code> and the second query used a  performant <code>INDEX SEEK<\/code> on the index <code>[ix_customer_zip<\/code>] as is shown by the  execution plans.<\/p>\n<h2> <img loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"462\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2381-clip_image003.png\" alt=\"2381-clip_image003.png\" \/>Unique  Index<\/h2>\n<p>A query with an  &#8220;=&#8221;-operator on a unique index will always result in an <code>INDEX SEEK<\/code> with a  predictable result of ONE record. We can be confident of this prediction.<\/p>\n<h2>Non-Unique Index<\/h2>\n<p>The query on a  specified ZIP can lead to a widely-varying number of records in the result set.  This can lead to different execution plans depending on the prediction for this  number. If there is a high cardinality the query optimizer will always choose an  <code>INDEX SEEK<\/code>. If the amount of data is likely to be greater, an <code>INDEX SCAN<\/code> may be  the result, because the overall plan would be likely to perform better.<\/p>\n<p>In contrast to the  usage of a unique index, a non-unique index for an uneven distribution provides  more of a problem for the optimizer in devising a good execution strategy  because the predicted number of records returned will be different for every  single value of the predicate!<\/p>\n<h1>Changing data for outdated statistics<\/h1>\n<p>With 10,557 actual  records in the statistics data, a minimum of 1,612 entries (<code>INSERT, UPDATE,  DELETE<\/code>) have to be changed to mark the statistics as being outdated: 10.557 *  20% + 500. The upcoming code will insert 4,000 additional records into the  table.<\/p>\n<pre>-- additional 4,000 records will be inserted into the table \n-- to make the stats become outdated!\nINSERT INTO dbo.Customer WITH (TABLOCK)&#160; \n(Name, Street, ZIP, City)\nSELECT TOP 4000 \n&#160;&#160;&#160;&#160;&#160;&#160; 'Customer ' + CAST(message_id AS VARCHAR(10)),&#160; \n&#160;&#160;&#160;&#160;&#160;&#160; 'Street ' + CAST(severity AS VARCHAR(10)),&#160; \n&#160;&#160;&#160;&#160;&#160;&#160; severity * 1000,&#160; \n&#160;&#160;&#160;&#160;&#160;&#160; LEFT(text, 100)&#160; \nFROM&#160;&#160; sys.messages&#160; \nWHERE&#160; language_id = 1033;&#160; \nGO\n<\/pre>\n<p> <img loading=\"lazy\" decoding=\"async\" width=\"534\" height=\"87\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2381-clip_image004.png\" alt=\"2381-clip_image004.png\" \/><\/p>\n<p>The result shows  that 4,000 records for each index have been modified. This counter value remains  in the statistics until the query optimizer runs another query against the  relevant index object and checks the number of changes that have occurred since  the statistics were last updated. If the number of changes is higher than the  threshold, the statistics will then be updated and the query plan based on the  old values will become invalid. A new plan will be generated, based on the new  statistic values. The Online Help of Microsoft SQL Server tells us that the  statistics for both indexes will be refreshed when the queries will be executed  again.<\/p>\n<pre>DBCC TRACEON (3604, 9204, 9292);\nGO\n&#160;\nDECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.Customer WHERE Id = @Id;';\nDECLARE @parm NVARCHAR(100) = N'@Id INT';\nEXEC sp_executesql @stmt, @parm, 10;\nGO\n&#160;\nDECLARE @stmt NVARCHAR(1000) = N'SELECT * FROM dbo.Customer WHERE ZIP = @ZIP;';\nDECLARE @parm NVARCHAR(100) = N'@ZIP CHAR(5)';\nEXEC sp_executesql @stmt, @parm, '18000';\nGO\n&#160;\nDBCC TRACEOFF (3604, 9204, 9292);\nGO\n<\/pre>\n<p> <img loading=\"lazy\" decoding=\"async\" width=\"601\" height=\"49\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2381-clip_image006.jpg\" alt=\"2381-clip_image006.jpg\" \/><\/p>\n<p>The result may be  surprising. The first query didn&#8217;t force an update of the statistics; these have  not been touched again by the query optimizer. The second query against the ZIP  has forced an update of the underlying statistics. The statistics have been  found as &#8220;interesting&#8221; and therefore checked. A closer look to the statistics  object of the indexes shows the situation.<\/p>\n<pre>SELECT S.object_id, \n&#160;&#160;&#160;&#160;&#160;&#160; S.name, \n&#160;&#160;&#160;&#160;&#160;&#160; DDSP.last_updated, \n&#160;&#160;&#160;&#160;&#160;&#160; DDSP.rows, \n&#160;&#160;&#160;&#160;&#160;&#160; DDSP.modification_counter \nFROM&#160;&#160; sys.stats AS S \n&#160;&#160;&#160;&#160;&#160;&#160; CROSS APPLY sys.dm_db_stats_properties(S.object_id, S.stats_id) AS DDSP \nWHERE&#160; S.object_id = OBJECT_ID(N'dbo.Customer', N'U'); \nGO\n<\/pre>\n<p> <img loading=\"lazy\" decoding=\"async\" width=\"528\" height=\"83\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2381-clip_image007.png\" alt=\"2381-clip_image007.png\" \/><\/p>\n<p>While the statistics  for<code> ix_Customer_ZIP<\/code> have been updated and the new records are now part of the  [rows] the outdated statistics for the unique index has not been touched.<\/p>\n<h1>What is the reason for that behavior?<\/h1>\n<div class=\"indent\">\n<h3>Unique index<\/h3>\n<p>When a unique index  is requested on its key attribute, it is not necessary for Microsoft SQL Server  to obtain any statistics when the requested data are searched by an &#8220;=&#8221;  operator. In this case Microsoft SQL Server will always &#8220;know\/estimate&#8221; that  only one record will be returned. If the operator for the predicate is forcing a  range scan in the index the statistics need to be accurate because for one  reason; the amount of data in between two predicates &#8220;may&#8221; vary! For the above  example, the statistics would not have been required in order to execute the  query because a compiled plan is already available in the plan cache. Microsoft  SQL Server checks for a valid plan in the plan cache and will reuse it if it is  available. A further check for statistics is not necessary because the plan with  all its estimates is stable.<\/p>\n<h3>Non-Unique Index<\/h3>\n<p>For the second query  against the ZIP-Code, the stability looks doubtful. The index does not guarantee  unique values. The amount of data returned for each key value in the index may  vary. The plan is &#8216;unstable&#8217; in terms of the amount of data returned by the  query using the plan with different values for a key attribute. If the plan is  not stable, Microsoft SQL Server has to refresh the information about the  statistics in order to determine whether a new plan is required. Therefore it  checks the statistics before using the cached plan, finds that the threshold for  auto-refresh of the statistics has been exceeded and so updates the underlying  statistics.<\/p>\n<\/div>\n<h1>Conclusion<\/h1>\n<p>Statistics are  important and necessary for well-performing queries. Outdated statistics that no  longer reflect the current distribution of data in the table are very often the  reason for badly-performing queries because the query optimizer chooses the best  plan based on the information in the statistics.<\/p>\n<p>The query optimizer  can recognize when a reload of updated statistics is required. If the predicate  of a query runs against a unique key value of the index, the query processor is  aware of it and does not load the statistics again. If the query is dealing with  &#8220;unsafe&#8221; predicates which require <code>INDEX SCAN<\/code>s, statistics have to be reloaded  because the amount of records may change.<\/p>\n<h2>References<\/h2>\n<ul>\n<li> \t\t[1]&amp;[2] <a href=\"http:\/\/sqlblog.com\/blogs\/paul_white\/archive\/2011\/09\/21\/how-to-find-the-statistics-used-to-compile-an-execution-plan.aspx\"> \t\tHow to find the statistics used to compile an execution plan<\/a><\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>When your SQL Server database is set to have its statistics automatically updated, you will probably conclude that, whenever the distribution statistics are out-of-date, they will be updated before the next query is executed against that index or table. Curiously, this isn&#8217;t always the case. What actually happens is that the statistics only gets updated if  needed by the query optimiser to determine an effective query plan.&hellip;<\/p>\n","protected":false},"author":205038,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4168,5084,4150,4151,4252],"coauthors":[],"class_list":["post-2181","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-database","tag-optimiser","tag-sql","tag-sql-server","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2181","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/205038"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=2181"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2181\/revisions"}],"predecessor-version":[{"id":92510,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2181\/revisions\/92510"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=2181"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=2181"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=2181"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=2181"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}