{"id":769,"date":"2010-01-07T00:00:00","date_gmt":"2010-01-07T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/13-things-you-should-know-about-statistics-and-the-query-optimizer\/"},"modified":"2021-09-29T16:22:02","modified_gmt":"2021-09-29T16:22:02","slug":"13-things-you-should-know-about-statistics-and-the-query-optimizer","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/13-things-you-should-know-about-statistics-and-the-query-optimizer\/","title":{"rendered":"13 Things You Should Know About Statistics and the Query Optimizer"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">Don&#8217;t you think thirteen is a good number? Well, here in Brazil we have a famous soccer player and coach called <a href=\"http:\/\/translate.google.co.uk\/translate?hl=en&amp;sl=pt&amp;u=http:\/\/pt.wikipedia.org\/wiki\/M%25C3%25A1rio_Jorge_Zagallo&amp;ei=jR0hS8SNCoiI4gbv_4DtCQ&amp;sa=X&amp;oi=translate&amp;ct=result&amp;resnum=1&amp;ved=0CA4Q7gEwAA&amp;prev=\/search%3Fq%3DZagalo%26hl%3Den%26client%3Dfirefox-a%26rls%3D\">Mario Zagalo<\/a> who really likes it. This obviously has nothing at all to do with the Query Optimizer, so let&#8217;s get started with the good stuff.<\/p>\n<p>When we are talking about query plans and optimization, there are some things that you really should know. The SQL Query Optimizer employs many techniques to create an optimal execution plan, and today I&#8217;ll present some of the terminology and features it uses.<\/p>\n<hr \/>\n<h2>1. Selectivity<\/h2>\n<p>We can define selectivity as:<\/p>\n<p class=\"indented\"><em>The degree to which one value can be differentiated within a wider group of similar values<\/em>.<\/p>\n<p>For instance, when Mario Zagalo was a coach of the Brazilian soccer team, one of his hardest jobs was selecting the players for each match. 
Brazil is a huge country and, as you probably know, we have a lot of very good players. So, the selectivity of good players in Brazil is very low, because there are a lot of them. If I ask you to select just the players who would make good attackers, you would probably return to me with a lot of guys, and returning with many players lowers the selectivity.<\/p>\n<p>In database terms, suppose we have a table called <em>Internationally_Renown_Players<\/em> with a column called <em>Country<\/em>. If we write:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * FROM Internationally_Renown_Players WHERE Country = 'Brazil'<\/pre>\n<p>&#8230; then we can say that this query is not very selective, because many rows will be returned (<strong>Update<\/strong>: Sadly, this didn&#8217;t help us in the 2010 World Cup). However, if we write:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * FROM Internationally_Renown_Players WHERE Country = 'Tuvalu'\r\n<\/pre>\n<p>&#8230; then this query will be very selective, because it will return very few rows (I know they have a soccer team, but they have yet to really make a name for themselves.)<\/p>\n<p>Another good (and more broadly interesting) example would be to have a table called Customers, with columns for Gender and, say, Passport_Number. The selectivity of the Gender column is very low because we can&#8217;t do much filtering with just the <em>F<\/em> or <em>M<\/em> values, and so many rows will be returned by any query using it. By contrast, a query filtering with the Passport_Number column will always return just one row, and so the selectivity of that column is very high.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>The Query Optimizer uses selectivity information to create execution plans, and can decide on an optimal execution plan based on the selectivity of a given column. It is also clearly good practice to create your indexes on the columns with the highest selectivity level. 
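<\/p>\n<p>As a rough sketch of how you might measure this yourself (using the hypothetical <em>Internationally_Renown_Players<\/em> table from above), the following query returns a value between 0 and 1, and the closer it is to 1, the more selective the column:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Selectivity = number of distinct values \/ total number of rows\r\nSELECT COUNT(DISTINCT Country) * 1.0 \/ COUNT(*) AS Selectivity\r\n\u00a0\u00a0\u00a0\u00a0FROM Internationally_Renown_Players<\/pre>\n<p>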
That means it is better to create an index on a <em>Name<\/em> column than a Gender column. It is also better to create a <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/sql-server-index-basics\/\">composite index<\/a> in an order that makes use of the most selective column first, and thus increase the chances that the SQL Server will actually <em>use<\/em> that index.<\/p>\n<hr \/>\n<h2>2. Density<\/h2>\n<p>The term &#8220;<em>density<\/em>&#8221; comes from physics, and is calculated by dividing the mass of a substance by the volume it occupies, as represented mathematically below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image002.jpg\" alt=\"909-13things_clip_image002.jpg\" width=\"64\" height=\"44\" \/><\/p>\n<p class=\"caption\">Where D = Density, M = Mass and V = Volume.<\/p>\n<p>The explanation can be stretched, as is the case with &#8220;<em>Geographic Density<\/em>&#8221; (something we&#8217;ve become used to hearing and understanding). For example, the geographic density of Brazil is calculated by dividing the number of inhabitants (the &#8216;mass&#8217;) by the size of the geographic area (the &#8216;volume&#8217;): 187,000,000 people divided by 8,514,215.3 km2 gives us 21.96 inhabitants per km2. In SQL Server, we can interpret this as:<\/p>\n<p class=\"intended\"><em>The more dense a column is, the more rows that column returns for a given query. <\/em><\/p>\n<p>Take note that this is exactly the <em>opposite<\/em> of selectivity, for which a higher value means <em>fewer<\/em> rows. To calculate the density of a column, you can run the following query:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT (1.0 \/ COUNT(DISTINCT &lt;ColumnName&gt;)) FROM &lt;TableName&gt; <\/pre>\n<p>&#8230; The larger the number that query returns, the more &#8216;dense&#8217; your column is, and the more duplications it contains. 
With this number, the QO can determine two important facts about the frequency of data in a column, and to explain this better, let&#8217;s create an example table.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE TABLE Test(Col1 INT)\r\nGO\r\nDECLARE @i INT\r\n\u00a0\u00a0\u00a0\u00a0SET @i = 0\r\nWHILE @i &lt; 5000\r\n\u00a0\u00a0\u00a0\u00a0BEGIN\r\n\u00a0\u00a0\u00a0\u00a0INSERT INTO Test VALUES(@i)\r\n\u00a0\u00a0\u00a0\u00a0INSERT INTO Test VALUES(@i)\r\n\u00a0\u00a0\u00a0\u00a0INSERT INTO Test VALUES(@i)\r\n\u00a0\u00a0\u00a0\u00a0INSERT INTO Test VALUES(@i)\r\n\u00a0\u00a0\u00a0\u00a0INSERT INTO Test VALUES(@i)\r\n\u00a0\u00a0\u00a0\u00a0SET @i = @i + 1\r\n\u00a0\u00a0\u00a0\u00a0END\r\nGO\r\nCREATE STATISTICS Stat ON Test(Col1)\r\nGO\u00a0\u00a0<\/pre>\n<p>As you can see, we have a table with 25,000 rows, and each value is duplicated across five rows; so the density is calculated as 1.0 \/ 5000 = 0.0002. \u00a0The first question we can answer with that information is: &#8220;How many unique values do we have in the column &#8216;Col1&#8217;?&#8221; <em>Just<\/em> using the density information we can make the following calculation: 1.0 \/ 0.0002 = 5000. The second question is: &#8220;What is the average number of duplications per value in the &#8216;Col1&#8217; column?&#8221; Using the density information we can calculate: 0.0002 * 25000 = 5 (which is exactly the average number of duplicates per value in this case).<\/p>\n<h4><strong>Why You Should Know This<\/strong><\/h4>\n<p>Using the table created above, let&#8217;s see where SQL Server uses that information in practice. 
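<\/p>\n<p>First, as a quick check, the generic density query from earlier can be applied to our Test table; it should return 0.0002:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- 1.0 \/ 5000 distinct values = 0.0002\r\nSELECT (1.0 \/ COUNT(DISTINCT Col1)) AS Density\r\n\u00a0\u00a0\u00a0\u00a0FROM Test<\/pre>\n<p>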
Take the following query:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @I INT\r\nSET @I = 999999\r\nSELECT * \r\n\u00a0\u00a0\u00a0\u00a0FROM Test\r\n\u00a0\u00a0\u00a0\u00a0WHERE Col1 = @I<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image002_0000.jpg\" alt=\"909-13things_clip_image002_0000.jpg\" width=\"317\" height=\"121\" \/><\/p>\n<p>SQL Server will use the information about the average density to estimate how many rows will be returned (in this case, 5 rows). We will see more about <em>why<\/em> SQL Server uses the density information in section 7.<\/p>\n<p>Another example:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT COUNT(*) FROM Test\r\n\u00a0\u00a0\u00a0\u00a0GROUP BY Col1<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image004.jpg\" alt=\"909-13things_clip_image004.jpg\" width=\"508\" height=\"100\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image006.jpg\" alt=\"909-13things_clip_image006.jpg\" width=\"489\" height=\"125\" \/><\/p>\n<p>In the first step, the Table Scan operator estimates that 25000 rows will be returned, and then after the Hash Match operator applies the Group By, only <em>5000<\/em> rows will be returned. In that case, SQL Server uses the information about how many unique rows are in the column.<\/p>\n<p>The density information is used to give the QO an impression of how many duplicated rows exist in a column, which allows it to choose the optimal operator or join algorithm in an execution plan. In certain cases, the QO also uses density information to guess how many rows will be returned by a query &#8211; look at section 7 for more details.<\/p>\n<hr \/>\n<h2>3. 
Cardinality<\/h2>\n<p>This is used to measure the number of rows which satisfy one condition. For instance, imagine one table with 500 rows, and a query with <em>&#8220;WHERE NAME = &#8216;Joao&#8217;&#8221;<\/em> &#8211; the Optimizer goes to the statistics and reads from the histogram that &#8216;Joao&#8217; represents 5% of the table, so the cardinality is (5% x 500) = 25. In the execution plan, we can think of the cardinality as the &#8220;estimated number of rows&#8221; displayed in the tool tips of each operator. I probably don&#8217;t need to spell it out, but a bad estimation of cardinality can easily result in an inefficient plan.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>In my opinion, the cardinality of a column is one of the most important pieces of information for the creation of an efficient execution plan. To create an execution plan, we really need to know (or at least be able to make a decent estimate about) how many rows will be returned to each operator. For example, a join query which returns a lot of rows would be better processed with a hash join; but if the QO doesn&#8217;t know about the cardinality, it can mistakenly choose to use a Loop Join or, even worse, choose to sort the key columns to make use of a Merge Join!<\/p>\n<hr \/>\n<h2>4. SARG<\/h2>\n<p>SARGs (Search Arguments) are the conditions of your query; the value you include in<em> WHERE = &#8220;&#8230;&#8221;<\/em> is the value passed to the search. With that information, the Optimizer will estimate how many rows will be returned and the best way (plan) to read this data. 
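<\/p>\n<p>Incidentally, the histogram and density information that drive these estimates can be inspected directly. A quick sketch, assuming the Test table and the Stat statistics object created in the Density section:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Shows the statistics header (including average density)\r\n-- and the histogram steps for the column\r\nDBCC SHOW_STATISTICS ('Test', Stat)<\/pre>\n<p>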
In other words, based on the SARG, the QO analyses the selectivity, density and cardinality to get the best possible execution plan.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>Good SARGs filter the data returned; when less data is selected, fewer pages will need to be read, and that will manifest itself as a huge performance gain.<\/p>\n<hr \/>\n<div class=\"intended\">\n<h2>Interlude<\/h2>\n<p>Many SQL Server books and Query Processor references use these four core terms described above (Selectivity, Density, Cardinality and SARG) to explain the behavior of a query plan, so hopefully this knowledge will make things a little clearer for you in the future.<\/p>\n<\/div>\n<hr \/>\n<h2>5. Foldable Expressions<\/h2>\n<p>Early on in the optimization phase, the QO tries to evaluate each expression in your query, and then change that expression to a constant. A constant is a literal value such as &#8216;1&#8217;, &#8216;Peter&#8217;, &#8216;10.50&#8217;, &#8216;20090101&#8217;, etc. This method is known as Constant Folding.<\/p>\n<p>A simple example of Constant Folding is the expression &#8220;<em>&lt;column&gt; = 1+1&#8221;<\/em>; early in the optimization process, the QO would change that expression to &#8220;<em>&lt;column&gt; = 2<\/em>&#8221;, and then use the value &#8216;2&#8217; as a <strong>SARG<\/strong>. Another interesting example is that the QO would change <em>&#8220;&lt;column&gt; = REPLACE(&#8216;123xxx&#8217;, &#8216;xxx&#8217;, &#8216;456&#8217;)&#8221;<\/em> to simply <em>&#8220;&lt;column&gt; = &#8216;123456&#8217;&#8221;<\/em>.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>It is good to know when the QO can do this, because that can make our queries a little easier to write. It is also worth bearing in mind that an exception is made for large object types. 
If the output type of the folding process is a large object type (text, image, nvarchar(max), varchar(max), or varbinary(max)), then SQL Server does <em>not<\/em> fold the expression.<\/p>\n<hr \/>\n<h2>6. Non-foldable Expressions<\/h2>\n<p>An expression is considered non-foldable when SQL Server cannot evaluate the expression, which causes a bad estimation of cardinality and a bad execution plan. In the next section (7), we will see some examples of expressions that block the use of statistics for a column.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>Well, basically, you should know this so you can avoid these constructions in your queries, as they&#8217;ll just make life more difficult. To avoid this, instead of using, say <em>ABS(-1000),<\/em> you could use <em>(-1000 * -1)<\/em>, or pass the expression to a variable and then use that variable in your query, as in:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * FROM Tab1\r\nWHERE Col1 &lt; ABS(-78150)\r\n\r\n-- Use a variable with the recompile hint, to force the QO to read the value of @Var\r\nDECLARE @Var INT\r\nSET @Var = ABS(-78150)\r\n\r\nSELECT * FROM Tab1\r\nWHERE Col1 &lt; @Var\r\nOPTION(RECOMPILE)<\/pre>\n<hr \/>\n<h2>7. What Happens When the Query Optimizer Can&#8217;t Estimate the Cardinality of a SARG?<\/h2>\n<p>The cardinality, density and selectivity of a value or a column are the heart of the matter, and the QO will always try to use these pieces of information to get an execution plan. When possible during the optimization stage, the QO tries to simplify your search arguments into constants. 
We&#8217;ve already covered this in section 5, and we know that:<\/p>\n<ul>\n<li>In the optimization stage, the QO changes <em>&#8220;WHERE Value = 1 + 1&#8221;<\/em> to <em>&#8220;WHERE Value = 2&#8221;<\/em>, and with that 2 calculated, it tries to read the statistics to get the information necessary to generate a good execution plan.<\/li>\n<li>The QO changes <em>&#8220;WHERE &lt;column&gt; = REPLACE(&#8216;123XXX&#8217;, &#8216;XXX&#8217;, &#8216;456&#8217;)&#8221;<\/em> to <em>&#8220;WHERE &lt;column&gt; = &#8216;123456&#8217;&#8221;<\/em>.<\/li>\n<\/ul>\n<p>However, the QO sometimes can&#8217;t identify (read <a href=\"http:\/\/blogs.msdn.com\/queryoptteam\/archive\/2006\/03\/31\/565991.aspx\">sniff<\/a>) the value of a search argument for any of a variety of reasons. I&#8217;ll demonstrate a few:<\/p>\n<ul>\n<li>Pay attention to when a constant uses a function, because some functions prevent the QO from making estimations. The functions listed below can be folded by the Query Optimizer into a literal constant; if you are using a different function, you should run a test to be sure about what the QO is going to do:\n<p> <strong>Lower, Upper, RTrim, Datediff, Dateadd, Datepart, Substring, Charindex, Len, SUser_SName, IsNull, GetDate, GetUTCDate, Replace, Left, Right;<br \/>\n <\/strong><br \/>\n Some examples of what I mean:<\/p>\n<ul>\n<li><em>&#8220;WHERE &lt;column&gt; = ABS(-1000)&#8221; &#8211;<\/em> to make this SARG usable by the Optimizer, you could change this command to <em>&#8220;WHERE &lt;column&gt; = -1000 * -1&#8221;<\/em>. 
That is, using * -1 instead of the ABS function.<\/li>\n<li><em>&#8220;WHERE &lt;column&gt; = ROUND(123.4545, 2)&#8221;<\/em>: Results in a bad estimation of the cardinality.<\/li>\n<li><em>&#8220;WHERE &lt;column&gt; = LEN(88888888)&#8221;<\/em>: Results in a good estimation of the cardinality.<\/li>\n<\/ul>\n<\/li>\n<li>When you are using local variables, for instance:\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @I INT \r\nSELECT * FROM &lt;TABLE&gt; \r\nWHERE &lt;COLUMN&gt; = @I<\/pre>\n<\/li>\n<li>When you are using scalar-valued user-defined functions.\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * FROM &lt;TABLE&gt; \r\nWHERE &lt;COLUMN&gt; = dbo.fn_FormatText('Text') <\/pre>\n<\/li>\n<li>When you are using a subquery; for instance, <em>&#8220;WHERE &lt;column&gt; = (select &lt;column&gt; from &lt;table&gt;)&#8221;<\/em> will prevent an estimation because the value of the subquery is not known at the optimization stage.<\/li>\n<\/ul>\n<p>When the optimizer can&#8217;t identify the cardinality of a value, it uses a feature called Magic Density (I love this name), which is more commonly known by the name &#8220;Guess&#8221;. In other words, it&#8217;ll try to predict how many rows will be returned, and to do that, it uses the density of one column or a hard-coded percentage (which I&#8217;ll come to in a moment). 
Let&#8217;s take a look at some examples of how it does this &#8216;magic&#8217; in SQL Server 2005:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- For \"=\", uses the column density, which translates to COUNT(*) * (1.0 \/ COUNT(DISTINCT &lt;column&gt;)) \r\nSELECT * \r\nFROM &lt;TABLE&gt; \r\nWHERE &lt;COLUMN&gt; = @I\r\n\r\n-- For \"BETWEEN\", uses 9%, which translates to (COUNT(*) * 9.0) \/ 100 \r\nSELECT * \r\nFROM &lt;TABLE&gt; \r\nWHERE &lt;COLUMN&gt; BETWEEN @I AND @X \r\n\r\n-- For \"&gt;, &gt;=, &lt; and &lt;=\", uses 30%, which translates to (COUNT(*) * 30.0) \/ 100 \r\nSELECT * \r\nFROM &lt;TABLE&gt; \r\nWHERE &lt;COLUMN&gt; \"&gt;, &gt;=, &lt; and &lt;=\" @I<\/pre>\n<p><strong>Note:<\/strong> If there are no statistics for a column and the equality operator is used, SQL Server can&#8217;t know the density of the column, so it uses a fixed value of 10 percent.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>This knowledge is important because with it you can avoid these mistakes and help the QO to create an optimal execution plan. As I said, estimation of cardinality is the heart of the matter, so always keep an eye on that.<\/p>\n<hr \/>\n<h2>8. Contradiction Detection<\/h2>\n<p>Contradiction Detection is an optimization that recognizes when a query is written in such a way that it will never return any rows at all. When the QO identifies such a query, it rewrites it to remove whichever part contained the contradiction. 
Here you can see a quick example of such a contradiction:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT * \r\n\u00a0\u00a0\u00a0\u00a0FROM Tab1\r\n\u00a0\u00a0\u00a0\u00a0WHERE Tab1_Col1 BETWEEN 10 \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AND 0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0OPTION(RECOMPILE)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image002_0001.jpg\" alt=\"909-13things_clip_image002_0001.jpg\" width=\"257\" height=\"130\" \/><\/p>\n<p>Looking at the plan, you&#8217;ll see that it does not access the table <em>Tab1<\/em>; instead, the plan uses a Constant Scan operator. It&#8217;s worth noticing that I used the RECOMPILE hint to force SQL Server <em>not<\/em> to use parameterization, because that can prevent the contradiction from being detected, and hence the optimization from taking place.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>This optimization is very interesting because it can neatly avoid the use of a large amount of resources. Identifying the contradiction means less wasted CPU time, less memory used, fewer locks and so on. In section 10 we&#8217;ll see another interesting detection of contradictions.<\/p>\n<hr \/>\n<h2>9. Foreign Keys<\/h2>\n<p>The Query Optimizer is smart enough to identify when a query is accessing a table that doesn&#8217;t need to be accessed. For instance, if I have a <em>Customers<\/em> table and an <em>Orders<\/em> table, a Foreign Key from Orders to Customers tells me that every Order has a Customer. So, if I have a query doing a join with the Customers and Orders tables, but reading columns from only one of them, it might be that I don&#8217;t need to read the other table at all. 
This code shows that behavior:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID('Tab2') IS NOT NULL\r\n\u00a0\u00a0\u00a0\u00a0BEGIN\r\n\u00a0\u00a0\u00a0\u00a0DROP TABLE Tab2\r\n\u00a0\u00a0\u00a0\u00a0END\r\nIF OBJECT_ID('Tab1') IS NOT NULL\r\n\u00a0\u00a0\u00a0\u00a0BEGIN\r\n\u00a0\u00a0\u00a0\u00a0DROP TABLE Tab1\r\n\u00a0\u00a0\u00a0\u00a0END\r\nGO\r\nCREATE TABLE Tab1 (Tab1_Col1 Integer NOT NULL PRIMARY KEY, Tab1_Col2 CHAR(200))\r\nCREATE TABLE Tab2 (Tab2_Col1 Integer NOT NULL PRIMARY KEY, Tab1_Col1 Integer NOT NULL, Tab2_Col2 CHAR(200))\r\nALTER TABLE Tab2 ADD CONSTRAINT fk FOREIGN KEY (Tab1_Col1) REFERENCES Tab1(Tab1_Col1)\r\nGO\r\n\u00a0\u00a0\u00a0\u00a0-- Fine, the execution plan won't use Tab1\r\nSELECT Tab2.*\r\n\u00a0\u00a0\u00a0\u00a0FROM Tab2\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN Tab1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image002_0002.jpg\" alt=\"909-13things_clip_image002_0002.jpg\" width=\"372\" height=\"141\" \/><\/p>\n<p>As we can see, the QO creates an execution plan that accesses just the Tab2 table, because I have a foreign key (FK) that assures that all rows in Tab2 are linked to a valid row in Tab1. Be aware that the foreign key <em>must<\/em> be Trusted. You can have a nontrusted foreign key, but in that case the QO will read both tables. You can check if an FK is trusted by reading the sys.foreign_keys table, in which the is_not_trusted column will tell us if an FK is trusted or not.<\/p>\n<p>This is undoubtedly a very cool feature, but in my opinion there are actually many ways it could be improved. 
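<\/p>\n<p>As a side note, here is a minimal sketch of how to list any non-trusted foreign keys in the current database, using the sys.foreign_keys catalog view mentioned above:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- FKs returned here will not be used for join elimination\r\nSELECT name, is_not_trusted\r\n\u00a0\u00a0\u00a0\u00a0FROM sys.foreign_keys\r\n\u00a0\u00a0\u00a0\u00a0WHERE is_not_trusted = 1<\/pre>\n<p>An FK returned by this query can be made trusted again with ALTER TABLE &#8230; WITH CHECK CHECK CONSTRAINT, as long as the existing data actually satisfies it.<\/p>\n<p>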
For instance, the QO will <em>only<\/em> use this behavior if you have a single-column foreign key; if you have a multi-column foreign key, it will read both tables. I&#8217;ve sent an email to my mentor and great friend <a href=\"http:\/\/blogs.msdn.com\/conor_cunningham_msft\/\">Conor Cunningham<\/a> discussing some points which, in my opinion, could be enhanced; he blogged about it, <a href=\"http:\/\/blogs.msdn.com\/conor_cunningham_msft\/archive\/2009\/11\/12\/conor-vs-foreign-key-join-elimination.aspx\">so that is mandatory extended reading if you want to know more<\/a>.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>I would like to make two important points. The first is a warning; when you write your queries, be sure to include all columns in the join, even when you are using a key column in the WHERE clause. <br \/>\n Last month I was working on a performance problem, and I found a query that was not using an index properly, and so I naturally decided to investigate what was happening. The problem was that the QO was not using the index because the developer had not specified all the columns of the multi-column foreign key in one join. Take a look at my blog <a href=\"http:\/\/translate.google.com.br\/translate?js=y&amp;prev=_t&amp;hl=pt-BR&amp;ie=UTF-8&amp;layout=1&amp;eotf=1&amp;u=http%3A%2F%2Ffabianosqlserver.spaces.live.com%2Fblog%2Fcns!52EFF7477E74CAA6!1698.entry&amp;sl=pt&amp;tl=en\">here<\/a> and <a href=\"http:\/\/translate.google.com.br\/translate?hl=pt-BR&amp;sl=pt&amp;tl=en&amp;u=http%3A%2F%2Ffabianosqlserver.spaces.live.com%2Fblog%2Fcns!52EFF7477E74CAA6!1699.entry\">here<\/a> to see the example (the original posts are in Portuguese, but I think Google Translate did a good job here). 
You can also look at <a href=\"http:\/\/blogs.msdn.com\/conor_cunningham_msft\/archive\/2009\/11\/12\/conor-vs-foreign-key-join-elimination.aspx\">Conor&#8217;s blog<\/a> to see another example.<\/p>\n<p>The second point is about views; we rarely see this type of syntax (example above, in the Foreign Key section), where we select just the columns of one table, but as you can see, you can have a view which accesses both tables. When you write code using that view, you don&#8217;t need to use <em>all<\/em> the columns, just a column from <em>one<\/em> table, so in that case this feature will be well-used.<\/p>\n<hr \/>\n<h2>10. Check Constraints<\/h2>\n<p>Check Constraints are used by the QO to identify contradictions. For example, if you have a trusted check constraint, and a query using the columns which belong to that check constraint, the QO can use the constraint to quickly validate the expression. Let&#8217;s take a look at an example.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID('Tab1') IS NOT NULL\r\n\u00a0\u00a0\u00a0\u00a0BEGIN\r\n\u00a0\u00a0\u00a0\u00a0DROP TABLE Tab1\r\n\u00a0\u00a0\u00a0\u00a0END\r\nGO\r\nCREATE TABLE Tab1 (Col1 Integer NOT NULL PRIMARY KEY,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0Status CHAR(1))\r\nALTER TABLE Tab1 ADD CONSTRAINT ck CHECK(Status = 'S' \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0OR Status = 'N')\r\nGO\r\nSELECT *\r\n\u00a0\u00a0\u00a0\u00a0FROM Tab1\r\n\u00a0\u00a0\u00a0\u00a0WHERE Status = 'X'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0OPTION(RECOMPILE)<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image002_0003.jpg\" alt=\"909-13things_clip_image002_0003.jpg\" width=\"301\" height=\"138\" \/><\/p>\n<p>The QO knows that it&#8217;s impossible for a row to exist with an 
&#8216;X&#8217; value in the status column, so it creates a plan which doesn&#8217;t access the table at all.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>If you don&#8217;t have them already, be sure to create your check constraints and make them trusted. Just like with the foreign keys, the QO will <em>only<\/em> use that information if the constraints are trusted, and you can look that information up in sys.check_constraints. Another interesting point concerns parameterization; notice that I used the RECOMPILE hint to force SQL Server to not use parameterization because, as before, that can prevent the detection of the contradiction in the first place.<\/p>\n<hr \/>\n<h2>11. Non-Updating Updates<\/h2>\n<p>Now this is a very interesting topic &#8211; have you ever asked yourself what happens when you update a column with the same value that already exists? As we know, when a column belongs to an index, we need to keep that data updated on the clustered index and all non-clustered indexes that use the column. Well, this is exactly where the Non-Updating Updates optimization comes into play.<\/p>\n<p>Since SQL Server 2005, the QO has had an optimization that verifies if the value in a non-clustered index has changed before updating it. Why is that so important? Because updating non-clustered indexes is <em>very<\/em> expensive, and the cost grows with how many non-clustered indexes reference a given column.<\/p>\n<p>Generally speaking, updates to a column which belongs to a non-clustered index are performed in several steps: first, use the clustered index to locate the value which needs to be updated. Second, update that value in the clustered index. Third, locate the value in the non-clustered index. Fourth, update the non-clustered index (a process which will be split into a delete action and an insert one) with the new value. 
Let&#8217;s see some examples:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">IF OBJECT_ID('Tab1') IS NOT NULL\r\n\u00a0\u00a0\u00a0\u00a0BEGIN\r\n\u00a0\u00a0\u00a0\u00a0DROP TABLE Tab1\r\n\u00a0\u00a0\u00a0\u00a0END\r\nGO\r\nCREATE TABLE Tab1 (Tab1_Col1 Integer NOT NULL PRIMARY KEY, Tab1_Col2 CHAR(200))\r\nCREATE INDEX ix_Test ON Tab1(Tab1_Col2)\r\nINSERT INTO Tab1 (Tab1_Col1, Tab1_Col2) \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0VALUES(1,'')\r\nSELECT OBJECT_NAME(a.OBJECT_ID) Tabela, b.name, leaf_insert_count, leaf_delete_count, leaf_update_count\r\n\u00a0\u00a0\u00a0\u00a0FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('Tab1'), 2, NULL) AS a\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN sys.indexes b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON a.OBJECT_ID = b.OBJECT_ID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AND a.index_id = b.index_id<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image002_0004.jpg\" alt=\"909-13things_clip_image002_0004.jpg\" width=\"451\" height=\"71\" \/><\/p>\n<p>The select on the DMV sys.dm_db_index_operational_stats shows us the number of inserts that have occurred on the ix_Test index. 
Now let&#8217;s make an update on the Tab1_Col2 column, which has a non-clustered (NC) index.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">UPDATE Tab1 SET Tab1_Col2 = 'ABC'\r\nSELECT OBJECT_NAME(a.OBJECT_ID) Tabela, b.name, leaf_insert_count, leaf_delete_count, leaf_update_count\r\n\u00a0\u00a0\u00a0\u00a0FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('Tab1'), 2, NULL) AS a\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN sys.indexes b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON a.OBJECT_ID = b.OBJECT_ID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AND a.index_id = b.index_id<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image004_0000.jpg\" alt=\"909-13things_clip_image004_0000.jpg\" width=\"438\" height=\"65\" \/><\/p>\n<p>Now we have 2 values inserted on the NC index &#8211; next, let&#8217;s make an update using the same &#8216;ABC&#8217; value.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">UPDATE Tab1 SET Tab1_Col2 = 'ABC'\r\nSELECT OBJECT_NAME(a.OBJECT_ID) Tabela, b.name, leaf_insert_count, leaf_delete_count, leaf_update_count\r\n\u00a0\u00a0\u00a0\u00a0FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('Tab1'), 2, NULL) AS a\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0INNER JOIN sys.indexes b\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ON a.OBJECT_ID = b.OBJECT_ID\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AND a.index_id = b.index_id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/909-13things_clip_image004_0001.jpg\" alt=\"909-13things_clip_image004_0001.jpg\" width=\"438\" height=\"65\" \/><\/p>\n<p>You can run the update as many times as 
you want; the QO is smart enough to identify that the value &#8216;ABC&#8217; is the same as what is stored in the non-clustered index, so it detects the unchanged value and avoids the update.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>The &#8216;<em>SQL Server 2008 Internals<\/em>&#8217; book by Kalen Delaney has an interesting point regarding this optimization. Many database applications have a common paradigm for making updates, and if I write an update dynamically using just the columns which have changed, I will have one query plan for each combination of updated columns.<\/p>\n<p>To fix that inefficiency, we can write one query using all columns, and pass the column values as parameters (which is the usual way), so the same plan will be reused. But if we write one update using all the columns of a table, even the columns which have not changed, we can end up with a problem when it comes to updating all the non-clustered indexes with a value which has not changed. So this feature is very interesting from the point of view of avoiding gross inefficiency.<\/p>\n<hr \/>\n<h2>12. RowModCtr<\/h2>\n<p>There is a column in the sysindexes table where we can see how many changes have been made to a column or a table since the last statistics update. In SQL Server 2005, when an INSERT, DELETE or UPDATE occurs in a column which belongs to a set of statistics, SQL Server updates this value (RowModCtr) in the sysindexes table. In SQL Server 2000, this column is updated when events occur at the table level, not the column level. In both cases, when the statistics are updated, the RowModCtr column is reset to zero.<\/p>\n<h4>Why You Should Know This<\/h4>\n<p>If you want to have more control over when UPDATE STATISTICS occurs, you can use this column to see how many changes occur in your table, or column, and with that information in hand you can decide whether to update a statistic or not. 
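<\/p>\n<p>A minimal sketch of such a check (assuming the Tab1 table from the earlier examples; sysindexes is kept as a compatibility view in SQL Server 2005):<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- rowmodctr = modifications since the last statistics update\r\nSELECT name, rowmodctr\r\n\u00a0\u00a0\u00a0\u00a0FROM sys.sysindexes\r\n\u00a0\u00a0\u00a0\u00a0WHERE id = OBJECT_ID('Tab1')<\/pre>\n<p>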
For example, using this column, you can create a maintenance job to keep the statistics updated because, if a statistic is <em>outdated<\/em>, the QO may decide to update it during the creation of an execution plan, which causes a small (and avoidable) delay.<\/p>\n<hr \/>\n<h2>13. When is the Auto-Update to Statistics Triggered?<\/h2>\n<p>Every time a column which belongs to a statistic receives a sufficient quantity of modifications, SQL Server triggers an &#8216;Auto-Update Statistics&#8217; to keep the data current. This works in the following manner:<\/p>\n<ul>\n<li>If the cardinality of a table is less than six and the table is in the tempdb database, auto update the statistics after every six modifications to the table.<\/li>\n<li>If the cardinality of a table is greater than 6, but less than or equal to 500, update the statistics every 500 modifications.<\/li>\n<li>If the cardinality of a table is greater than 500, update the statistics when (500 + 20 percent of the table) changes have occurred.<\/li>\n<li>For table variables, a cardinality change does not trigger an auto-update of the statistics.<\/li>\n<\/ul>\n<p><strong>Question:<\/strong> how can I tell when an auto update has started?<br \/>\n There are many ways; here are a few:<\/p>\n<ul>\n<li>If the rowmodctr column is zero, that means either an auto update statistics has just run, or the table is empty.<\/li>\n<li>In Profiler you can select the SP:StmtCompleted and SP:StmtStarting events, and if an auto update statistics is triggered you will see code which looks like:\n<pre class=\"theme:ssms2012 lang:tsql\">SELECT StatMan([SC0])\r\n\u00a0\u00a0\u00a0\u00a0FROM (\r\n\u00a0\u00a0\u00a0\u00a0SELECT TOP 100 PERCENT &lt;COLUMN&gt; AS [SC0] \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0FROM [dbo].&lt;TABLE&gt; WITH (READUNCOMMITTED)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ORDER BY [SC0] ) AS _MS_UPDSTATS_TBL<\/pre>\n<\/li>\n<li>You can enable the 205 trace flag, and then SQL Server will write that information 
into the error log.<\/li>\n<li>The 8721 trace flag will write that information into the error log, too.<\/li>\n<\/ul>\n<h4>Why You Should Know This<\/h4>\n<p>Knowing when and why auto update statistics is triggered is SQL Server internals knowledge, and it can make your maintenance jobs more accurate. Sometimes an auto update can add a significant delay to the creation of an execution plan, so some problems can be avoided and fixed by paying attention to this point.<\/p>\n<hr \/>\n<p>I appreciate any and all feedback, and I really hope you&#8217;ve enjoyed reading this. But that&#8217;s not all folks; do you wanna know more? I&#8217;ve put together a long, juicy list of many of my preferred articles and posts, so have a good read if you want to know more about the Query Optimizer.<\/p>\n<ul>\n<li><a href=\"http:\/\/sqlblog.com\/blogs\/ben_nevarez\/archive\/2009\/08\/04\/the-query-optimizer-and-contradiction-detection.aspx\">The Query Optimizer and Contradiction Detection<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/conor_cunningham_msft\/archive\/2009\/11\/12\/conor-vs-foreign-key-join-elimination.aspx\">Conor vs. 
FOREIGN KEY join elimination<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/ianjo\/archive\/2005\/11\/10\/491543.aspx\">Use Function Results<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/queryoptteam\/archive\/2006\/07\/07\/659453.aspx\">Non updating updates<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/craigfr\/archive\/2007\/08\/15\/optimized-non-clustered-index-maintenance.aspx\">Optimized Non-clustered Index Maintenance<\/a><\/li>\n<li><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms175933.aspx\">Troubleshooting Poor Query Performance: Constant Folding and Expression Evaluation During Cardinality Estimation<\/a><\/li>\n<li><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms181034.aspx\">Troubleshooting Poor Query Performance: Cardinality Estimation<\/a><\/li>\n<li><a href=\"http:\/\/support.microsoft.com\/kb\/195565\/en-us\">Statistical maintenance functionality (autostats) in SQL Server<\/a><\/li>\n<li><a href=\"http:\/\/support.microsoft.com\/kb\/169642\/en-us\">INF: Search Arguments That Determine Distribution Page Usage<\/a><\/li>\n<li><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa902688(SQL.80).aspx\">Statistics Used by the Query Optimizer in Microsoft SQL Server 2000<\/a><\/li>\n<li><a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/dd535534%28v=sql.100%29.aspx?f=255&amp;MSPPError=-2147217396\">Statistics Used by the Query Optimizer in Microsoft SQL Server 2005<\/a><\/li>\n<li><a href=\"https:\/\/technet.microsoft.com\/en-gb\/library\/cc966425.aspx\">Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/ianjo\/archive\/2005\/11\/10\/491538.aspx\">Use Parameters or Literals for Query Inputs<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/queryoptteam\/archive\/2006\/03\/31\/565991.aspx\">I Smell a Parameter!<\/a><\/li>\n<li><a 
href=\"http:\/\/yukonspace.blogspot.com\/2007\/10\/how-to-update-statistics-for-large.html\">How to update statistics for large databases<\/a><\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/psssql\/archive\/2009\/01\/22\/how-it-works-statistics-sampling-for-blob-data.aspx\">How It Works: Statistics Sampling for BLOB data<\/a><\/li>\n<li><a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190397.aspx\">Index Statistics<\/a><\/li>\n<li><a href=\"http:\/\/support.microsoft.com\/kb\/325024\">Support WebCast: Effective Indexing and Statistics with SQL 2000<\/a><\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Fabiano launches into a sound technical explanation of the way that the query optimiser works in SQL Server with a mention of  Brazilian Soccer stars and young ladies on Copacabana beach. You&#8217;ll never quite think of statistics, execution plans, and the query optimiser the same way again after reading this, but we think you&#8217;ll understand them better.&hellip;<\/p>\n","protected":false},"author":65554,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4178,5084,5076,4150,4151,4252],"coauthors":[6809],"class_list":["post-769","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-bi","tag-optimiser","tag-query-optimizer","tag-sql","tag-sql-server","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/769","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/65554"}],"replies":[{"embeddable":true,"href":"https:\/\/w
ww.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=769"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/769\/revisions"}],"predecessor-version":[{"id":78572,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/769\/revisions\/78572"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=769"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=769"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=769"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=769"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}