{"id":1341,"date":"2012-05-21T00:00:00","date_gmt":"2012-05-21T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/sql-server-prefetch-and-query-performance\/"},"modified":"2021-06-03T16:44:12","modified_gmt":"2021-06-03T16:44:12","slug":"sql-server-prefetch-and-query-performance","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/performance-sql-server\/sql-server-prefetch-and-query-performance\/","title":{"rendered":"SQL Server Prefetch and Query Performance"},"content":{"rendered":"<div id=\"pretty\">\n<h2>Introduction<\/h2>\n<p class=\"start\">Prefetching is often ignored by developers and DBAs when they are analyzing performance problems. After you&#8217;ve read this article I hope you will understand how prefetching works and why, and under what circumstances, this is so important for speeding-up your queries. I hope you&#8217;ll also be interested in the techniques and tools I&#8217;ve used to troubleshoot the problem, as they are useful for tracking a lot of other performance problems.<\/p>\n<p>In short, prefetching is used to execute I\/O operations in parallel, in a way that is somewhat similar to a read-ahead mechanism. It is used in the nested loops execution plan when there are more than a threshold number of rows in <span class=\"style18\">the outer input table.<\/span>&#160; &#160;You can see whether prefetch is being used &#160;by viewing the property <code>WithOrderedPrefetch<\/code> or <code>WithUnorderedPrefetch<\/code> in the nested loops operator. Prefetching can be ordered or not depending on your query. In this article I&#8217;ll show the behavior of unordered prefetching. If you&#8217;re interested, you can use the links in the end of this article to read more about ordered prefetching.<\/p>\n<h2>Setting the environment<\/h2>\n<p>To demonstrate prefetching, I&#8217;ll start by creating a 500MB database on a USB flash drive:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">USE Master\nGO\nCREATE DATABASE TestPrefetching ON PRIMARY \n( NAME = N'TestPrefetching', FILENAME = N'H:\\TestPrefetching.mdf' , SIZE = 512000KB , FILEGROWTH = 1024KB )\n&#160;LOG ON \n( NAME = N'TestPrefetching_log', FILENAME = N'H:\\TestPrefetching_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)\nGO\nALTER DATABASE TestPrefetching SET RECOVERY SIMPLE\nGO\n<\/pre>\n<p>With the database created, let&#8217;s create two tables in it, one called <b>TestTab1<\/b> and another called <b>TestTab2<\/b>. After I create the tables, I&#8217;ll update the column <b>ID_Tab1<\/b> with some random values. It is a very simple database in which the <b>TestTab2 <\/b>has a column related to the<b> TestTab1<\/b>.<\/p>\n<p>After I create and populate the tables I&#8217;ll also update the statistics with fullscan to guarantee that the statistics are updated with the best precision.<\/p>\n<p>The following script took almost 6 mins to run in my notebook.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">USE TestPrefetching\nGO\nIF OBJECT_ID('TestTab1') IS NOT NULL\n&#160; DROP TABLE TestTab1\nGO\nCREATE TABLE TestTab1 (ID Int IDENTITY(1,1) PRIMARY KEY, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col1 Char(5000), \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col2 Char(1250),\n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col3 Char(1250),\n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col4 Numeric(18,2))\nGO\n-- 6 mins to run\nINSERT INTO TestTab1 (Col1, Col2, Col3, Col4)\nSELECT TOP 1000 NEWID(), NEWID(), NEWID(), ABS(CHECKSUM(NEWID())) \/ 10000000.\n&#160; FROM sysobjects a\n&#160;CROSS JOIN sysobjects b\n&#160;CROSS JOIN sysobjects c\n&#160;CROSS JOIN sysobjects d\nGO 30\nCREATE INDEX ix_Col4 ON TestTab1(Col4)\nGO\nIF OBJECT_ID('TestTab2') IS NOT NULL\n&#160; DROP TABLE TestTab2\nGO\nCREATE TABLE TestTab2 (ID Int IDENTITY(1,1) PRIMARY KEY,\n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; ID_Tab1 Int,\n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col1 Char(5000), \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col2 Char(1250),\n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; Col3 Char(1250))\nGO\nINSERT INTO TestTab2 (ID_Tab1, Col1, Col2, Col3)\nSELECT TOP 1000 0, NEWID(), NEWID(), NEWID()\n&#160; FROM sysobjects a\n&#160;CROSS JOIN sysobjects b\n&#160;CROSS JOIN sysobjects c\n&#160;CROSS JOIN sysobjects d\nGO 10\nCREATE INDEX ix_ID_Tab1 ON TestTab2(ID_Tab1)\nGO\n&#160;\nDECLARE @MenorValor Int = 1, @MaiorValor Int = 5000, @i Int\nSET @i = @MenorValor + ABS(CHECKSUM(NEWID())) % (@MaiorValor - @MenorValor)\n;WITH CTE_1\nAS\n(\n&#160; SELECT ID, @MenorValor + ABS(CHECKSUM(NEWID())) % (@MaiorValor - @MenorValor) AS Col1\n&#160;&#160;&#160; FROM TestTab1\n)\nUPDATE TestTab2 SET ID_Tab1 = CTE_1.Col1\n&#160; FROM TestTab2\n&#160;INNER JOIN CTE_1\n&#160;&#160;&#160; ON CTE_1.ID = TestTab2.ID\nGO\nUPDATE STATISTICS TestTab1 WITH FULLSCAN\nUPDATE STATISTICS TestTab2 WITH FULLSCAN\n<\/pre>\n<p>Let&#8217;s now check on the size of the tables we&#8217;ve created.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Checking table size\nsp_spaceused TestTab1\nGO\nsp_spaceused TestTab2\nGO\n<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-wpzncw7h.jpg\" alt=\"1495-wpzncw7h.jpg\" \/><\/p>\n<p>As we can see, the table TestTab1 has 241 MB of data, and the table TestTab2 has 80 MB of data.<\/p>\n<h2>Testing<\/h2>\n<p>Now that we&#8217;ve created the test scenario, we can execute the join between the two tables.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CHECKPOINT\nDBCC DROPCLEANBUFFERS WITH NO_INFOMSGS\nGO\nSELECT TestTab1.Col4, TestTab2.Col1\n&#160; FROM TestTab1 WITH(index=ix_Col4)\n&#160;INNER JOIN TestTab2\n&#160;&#160;&#160; ON TestTab1.ID = TestTab2.ID_Tab1\n&#160;WHERE TestTab1.Col4 &lt; 0.8\nOPTION (RECOMPILE)\n<\/pre>\n<p>The query above takes less than 1 second to finish and return 42 rows. Here are the results and the actual execution plan:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-img4A.jpg\" alt=\"1495-img4A.jpg\" \/><\/p>\n<p class=\"caption\">Query Results<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-img4B.jpg\" alt=\"1495-img4B.jpg\" \/><\/p>\n<p class=\"caption\">Execution plan with 114 rows being returned from table TestTab1<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-img4C.jpg\" alt=\"1495-img4C.jpg\" \/><\/p>\n<p class=\"caption\">Prefetching is being executed by the Nested Loops operator<\/p>\n<p>As we can see it is a very straightforward execution plan with two Nested Loops operators. Let&#8217;s start investigating it in more detail.<\/p>\n<p>So &#160;how does the loop join work? For each row from the &#8220;outer table&#8221; (in this case the table TestTab1), search for the match in the &#8220;inner table&#8221; (TestTab2). For more details about how joins work, look at the links in the final section of this article.<\/p>\n<p>There are 114 rows being returned from the TestTab1. For each row returned from the table TestTab1, SQL Server will execute an Index Seek on the TestTab2. This seek is a random read because the rows are being retrieved in the order of TestTab1.Col4.<\/p>\n<p>To optimize the unordered I\/O operations on the table TestTab2, it triggers many asynchronous I\/O operations in parallel, rather than executing one I\/O operation per row read on table TestTab1. You can see prefetching is being used because the nested loops property <code>WithUnorderedPrefetch<\/code> is set to true. <\/p>\n<p>To compare the performance of this query without Prefetch, I&#8217;ll use the traceflag 8744 to disable the prefetch, &#160;and the command <code>QUERYTRACEON<\/code> to disable it only in the query scope.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CHECKPOINT\nDBCC DROPCLEANBUFFERS WITH NO_INFOMSGS\nGO\n&#160;\nSELECT TestTab1.Col4, TestTab2.Col1\n&#160; FROM TestTab1\n&#160;INNER JOIN TestTab2\n&#160;&#160;&#160; ON TestTab1.ID = TestTab2.ID_Tab1\n&#160;WHERE TestTab1.Col4 &lt; 0.8\nOPTION (RECOMPILE, QUERYTRACEON 8744)\n<\/pre>\n<p>The query above also run under 1 second, the only difference from the first query we ran is that I&#8217;m now using the trace flag to disable the prefetch. Even though prefetch was not used (you can confirm it looking at the nested loops operator properties, the property), there was no performance difference. <\/p>\n<p>Without prefetching, SQL Server will trigger the I\/O requests at something similar to &#8220;row-by-row&#8221;. In other words, for every row that is read on TestTab1, &#160;it will request the I\/O on the table TestTab2: When the I\/O finishes, &#160;it will request another I\/O operation for the second row and so on&#8230;<\/p>\n<p>It is important to understand here that SQL Server always read pages from memory (buffer cache), it means if a page is not in memory, it will request the page to the disk subsystem, put it in memory and then, read from memory.<\/p>\n<p>When using prefetching, many threads will request the required I\/O operations concurrently, so that &#160;the I\/O requests are likely to finish earlier, because it os running many requests at the same time. Like read-ahead, it means that when data needed to be joined, the chances of a page to be in cache (because the I\/O operation already had finished and the page is in the buffer) is very closely to happen, and it means faster joins.<\/p>\n<p>The key point here is to understand that because the pages are put in memory earlier in the query execution, when the join is processed it already has the pages in memory and don&#8217;t need to wait for the pages being read from disk. Again, it is the same principle from read-ahead.<\/p>\n<h2>Testing under disk pressure<\/h2>\n<p>To make things a little more interesting, what if we simulate the same behavior in a pressure disk subsystem? Let&#8217;s do it.<\/p>\n<p>To simulate disk pressure I&#8217;ll use the SQLIO tool to read some data on my flash drive. SQLIO is a tool provided by Microsoft which can be used to determine the I\/O capacity of a given configuration. It is very easy to setup and can give you valuable information about your disk system.<\/p>\n<p>To understand more about how to use SQLIO read the link in the end of the article.<\/p>\n<p>I&#8217;ll run the SQLIO with the following parameters:<\/p>\n<pre>sqlio.exe -kR -t16 -dH -s600 -b64<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-img4F.jpg\" alt=\"1495-img4F.jpg\" \/><\/p>\n<p>If you had never used SQLIO, here is the translation for the parameters used above:<\/p>\n<ul>\n<li>&#8211;<strong>kR<\/strong> means I&#8217;ll simulate Reads.  <\/li>\n<li><strong>-t16<\/strong> means I want 16 threads running in parallel.  <\/li>\n<li><strong>-dH<\/strong> is the specification for the disk I want to use, in this case H:  <\/li>\n<li><strong>-s600<\/strong> is the time in seconds I want to run the tests.  <\/li>\n<li><strong>-b64<\/strong> is the block size I want to read the data, in this case I&#8217;m using 64kb <\/li>\n<\/ul>\n<p>Now I have SQLIO using 16 threads reading data on my flash drive, so let&#8217;s run the query with prefetching again:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CHECKPOINT\nDBCC DROPCLEANBUFFERS WITH NO_INFOMSGS\nGO\nSELECT TestTab1.Col4, TestTab2.Col1\n&#160; FROM TestTab1\n&#160;INNER JOIN TestTab2\n&#160;&#160;&#160; ON TestTab1.ID = TestTab2.ID_Tab1\n&#160;WHERE TestTab1.Col4 &lt; 0.8\nOPTION (RECOMPILE)\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-img51.jpg\" alt=\"1495-img51.jpg\" \/><\/p>\n<p>We can see from the profiler results that the disk the query is taking just 1.3 seconds to run, even with all that SQLIO stressing,<\/p>\n<p>Now let&#8217;s see the query without prefetching:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CHECKPOINT\nDBCC DROPCLEANBUFFERS WITH NO_INFOMSGS\nGO\nSELECT TestTab1.Col4, TestTab2.Col1\n&#160; FROM TestTab1\n&#160;INNER JOIN TestTab2\n&#160;&#160;&#160; ON TestTab1.ID = TestTab2.ID_Tab1\n&#160;WHERE TestTab1.Col4 &lt; 0.8\nOPTION (RECOMPILE, QUERYTRACEON 8744)\n<\/pre>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-img52.jpg\" alt=\"1495-img52.jpg\" \/><\/p>\n<p>Now the same query without prefetch is taking 7.5 seconds to run; it is 7 times slower. Because the I\/O operations are not running in advance, it is taking more time to finish.<\/p>\n<p>Another good way to analyze the prefetching performance is looking at the <b>Current Disk Queue Leng<\/b>th disk counter in the performance monitor. We can easily see that, when the query is using prefetching, the number of I\/O requests in the queue is much higher than the same query without prefetching.<\/p>\n<p>Let&#8217;s use the same query, but, this time, we force &#160;more disk reads: In other words, let&#8217;s join more rows.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CHECKPOINT\nDBCC DROPCLEANBUFFERS WITH NO_INFOMSGS\nSELECT TestTab1.Col4, TestTab2.Col1\n&#160; FROM TestTab1 WITH(index=ix_Col4)\n&#160;INNER JOIN TestTab2\n&#160;&#160;&#160; ON TestTab1.ID = TestTab2.ID_Tab1\n&#160;WHERE TestTab1.Col4 &lt; 50\nOPTION (RECOMPILE, LOOP JOIN)\nGO\nCHECKPOINT\nDBCC DROPCLEANBUFFERS WITH NO_INFOMSGS\nSELECT TestTab1.Col4, TestTab2.Col1\n&#160; FROM TestTab1 WITH(index=ix_Col4)\n&#160;INNER JOIN TestTab2\n&#160;&#160;&#160; ON TestTab1.ID = TestTab2.ID_Tab1\n&#160;WHERE TestTab1.Col4 &lt; 50\nOPTION (RECOMPILE, LOOP JOIN, QUERYTRACEON 8744)\nGO\n&#160;\n<\/pre>\n<p>In the following screenshot, you can see the perfmon counter when the queries were running.<\/p>\n<p class=\"illustration\">&#160; <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1495-clip_image010.jpg\" width=\"602\" height=\"203\" alt=\"1495-clip_image010.jpg\" \/><\/p>\n<p>Looking at the counters we can see that when prefetching is used, SQL Server is triggering many read-ahead page operations, and the disk queue increases to perform the advanced I\/O operations.<\/p>\n<p>The first query finishes fast, and the query without prefetching is taking much more time to finish.<\/p>\n<h2>The real world problem<\/h2>\n<p>All of this can happen in your database when a plan without prefetching is re-used. As we know, SQL Server saves the plan of a query in the plan cache after compiling it. What if the plan saved in the cache is not using the prefetch, and then another execution reuses this plan? <\/p>\n<p>If you know this can happens, in other words, sometimes I&#8217;ll join just a few rows, and another times I&#8217;ll join many rows, you should use a query hint (<code>OPTIMIZE FOR,<\/code> or <code>RECOMPILE)<\/code> to make sure you are getting the right plan.<\/p>\n<p>You can see this in action on the following link: <a href=\"http:\/\/blog.sqlworkshops.com\/index.php\/prefetch\/\">http:\/\/blog.sqlworkshops.com\/index.php\/prefetch\/<\/a><i><\/i><\/p>\n<p>It is important to mention that even though prefetching looks awesome for a system that is under IO pressure, it has a cost. When prefetching is used, SQL Server acquires locks on the index seek even when run&#160;at read uncommitted isolation level and, if necessary due to blocking operators in the plan, holds these locks until the end of the statement rather than releasing them immediately. You can read more about it in the <a href=\"http:\/\/blogs.msdn.com\/b\/craigfr\/archive\/2007\/06\/07\/read-committed-and-bookmark-lookup.aspx\">Craig<\/a> blog post. So be aware that, in highly concurrent applications, &#160;this extra locking can degrade performance.<\/p>\n<h2>Conclusion <\/h2>\n<p>Read-ahead prefetching is a very nice feature and should be used when needed. In other words, if you have an environment with waits related to I\/O operations, a <em>current queue<\/em> performance counter low and <em>avg disk reads per sec<\/em> high it can mean that you are not prefetching data properly, the <em>readahead pages\/sec <\/em>performance counter can also be useful here. In real world disk subsystems, problems can happen for many other reasons, but if you are facing a query without prefetching in the plan, and waiting for disk, this article may help you to tune it. Don&#8217;t miss the opportunity to read all the links from the references to understand it better.<\/p>\n<p>That&#8217;s all folks.<\/p>\n<h2>References:<\/h2>\n<ul>\n<li><a href=\"http:\/\/blogs.msdn.com\/b\/craigfr\/archive\/2006\/07\/26\/679319.aspx\">Nested Loops Join<\/a>  <\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/b\/craigfr\/archive\/2008\/10\/07\/random-prefetching.aspx\">Random Prefetching<\/a>  <\/li>\n<li><a href=\"http:\/\/support.microsoft.com\/kb\/920093\/en-us\">Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads<\/a>  <\/li>\n<li><a href=\"http:\/\/www.brentozar.com\/archive\/2008\/09\/finding-your-san-bottlenecks-with-sqlio\/\">SQLIO Tutorial: How to Test Disk Performance<\/a><span class=\"MsoHyperlink\"><\/span>  <\/li>\n<li><a href=\"http:\/\/blogs.msdn.com\/b\/craigfr\/archive\/2007\/06\/07\/read-committed-and-bookmark-lookup.aspx\">Read Committed and Bookmark <\/a><\/li>\n<\/ul>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Prefetching can make a surprising difference to SQL Server query execution times where there is a high incidence of waiting for disk i\/o operations, but the benefits come at a cost. Mostly, the Query Optimizer gets it right, but occasionally there are queries that would benefit from tuning. &hellip;<\/p>\n","protected":false},"author":65554,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143529],"tags":[4206,4150,4151],"coauthors":[],"class_list":["post-1341","post","type-post","status-publish","format-standard","hentry","category-performance-sql-server","tag-performance","tag-sql","tag-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1341","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/65554"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1341"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1341\/revisions"}],"predecessor-version":[{"id":91208,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1341\/revisions\/91208"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1341"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1341"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1341"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}