{"id":1992,"date":"2015-05-08T00:00:00","date_gmt":"2015-05-07T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/optimizing-batch-process-in-sql-server\/"},"modified":"2021-09-29T16:21:27","modified_gmt":"2021-09-29T16:21:27","slug":"optimizing-batch-process-in-sql-server","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/optimizing-batch-process-in-sql-server\/","title":{"rendered":"SQL Server Batch Process Optimization: Auto-Growth, Locks, Log Flushes, and Index Fragmentation"},"content":{"rendered":"<p><b>Optimizing a SQL Server batch process is rarely a single-step fix. This article walks through the full tuning sequence on a representative batch-load workload, measuring the impact of each change: (1) pre-sizing the data and log files to eliminate auto-growth events during the load; (2) reducing lock contention by sizing transactions correctly (too many small transactions cause log flush overhead; one giant transaction causes log bloat and long locks); (3) diagnosing WRITELOG waits and choosing between Simple and Full recovery model trade-offs; (4) eliminating the auto-growth events that reappear when transaction sizing changes; (5) dropping or rebuilding indexes appropriately during bulk load to avoid fragmentation; (6) finding the chunk size that balances log-flush frequency against transaction duration. Each step includes the diagnostic query used to identify the problem, the specific change made, and the measured execution time before and after.<\/b><\/p>\n<div class=\"article-content\">\n<p class=\"start\"> \tYou often need to run batch SQL processes on a database as a SQL Agent task. ETL tasks, for example, are common on commercial databases; Data Warehouses also use batch processes to import and transform data; OLTP systems run batches to process information. 
<\/p>\n<p> \tI have worked in development teams that have given developers complete autonomy to create and run batch processes on the database servers. Because of the many optimizations that can, and should, be made to a batch process so that it runs faster and more efficiently, a better result is often achieved when developers work together with DBAs. A slow-running batch process can have a knock-on effect, draining CPU and IO resources, blocking other processes and locking shared database assets. Teamwork between developers and DBAs can avoid this, making for much better batch processes. <\/p>\n<p> \tI will show an example of optimizing a batch process, explain what you need to analyze while creating these processes, and, I hope, prove to you that this should be more than a developer task, because the DBA can make a positive contribution. <\/p>\n<h1>Test Environment<\/h1>\n<p> \tDevelopment and testing are generally done on an isolated development server with a single user. The focus is on the creation of the batch process before the batch is deployed to production. <\/p>\n<p> \tOur test environment has the <strong>adventureworks2012<\/strong> database, but we will create a new database during the tests and import information from the <strong>adventureworks2012<\/strong> database. 
<\/p>\n<p> \tThe script to create the test environment is this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- Creating the database\ncreate database Testbatch on\n(name=data,filename='c:\\banco\\dados.mdf',size=5mb) log on\n(name=log, filename='c:\\banco\\log.ndf', size=1mb)\ngo\n\nuse testbatch\ngo\n\n-- Creating a table with primary key and clustered index\nCREATE TABLE produtos\n(\n\tId int identity(1,1) not null primary key,\n\tProductId INT NOT NULL, \n\tProductSubcategoryId INT NULL,\n\tUnitPrice MONEY NOT NULL\n)\nGO\n-- Creating a non-clustered index\ncreate index indsub on produtos(productsubcategoryid)\ngo<\/pre>\n<p> \tHowever, because we will change the database environment during the test, we will drop the database and create the entire environment again after each test. <\/p>\n<p> \tThe example batch that we will use for the test will be the following: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">declare @i bigint\ndeclare @icnt int\nselect @icnt=1\n-- Let's repeat the inserts 3 times (3 x 1,032,192 rows)\nwhile @icnt&lt;=3\nbegin\n\tselect @i=0\n-- This is the number of records resulting from the cross apply below\n\twhile @i&lt;1032192\n\tbegin\n\t\twith qry as\n\t\t(SELECT\n\t\t\tp.ProductID,\n\t\t\tp.ProductSubcategoryId,\n\t\t\tROW_NUMBER() OVER \n\t\t\t(\n\t\t\t\tORDER BY\n\t\t\t\t\tn.number\n\t\t\t) as unitprice            \n\t\t\t--- It's a fake value that helps control the inserts\n\t\tFROM Adventureworks2012.production.Product AS p\n\t\tCROSS APPLY master..spt_values AS n   \n\t\t\t--- This table is used to create test data\n\t\tWHERE\n\t\t\tn.type = 'p')\n\t\tINSERT INTO Produtos select * from qry where \n\t\tunitprice &gt;=@i and unitprice &lt;@i+1000  \n\t\t-- The inserts will be done in chunks of a thousand records\n\n\tselect @i=@i+1000\t\n\tend \n\tselect @icnt=@icnt+1\nend\n<\/pre>\n<p> \tOur batch will insert millions of rows into our table <strong>Produtos<\/strong>, to simulate an ETL data-import operation that inserts many blocks 
of 1000 records. It works by not only importing the rows from adventureworks2012, but also doing a CROSS APPLY with the <strong>spt_values<\/strong> table in master. For the rows where type = 'p', this table contains the numbers 0 to 2047, which we can use for data simulation, exactly as we are doing. <\/p>\n<p> \tWe will also change this batch during the tests, to improve both the performance and efficiency of the batch process. <\/p>\n<h2>Metrics for the Test Results<\/h2>\n<p> \tFor each test run, we will gather some counters so that we can compare the results. <\/p>\n<h4>Execution Time<\/h4>\n<p> \tThe execution time can vary a lot relative to your system, but the relative differences in time between the executions are likely to be similar. <\/p>\n<h4>Wait Stats<\/h4>\n<p> \tWe will check wait stats, using the following query: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">select top 30 * from sys.dm_os_wait_stats\norder by waiting_tasks_count desc\n<\/pre>\n<p> \tWe will also clear wait stats between the tests. This isn&#8217;t the kind of thing you should do in production, but of course you will run this in a test environment. <\/p>\n<p> \tThe instruction to clear the wait stats: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">dbcc sqlperf('sys.dm_os_wait_stats',clear)\n<\/pre>\n<h4>Performance Counter<\/h4>\n<p> \tWe will also use Performance Monitor to check the <strong>Lock Requests\/sec<\/strong> counter. We will see how important this counter is. <\/p>\n<p> \tYou can see in the image below how to select this counter in Performance Monitor. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(199)-b493d067-d8fe-42d8-b98b-89e16ac75dc5.png\" alt=\"2184-Screenshot%20(199)-b493d067-d8fe-42\" \/><\/p>\n<h4>Log Flushes<\/h4>\n<p> \tLast but not least, we will check the number of log flushes that are happening during the batch. 
A log flush is the event in which SQL Server writes the log cache to the log file; it happens after every transaction commit or when the log buffer becomes full. <\/p>\n<p> \tThis is a more difficult metric. We will need to create an Extended Events session to check this information. <\/p>\n<p> \tExtended Events is a lightweight event system, introduced in SQL Server 2008, that allows us to capture many more kinds of events than SQL Profiler, which is now deprecated. <\/p>\n<p> \tWe will create the XE session with the following instruction: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">CREATE EVENT SESSION [logFlush] ON SERVER \nADD EVENT sqlserver.log_flush_start(\n    ACTION(sqlserver.database_name)) \nADD TARGET package0.histogram(SET filtering_event_name=N'sqlserver.log_flush_start',\n\t\t\t\tsource=N'database_id',source_type=(0))\nGO\n<\/pre>\n<p> \tThe histogram target in the XE session will give us totals per database, bucketed by the <strong>database_id<\/strong> field. We will retrieve this information using the following query: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT \n    db_name( n.value('(value)[1]', 'int')) AS chave,\n    n.value('(@count)[1]', 'int') AS EventCount\nFROM\n\t(SELECT CAST(target_data as XML) target_data\n\t\tFROM sys.dm_xe_sessions AS s \n\t\t\tJOIN sys.dm_xe_session_targets t\n\t\t\t\tON s.address = t.event_session_address\n\t\tWHERE s.name = 'LogFlush'\n\t\t\tAND t.target_name = 'histogram') as tab\nCROSS APPLY target_data.nodes('HistogramTarget\/Slot') as q(n)\n<\/pre>\n<p> \tAs you will have noticed, the results of the histogram are stored in <strong>sys.dm_xe_session_targets<\/strong> in an XML field. <\/p>\n<h2>Cleanup Sequence<\/h2>\n<p> \tWe will also need a cleanup sequence to ensure the independence of each test execution. 
This will be the following sequence: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Clear the SQL Server cache\n<pre class=\"lang:tsql theme:ssms2012\">dbcc freeproccache\ndbcc dropcleanbuffers\n<\/pre>\n<\/li>\n<li>Stop\/start the logFlush XE session<br \/> \t In the first execution, let&#8217;s only start the session:\n<pre class=\"lang:tsql theme:ssms2012\">alter event session logflush on server state=start<\/pre>\n<\/li>\n<\/ol>\n<h1>First Execution<\/h1>\n<p> \tWe will use the time taken by the first execution as a reference to see how much we can improve the batch, but of course it will depend considerably on your test environment. Let&#8217;s do the cleanup steps I&#8217;ve just shown, and then execute the batch. <\/p>\n<p> \t<strong><em>The first execution took 08:04<\/em><\/strong> <\/p>\n<h1>The first problem: Auto-Growth<\/h1>\n<p> \tThe first thing to notice is that the size of both the data file and the log file has increased after the execution of the batch, the data file much more than the log file. The database has auto-growth on, and the amount of growth was determined by the database auto-growth configuration. <\/p>\n<p> \tYou can use the <strong>sp_helpdb<\/strong> procedure to get the result in the image below. <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">exec sp_helpdb testbatch<\/pre>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(200)-eccd52b2-e178-4501-8d6e-d92807b6c1b4.png\" alt=\"2184-Screenshot%20(200)-eccd52b2-e178-45\" \/><\/p>\n<p> \tAuto-growth is there to prevent disasters, and so is a great backstop, but we should try to avoid it happening in the middle of batch executions, because it makes our batch a lot slower. 
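<\/p>\n<p> \tBefore changing anything, it is worth checking the current size and auto-growth settings of the files. One way to do this is to query the <strong>sys.database_files<\/strong> catalog view; the sketch below converts the <strong>size<\/strong> and <strong>growth<\/strong> columns, which are stored as a number of 8KB pages, into MB: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">use testbatch\ngo\nselect\n\tname,\n\ttype_desc,\n\tsize * 8 \/ 1024 as size_mb,\n\t-- growth is a percentage when is_percent_growth = 1,\n\t-- otherwise it is a number of 8KB pages\n\tcase is_percent_growth\n\t\twhen 1 then cast(growth as varchar(10)) + '%'\n\t\telse cast(growth * 8 \/ 1024 as varchar(10)) + ' MB'\n\tend as growth_setting\nfrom sys.database_files\ngo\n<\/pre>\n<p> \t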
We need auto-growth configured to prevent a disaster, because it would be much worse to have our batch aborted by lack of space, but we will try to reduce the number of auto-growth events occurring during the batch. We can do this by doing the following: <\/p>\n<ol>\n<li>We will change the data and log file sizes to accommodate all the new data before the execution of the batch. The DBA can schedule this change to happen just before the execution.<\/li>\n<li>We will change the auto-growth configuration so that the files will grow in bigger increments. If we fail to predict the size needed for the files, we will still minimize the number of auto-growth events during the batch execution.<\/li>\n<\/ol>\n<p> \tWe can check all the auto-growth events that happened in the server with the following script: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">DECLARE @path NVARCHAR(260);\n\n-- First: Retrieve the physical path where\n-- the trace files are\nSELECT \n@path = REVERSE(SUBSTRING(REVERSE([path]), \nCHARINDEX('\\', REVERSE([path])), 260)) + N'log.trc'\nFROM sys.traces\nWHERE is_default = 1;\n\n-- Second: Use sys.fn_trace_gettable to read\n-- the information in the trace files\nSELECT \n\t\tDatabaseName,\n\t\t[FileName],\n\t\tSPID,\n\t\tDuration,\n\t\tStartTime,\n\t\tEndTime,\n\t\tFileType = CASE EventClass \n\t\t\tWHEN 92 THEN 'Data'\n\t\t\tWHEN 93 THEN 'Log'\n\t\tEND\nFROM sys.fn_trace_gettable(@path, DEFAULT)\n\tWHERE\n\tEventClass IN (92,93) -- 92: Data File, 93: Log File\nORDER BY\nStartTime DESC;\n<\/pre>\n<p> \tThis script uses the default server trace, which is enabled out of the box and records auto-growth events. The trace writes its data to the file <strong>log.trc<\/strong>, and we read this file using the function <strong>sys.fn_trace_gettable<\/strong>. 
<\/p>\n<p> \tLet&#8217;s change this query a bit to get the total number of auto-growth events only for our database and only during the last execution of our batch: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">DECLARE @path NVARCHAR(260);\n\nSELECT \n@path = REVERSE(SUBSTRING(REVERSE([path]), \nCHARINDEX('\\', REVERSE([path])), 260)) + N'log.trc'\nFROM sys.traces\nWHERE is_default = 1;\n\nwith qry as (\n\t\tSELECT \n\t\t\tDatabaseName,\n\t\t\t[FileName],\n\t\t\tSPID,\n\t\t\tDuration,\n\t\t\tStartTime,\n\t\t\tEndTime,\n\t\t\tFileType = CASE EventClass \n\t\t\t\tWHEN 92 THEN 'Data'\n\t\t\t\tWHEN 93 THEN 'Log'\n\t\t\tEND\n\t\tFROM sys.fn_trace_gettable(@path, DEFAULT)\n\t\tWHERE\n\t\tEventClass IN (92,93) )\n\tselect filetype, count(*) as growth\n-- Total of file growths\n\tfrom qry\n\twhere\n\t\t-- only information from our database \n\t\tdatabasename='testbatch' and \n\t\t-- only last 15 minutes\n\t\tstarttime &gt; DATEADD(MINUTE,-15,getdate())\n\t\t-- Grouping by filetype (data\/log)\n\tgroup by filetype \n<\/pre>\n<p> \tNotice that we are getting the auto-growth events from only the last fifteen minutes, making it easier to retrieve the events triggered by our batch. <\/p>\n<p> \tIn our example, there were 84 auto-growth events for the data file and 13 for the log file. These auto-growth events are making our batch slower. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(236)-f5e992de-cc8c-4508-92e3-9de12dd9a326.png\" alt=\"2184-Screenshot%20(236)-f5e992de-cc8c-45\" \/><\/p>\n<p> \tThe solution is to size the files for the data before you run the batch. <\/p>\n<p> \tHere is the first proof that you would benefit from having the advice of a DBA when you plan the batch process: the DBA will help to predict the impact of the batch import on the size of your data and log files and, if necessary, correct the growth settings. 
<\/p>\n<h2>Planning the size of the data and log files<\/h2>\n<p> \tThe new CREATE DATABASE instruction for the test system will be like this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">create database Testbatch on\n(name=data,filename='c:\\banco\\dados.mdf',size=100mb,filegrowth=100mb) \nlog on\n(name=log, filename='c:\\banco\\log.ndf', size=10mb,filegrowth=10mb)\ngo\n<\/pre>\n<p> \tNotice that we are not only creating the database with bigger files but also with a bigger filegrowth value. If for some reason our calculations were wrong and auto-growth still happens during the execution of the batch, it will happen fewer times. <\/p>\n<p> \tIn our example we are just changing the <strong>create database<\/strong> instruction, but for a production environment the DBA needs to plan the change of the size of the existing files. The instructions would be like this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">alter database testbatch modify file (name='data', size=100mb, filegrowth=100mb)\n\nalter database testbatch modify file (name='log', size=10mb, filegrowth=10mb)\n<\/pre>\n<p> \tIn our test environment, it&#8217;s easy to predict the size: we already executed the script once, so the size needed is the final size of the data files after the first execution, with an allowance for variation. <\/p>\n<p> \tThis kind of shortcut is the exception, because in a test environment you usually don&#8217;t have the same amount of data you have in production. The DBA will need to do some calculations to predict the amount of space that will be necessary for the data. <\/p>\n<p> \tIn our example, the <strong>INT<\/strong> fields (<strong>ID<\/strong>, <strong>ProductID<\/strong>, <strong>ProductSubCategoryID<\/strong>) take 4 bytes each. The <strong>MONEY<\/strong> field (<strong>UnitPrice<\/strong>) takes 8 bytes. The row has 20 bytes of data plus one more byte to track the nullable field, making 21 bytes in all. <\/p>\n<p> \tEach page has 8KB, so we will have about 380 rows per page. 
This is an approximate calculation. <\/p>\n<p> \tWe are inserting a total of 3,096,576 (three million, ninety-six thousand, five hundred and seventy-six) records. 3,096,576 \/ 380 &#8776; 8,149 pages needed for these records. <\/p>\n<p> \tCalculating the space: 8,149 * 8KB = 65,192KB, or roughly 64MB. This isn&#8217;t the final amount, because we aren&#8217;t calculating the index pages and a few more details, but it already gives you an idea of how this calculation can be done. <\/p>\n<div class=\"indented tips\"><p> <strong>Side Note &#8211; Why is auto-growth the first problem to check? <\/strong> <\/p>\n<p> You are developing a batch process and checking how it performs. If you don&#8217;t check auto-growth first, each execution will have different results and it will become too difficult to analyze them. <\/p>\n<p> For example, the first execution could have a result with a lot of auto-growth during the execution, whereas the second execution could have a very different result because the files already have the space needed for the data. <\/p>\n<p> There are plenty of possible outcomes; the best approach is to check auto-growth first and avoid it during the batch execution.  
<\/p><\/div>\n<h1>Second Execution<\/h1>\n<p> \tLet&#8217;s now repeat the steps to do a new execution with the larger data file and log file sizes: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the SQL Server cache<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Stop\/start the logFlush XE session\n<pre class=\"lang:tsql theme:ssms2012\">alter event session logflush on server state=stop\nalter event session logflush on server state=start<\/pre>\n<\/li>\n<\/ol>\n<p> \t<strong><em>Second Execution time: 07:38<\/em><\/strong> <\/p>\n<p> \tRunning the query again to retrieve the total number of auto-growth events (you may need to adjust the window from the last fifteen minutes to the last nine), you will receive an empty result. <\/p>\n<h1>Monitoring the Auto-Growth<\/h1>\n<p> \tLet&#8217;s say something went wrong in the planning steps. How would you know that a batch already in production is suffering from auto-growth during its execution? <\/p>\n<p> \tHere are some solutions you can use: <\/p>\n<ol>\n<li>Data Collector can create a graphical report that shows the data file and log growth. You can read more about the data collector in my article <a href=\"https:\/\/www.simple-talk.com\/sql\/database-administration\/centralize-your-database-monitoring-process\/?utm_source=simpletalk&amp;utm_medium=weblink&amp;utm_content=optbatchproc\">&#8216;Centralize Your Database Monitoring Process&#8217;<\/a><\/li>\n<li>You can add the same query that we used to check the auto-growth events as a new step in your batch, adding the result to a table that you can check later.<\/li>\n<li>You can use Performance Monitor to check the auto-growth. There is a counter named &#8216;Log Growths&#8217; and another one named &#8216;Data File(s) Size (KB)&#8217;. 
You can use Performance Monitor to capture this information to a file and check the file later.<\/li>\n<\/ol>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(233)-32c8a6af-4594-4ce1-bfa8-a484bfce650e.png\" alt=\"2184-Screenshot%20(233)-32c8a6af-4594-4c\" \/><\/p>\n<h1>Second Problem: Locks<\/h1>\n<p> \tBy observing the results of &#8216;Lock Requests\/sec&#8217; in Performance Monitor, we would notice a huge number of locks. Does our batch run concurrently with other tasks? If this is an overnight batch performing an ETL task during a quiet time, we probably wouldn&#8217;t need this number of locks, because there is less risk of inadvertently blocking another process. <\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(202)-f36c4f9f-5d68-4900-9a78-adc502d2142c.png\" alt=\"2184-Screenshot%20(202)-f36c4f9f-5d68-49\" \/><\/p>\n<p> \tThe locks have two sources: the SELECT statements that read the source tables and the INSERT statements that write to <strong>Produtos<\/strong>. One solution is the <strong>tablockx<\/strong> query hint: instead of getting a lock for each row, we get a single lock for the entire table. Notice that I will not use <strong>nolock<\/strong> for the SELECT statements. Although <strong>nolock<\/strong> is theoretically a possible solution for SELECTs, it is dangerous because, in some circumstances, it can return wrong data. 
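<\/p>\n<p> \tIf you prefer to script this check instead of using Performance Monitor, the same counter is exposed through <strong>sys.dm_os_performance_counters<\/strong>. The per-second counters in that view are cumulative, so the sketch below (not part of the original test harness) samples the value twice and divides by the interval: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">declare @v1 bigint, @v2 bigint\n\nselect @v1=cntr_value from sys.dm_os_performance_counters\nwhere counter_name='Lock Requests\/sec' and instance_name='_Total'\n\n-- wait ten seconds between the two samples\nwaitfor delay '00:00:10'\n\nselect @v2=cntr_value from sys.dm_os_performance_counters\nwhere counter_name='Lock Requests\/sec' and instance_name='_Total'\n\nselect (@v2-@v1)\/10 as lock_requests_per_sec\n<\/pre>\n<p> \tRun it while the batch executes to get an approximate locks-per-second figure without opening Performance Monitor.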
<\/p>\n<p> \tThe new script will be like this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- It's basically the same logic as\n-- our previous batch, with only a few\n-- changes\ndeclare @i bigint\ndeclare @icnt int\nselect @icnt=1\nwhile @icnt&lt;=3\nbegin\n\tselect @i=0\n\twhile @i&lt;1032192\n\tbegin\n\t\twith qry as\n\t\t(SELECT\n\t\t\tp.ProductID,\n\t\t\tp.ProductSubcategoryId,\n\t\t\tROW_NUMBER() OVER \n\t\t\t(\n\t\t\t\tORDER BY\n\t\t\t\t\tn.number\n\t\t\t) as unitprice\n\t\t-- tablockx in the tables will avoid a bunch of\n\t\t-- locks in table Product and spt_values\n\t\tFROM Adventureworks2012.production.Product AS p WITH (tablockx)\n\t\tCROSS APPLY master..spt_values AS n WITH (tablockx)\n\t\tWHERE\n\t\t\tn.type = 'p')\n\t\t-- tablockx in the insert will avoid a bunch of locks \n\t\t-- in table Produtos\n\t\tINSERT Produtos with (tablockx) \n\t\t\t\tselect ProductID,ProductSubCategoryId,UnitPrice\n    from qry where \n\t\t\t    unitprice &gt;=@i and unitprice &lt;@i+1000\n\n\tselect @i=@i+1000\n\tend \n\tselect @icnt=@icnt+1\nend\n<\/pre>\n<p> \tLet&#8217;s do the cleanup again and repeat the execution: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the SQL Server cache<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Stop and start the logFlush XE session.<\/li>\n<\/ol>\n<p> \t<strong><em>Third Execution time: 05:41<\/em><\/strong> <\/p>\n<p> \tYou can observe in Performance Monitor a huge decrease in the number of <strong>lock requests\/sec<\/strong> in the database, and this is reflected in the execution time. 
<\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(204)-073c9bdf-d83c-4812-9a6a-dc497a59eb05.png\" alt=\"2184-Screenshot%20(204)-073c9bdf-d83c-48\" \/><\/p>\n<h1>Monitoring Locks in Production<\/h1>\n<p> \tTo monitor locks in production, you need to create a baseline. This means you need to be continuously monitoring the number of locks, so you set a baseline, the regular amount of locks in your system then you will notice anything different, especially when you deploy new batches in production. <\/p>\n<p> \tYou can use data collector to monitor locks in your servers and created the baseline. The article <a href=\"https:\/\/www.simple-talk.com\/sql\/database-administration\/centralize-your-database-monitoring-process\/?utm_source=simpletalk&amp;utm_medium=weblink&amp;utm_content=optbatchproc\">&#8216;Centralize Your Database Monitoring Process&#8217;<\/a> is a good start. <\/p>\n<h1>Third Problem: WriteLog<\/h1>\n<p> \tLet&#8217;s take a look at the wait stats to identify another main problem in our batch. <\/p>\n<p> \tYou will notice WRITELOG with a big value, 2561 in our example. This is causing problems for our batch. The first thought would be to get a better disk and to locate the log and data files on separate disks. Both solutions are good, but we need to return to the basics: Is our batch doing more log flushes than is actually required? <\/p>\n<p class=\"illstration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(205)-1c1cd19e-c8dd-4a5c-91e3-11c45b59921a.png\" alt=\"2184-Screenshot%20(205)-1c1cd19e-c8dd-4a\" \/><\/p>\n<p> \tA log flush, writing the log to the disk, will happen at every commit that our batch does. Oh, wait! Our batch doesn&#8217;t have the commit instruction, does it? 
<\/p>\n<p> \tIn this case, we don&#8217;t have an explicit transaction, but every INSERT instruction runs in its own autocommit transaction, so it is as if each INSERT had its own BEGIN TRANSACTION and COMMIT TRANSACTION. <\/p>\n<p> \tBecause of this, we have many transactions in our batch and a lot of log flushes. <\/p>\n<p> \tWe can SELECT the result of the <strong>logFlush<\/strong> XE session to find out how many log flushes happened: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT \n    db_name( n.value('(value)[1]', 'int')) AS chave,\n    n.value('(@count)[1]', 'int') AS EventCount\nFROM\n\t(SELECT CAST(target_data as XML) target_data\n\t\tFROM sys.dm_xe_sessions AS s \n\t\t\tJOIN sys.dm_xe_session_targets t\n\t\t\t\tON s.address = t.event_session_address\n\t\tWHERE s.name = 'LogFlush'\n\t\t\tAND t.target_name = 'histogram') as tab\nCROSS APPLY target_data.nodes('HistogramTarget\/Slot') as q(n)\n<\/pre>\n<p> \tOf course, this is a great query to transform into a table-valued function so you can reuse it easily, but that could be the subject of a future article about Extended Events. <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(206)-b67bf26c-782a-473c-a1df-461522279415.png\" alt=\"2184-Screenshot%20(206)-b67bf26c-782a-47\" \/><\/p>\n<p> \tWe can see that we had 8871 log flushes during our batch. <\/p>\n<p> \tThe first solution we think of is simple: make the entire batch one single transaction. Now our batch will have an explicit BEGIN TRANSACTION\/COMMIT TRANSACTION. 
The new script will be like this: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- It's basically the same logic as\n-- our previous batch, with only a few\n-- changes\ndeclare @i bigint\ndeclare @icnt int\nselect @icnt=1\n-- Now we have a transaction during the\n-- entire operation\nbegin transaction\n\twhile @icnt&lt;=3\n\tbegin\n\t\tselect @i=0\n\t\twhile @i&lt;1032192\n\t\tbegin\n\t\t\twith qry as\n\t\t\t(SELECT\n\t\t\t\tp.ProductID,\n\t\t\t\tp.ProductSubcategoryId,\n\t\t\t\tROW_NUMBER() OVER \n\t\t\t\t(\n\t\t\t\t\tORDER BY\n\t\t\t\t\t\tn.number\n\t\t\t\t) as unitprice\n\t\t\tFROM Adventureworks2012.production.Product AS p WITH (tablockx)\n\t\t\tCROSS APPLY master..spt_values AS n WITH (tablockx)\n\t\t\tWHERE\n\t\t\t\tn.type = 'p')\n\t\t\tINSERT Produtos with (tablockx) \n\t\t\t\t\tselect ProductID,ProductSubCategoryId,UnitPrice\n\t\t\t\t\t from qry where \n\t\t\t\t\t\tunitprice &gt;=@i and unitprice &lt;@i+1000\n\n\t\tselect @i=@i+1000\n\t\tend \n\t\tselect @icnt=@icnt+1\n\tend\ncommit transaction\n<\/pre>\n<p> \tTo try the new script, let&#8217;s repeat the cleanup steps again: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the SQL Server cache<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Stop and start the logFlush XE session.<\/li>\n<\/ol>\n<p> \t<strong><em>Fourth Execution time: 04:59<\/em><\/strong> <\/p>\n<h1>Fourth Problem: Auto-Growth again<\/h1>\n<p> \tThe execution got a little better, but not as much as we might have hoped for. <\/p>\n<p> \tLet&#8217;s check again the number of log flushes. 
It dropped, but not by much: <\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(209)-1600b7df-2799-4882-ae77-7add90df1dc2.png\" alt=\"2184-Screenshot%20(209)-1600b7df-2799-48\" \/><\/p>\n<p> \tThe number of lock requests dropped again, which is good: <\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(207)-622821e4-de8d-46bf-acc3-1bcae849c0bb.png\" alt=\"2184-Screenshot%20(207)-622821e4-de8d-46\" \/><\/p>\n<p> \tChecking the wait stats again, we notice that the number of <strong>WRITELOG<\/strong> waits decreased a lot, so that problem seems solved. <\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(210)-c130043f-64b3-427e-81c5-416f0fb637b5.png\" alt=\"2184-Screenshot%20(210)-c130043f-64b3-42\" \/><\/p>\n<p> \tBut you might be surprised when you check the size of the database files: <\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(211)-0642d3ea-8e6a-422c-aa79-ed1e8fd0aa7e.png\" alt=\"2184-Screenshot%20(211)-0642d3ea-8e6a-42\" \/><\/p>\n<p> \tSo, we have auto-growth happening again. Check the total of auto-growth events to confirm this: <\/p>\n<p> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(212)%20corrigida-927ada92-37eb-4134-90cd-5fd4b8885ef3.png\" alt=\"2184-Screenshot%20(212)%20corrigida-927a\" \/><\/p>\n<p> \tWhy did the log file get so much bigger? <\/p>\n<p> \tTransaction log space management is directly tied to the transactions and the recovery model of the database, which, in turn, is tied to the database backup plan. 
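<\/p>\n<p> \tWhen the log grows unexpectedly, a quick first check is the <strong>log_reuse_wait_desc<\/strong> column of <strong>sys.databases<\/strong>, which reports what is currently preventing log truncation (for example, LOG_BACKUP or ACTIVE_TRANSACTION): <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- What is preventing the log of our database\n-- from being truncated?\nselect name, recovery_model_desc, log_reuse_wait_desc\nfrom sys.databases\nwhere name='testbatch'\n<\/pre>\n<p> \tWhile the single-transaction version of our batch is running, you would expect to see ACTIVE_TRANSACTION here.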
<\/p>\n<p> \tWe can check the recovery model of the database using the following query: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT 'The recovery model is '+recovery_model_desc\n   FROM sys.databases\nwhere db_id() = database_id\n<\/pre>\n<p> \tIn our example, we are effectively using the simple recovery model. This means that SQL Server will truncate the log after each checkpoint. Truncating the log means that SQL Server marks the log space used by completed transactions as reusable, while keeping ongoing transactions in the log. <\/p>\n<div class=\"indented tips\"><p>     <strong>Side Note<\/strong><\/p>\n<p> I said we are using the simple recovery model, but that&#8217;s not &#8216;so simple&#8217;. We didn&#8217;t specify the recovery model in the create database statement, so the recovery model of the <strong>TestBatch<\/strong> database is configured as Full. <\/p>\n<p> Although the database is configured with the full recovery model, it&#8217;s not working in this way. That&#8217;s because it&#8217;s not enough for a database to be configured with the full recovery model; the database needs a full backup to start working with the full recovery model. <\/p>\n<p> This situation creates a &#8216;fake full&#8217; recovery model: the database is configured with the full recovery model, but it&#8217;s working as if it were simple. <\/p>\n<p> To identify the &#8216;fake full&#8217; case we can use the following query: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">select name,last_log_backup_lsn from \nsys.databases a\ninner join sys.database_recovery_status b\non a.database_id=b.database_id<\/pre>\n<p>  If the field <strong>last_log_backup_lsn<\/strong> is null, it means the database is in the simple or &#8216;fake full&#8217; recovery model. <\/p>    <\/div>\n<p> \tHowever, if you find that your database isn&#8217;t using the simple recovery model, you shouldn&#8217;t change it without the help of the DBA and a lot of planning. If the backup plan includes log backups, the recovery model needs to be &#8216;full recovery&#8217;. 
If the backup plan does not include log backups, the recovery model should be &#8216;simple recovery&#8217;. <\/p>\n<p> \tHere is the second proof that you would benefit from the advice of a DBA, who understands the relationship between recovery model, transactions and log space, and who can plan the change of transaction log file size as part of your batch. <\/p>\n<p> \tThe approach to solving the log growth needs to be different according to the recovery model. Let&#8217;s analyze the possibilities. <\/p>\n<h1>The batch with Full Recovery Model<\/h1>\n<p> \tWhen we use the &#8216;full recovery&#8217; model, log truncations do not happen at each checkpoint, but only when we take a log backup (note that it is a mistake to use the full recovery model without executing regular log backups). <\/p>\n<p> \tTo keep control of the log space, the DBA will probably want to schedule a log backup right after the batch execution. The log backup triggers a log truncation, leaving the log with a lot of empty space. <\/p>\n<p> \tEven if the exact amount of space can&#8217;t be predicted from the tests, the space needed in the log will become clear after the first few executions in production. It will then be possible to avoid all the log growth during the batch execution. <\/p>\n<p> \tThe DBA could choose whether to increase the log file size before the batch and return it to its regular size after the log backup, or simply keep the log file permanently large enough for the batch. <\/p>\n<h1>The batch with Simple Recovery model<\/h1>\n<p> \tBefore we included the BEGIN\/COMMIT in the script, we had many transactions in the batch, so at each checkpoint SQL Server could delete all completed transactions from the log (the log truncation process) and the log would not grow too much. 
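<\/p>\n<p> \tThroughout these tests, a quick way to see how full the log really is, and why it can or cannot be truncated, is the following (a sketch using two standard commands): <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- log size and percentage used for every database\ndbcc sqlperf(logspace)\ngo\n-- why the log of the current database cannot be truncated right now\nselect name, log_reuse_wait_desc\n  from sys.databases\n where database_id = db_id()\n<\/pre>\n<p> \tA <strong>log_reuse_wait_desc<\/strong> of LOG_BACKUP, for example, confirms that the database is waiting for a log backup before the log can be truncated.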
<\/p>\n<p> \tNow that we have only one transaction, SQL Server will remove it from the log only in the first truncation after the entire transaction finishes. Therefore, you will need enough space in the log for the entire transaction. <\/p>\n<p> \tWe need to make the log file big enough to hold the entire transaction but, of course, we will not keep a huge transaction log just for a single batch execution. The best approach is to increase the log for the batch and return it to its normal size afterwards. <\/p>\n<p> \tHowever, using the simple recovery model, we do not really need a single transaction. We can strike a balance between the log size, the transaction volume and the log flushes. <\/p>\n<p> \tAt first, we had many small transactions, causing many log flushes. However, the COMMIT TRANSACTION isn&#8217;t the only cause of a log flush: if the log buffer is full, the flush will happen anyway. So, if we reduce the batch to a single transaction, we will not reduce the log flushes to one, and we will greatly increase the space needed. <\/p>\n<p> \tWe can use chunks bigger than a thousand rows, but smaller than the total number of records, achieving a balance between log flushes, transactions and log space. 
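<\/p>\n<p> \tThe log-flush activity can be followed from inside SQL Server through its performance counters. This is a sketch; the counter names are standard, but the instance name must match your database: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- cumulative log-flush counters for the TestBatch database\nSELECT counter_name, cntr_value\n  FROM sys.dm_os_performance_counters\n WHERE object_name LIKE '%:Databases%'\n   AND counter_name IN ('Log Flushes\/sec', 'Log Bytes Flushed\/sec')\n   AND instance_name = 'TestBatch'\n<\/pre>\n<p> \tThese counters are cumulative, so sample them before and after the batch and compare the difference between runs.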
<\/p>\n<p> \tIn our sample batch, there are a few solutions: <\/p>\n<ol>\n<li>Remove the Begin Transaction\/Commit Transaction and increase the size of the chunk\n<pre class=\"lang:tsql theme:ssms2012\">-- ...\n    -- ...\t\t\t\n                INSERT Produtos with (tablockx) \n                        select ProductID,ProductSubCategoryId,UnitPrice\n                         from qry where \n                            unitprice &gt;=@i and unitprice &lt;@i+10000\n    \n            select @i=@i+10000\n    -- ...\n        <\/pre>\n<\/li>\n<li>Change the position of the Begin Transaction\/Commit Transaction to inside the first while: \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(234)-5493e001-fb1f-40d4-8869-5ef2d678ffe0.png\" alt=\"2184-Screenshot%20(234)-5493e001-fb1f-40\" \/><\/li>\n<li>Create another while and counter variable to control the number of records in each transaction <\/li>\n<\/ol>\n<h1>Changing our Script<\/h1>\n<p> \tAs we just saw, there are plenty of possibilities to balance the log size, transaction volume and log flushes. I will proceed with the tests using a single transaction and leave it to you to find the balance for your environment. <\/p>\n<p> \tWe can do a test to find the approximate amount of space needed for the log. We can insert one record in the table and check the amount of log generated. This will work better if the recovery model is simple; otherwise it will be more difficult to find the information in the log. 
<\/p>\n<p> \tThe script to insert a record and read the log will be this one: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- force a checkpoint, which will truncate the log if recovery is simple\ncheckpoint\ngo\n-- Insert one record\ninsert into produtos \n\tSELECT top 1\n\t\t\t\tp.ProductID,\n\t\t\t\tp.ProductSubcategoryId,\n\t\t\t\t10 as unitprice            \t\t\t\n\t\t\tFROM Adventureworks2012.production.Product AS p\n\ngo\n-- read the information in the log\nSELECT * FROM fn_dblog (NULL, NULL)\n<\/pre>\n<p> \tYou can see the result in the following image: <\/p>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(237)-8cecc8c7-7f4a-494e-9328-29c59f0b199b.png\" alt=\"2184-Screenshot%20(237)-8cecc8c7-7f4a-49\" \/><\/p>\n<p> \tThe rows between <strong>LOP_BEGIN_XACT<\/strong> and <strong>LOP_COMMIT_XACT<\/strong> are our transaction. We need to sum the values of the <strong>Log Record Length<\/strong> field between these rows, multiply by the number of rows we will really have in our transaction, and add the <strong>Log Record Length<\/strong> values for the <strong>LOP_BEGIN_XACT<\/strong> and <strong>LOP_COMMIT_XACT<\/strong> rows. <\/p>\n<p> \tLet&#8217;s do the calculations: <br \/>128 + 112 = 240 bytes per row; 240 * 3,096,576 rows = 743,178,240 bytes; adding 124 (LOP_BEGIN_XACT) and 84 (LOP_COMMIT_XACT) gives 743,178,448 bytes, or about 709 MB. <\/p>\n<p> \tWe saw in our tests that the log file grew to 900 MB, but that happened because we relied on the auto-growth feature, and each growth creates new VLFs (Virtual Log Files) inside the log. By adjusting the log file to the right size, we will create fewer VLFs and make better use of the space. If we size the log at this calculated value plus 10%, we can be confident that the log space used will stay below it. 
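<\/p>\n<p> \tThe same arithmetic can be scripted, which makes the estimate easy to recompute if the row count or record sizes change. The literal values below come from the <strong>fn_dblog<\/strong> output above: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">declare @rows bigint = 3096576            -- rows inserted by the batch\ndeclare @bytes_per_row bigint = 128 + 112 -- log record lengths per inserted row\ndeclare @xact_overhead bigint = 124 + 84  -- LOP_BEGIN_XACT + LOP_COMMIT_XACT\n\nselect ((@rows * @bytes_per_row + @xact_overhead) * 1.1)\n       \/ 1048576.0 as estimated_log_mb    -- plus 10% headroom\n<\/pre>\n<p> \tWith the 10% headroom this comes to roughly 780 MB, which fits comfortably in an 800 MB log file.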
<\/p>\n<p> \tYou can read more about VLFs in Paul Randal&#8217;s excellent article <a href=\"https:\/\/technet.microsoft.com\/en-us\/magazine\/2009.02.logging.aspx\">&#8216;Understanding Logging and Recovery in SQL Server&#8217;<\/a>, and you will also see more about these calculations in Paul Randal&#8217;s <a href=\"http:\/\/www.sqlskills.com\/blogs\/paul\/benchmarking-1-tb-table-population-part-2-optimizing-log-block-io-size-and-how-log-io-works\/\">benchmarking article<\/a>. <\/p>\n<p> \tLet&#8217;s change the CREATE DATABASE instruction to make enough log space for our batch: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">create database Testbatch on\n(name=data,filename='c:\\banco\\dados.mdf',size=100mb,filegrowth=100mb) \nlog on\n(name=log, filename='c:\\banco\\log.ndf', size=800mb,filegrowth=200mb)\ngo\n<\/pre>\n<p> \tTo repeat: this change in the CREATE DATABASE instruction only works for our example. In the production environment, the DBA will need to change the size of the existing file. More than that: for the log file, the DBA will want not only to make the file bigger before the execution, but also to return the file to its normal size afterwards. <\/p>\n<p> \tTo return the log file to its normal size, the DBA can use a simple DBCC SHRINKFILE if the recovery model is simple; if the recovery model is Full, the DBA will need to execute a log backup first and then use DBCC SHRINKFILE. 
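<\/p>\n<p> \tIn the full recovery model, the post-batch sequence might look like this. This is only a sketch: the backup path is a placeholder, and the target size of 100 MB stands in for the log&#8217;s &#8216;normal&#8217; size in your environment: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- back up the log so the space used by the batch can be released\nbackup log TestBatch to disk = 'c:\\banco\\TestBatch_log.bak' -- hypothetical path\ngo\nuse TestBatch\ngo\n-- return the log file (logical name 'log') to about 100 MB\ndbcc shrinkfile ('log', 100)\ngo\n<\/pre>\n<p> \tUnder the simple recovery model, the DBCC SHRINKFILE step alone is enough.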
<\/p>\n<h1>Fifth Execution<\/h1>\n<p> \tNow we will repeat the clean up and do a new test run of the batch execution: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the SQL Server cache<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Stop\/start the logFlush XE session<\/li>\n<\/ol>\n<p> \t<strong><em>Fifth Execution time: 03:18<\/em><\/strong> <\/p>\n<h1>Fifth Problem: Index and Fragmentation<\/h1>\n<p> \tThe table that is receiving the data has indexes; one clustered and one non-clustered. We created the clustered index to enforce the primary key, using an always-increasing key defined with the Identity attribute. <\/p>\n<p> \tThe problem is the non-clustered index: for each row inserted, the non-clustered index needs to be updated, slowing our batch. Because the key isn&#8217;t always-increasing, the rows are inserted throughout the index tree, so we get a lot of page splits during the batch, slowing the process even further and producing a very fragmented index that will need to be defragmented afterwards. <\/p>\n<p> \tWe can use the following instruction to identify the index fragmentation: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">dbcc showcontig('produtos','indsub')\n<\/pre>\n<p class=\"illustration\"> \t<img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/2184-Screenshot%20(219)-d663705f-dfd9-457a-9e25-b5ddfc722add.png\" alt=\"2184-Screenshot%20(219)-d663705f-dfd9-45\" \/><\/p>\n<p> \tAs you can see from these figures, the index is completely fragmented. In summary, our batch became slower because it had to maintain an index that ended up completely fragmented and will cost more time to defragment. Did you notice the LCX_INDEX_LEAF rows when we checked our transaction in the log? Each of these rows is an update of the non-clustered index, making our log bigger. 
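<\/p>\n<p> \tDBCC SHOWCONTIG is deprecated; on SQL Server 2005 and later the same fragmentation information is available from a dynamic management function. This sketch uses the table name from our example and should be run in the <strong>TestBatch<\/strong> database: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">select index_id, index_type_desc,\n       avg_fragmentation_in_percent, page_count\n  from sys.dm_db_index_physical_stats\n       (db_id(), object_id('produtos'), null, null, 'LIMITED')\n<\/pre>\n<p> \tAn <strong>avg_fragmentation_in_percent<\/strong> close to 100 on the non-clustered index confirms what DBCC SHOWCONTIG reported.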
<\/p>\n<p> \tThe best solution: drop the index before the batch and create the index again after the batch. Of course, this isn&#8217;t simple: there are a lot of points to consider. <\/p>\n<div class=\"indented\">\n<p> <em>\tHere is the third proof that you would benefit from the help of a DBA to analyze the possibility of dropping the indexes and creating them again after the batch, controlling this execution.<\/em> <\/p>\n<\/div>\n<p> \tWe will change the script that creates our environment so we won&#8217;t create the index until the bulk insertion is complete. The new script will be: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">-- Create the database\ncreate database TestBatch on\n(name=data,filename='c:\\banco\\dados.mdf',size=100mb) log on\n(name=log, filename='c:\\banco\\log.ndf', size=900mb)\ngo\n\nuse testbatch\ngo\n-- Create the table with a primary key\nCREATE TABLE produtos\n(\n\tId int identity(1,1) not null primary key,\n\tProductId INT NOT NULL, \n\tProductSubcategoryId INT NULL,\n\tUnitPrice MONEY NOT NULL\n)\nGO\n<\/pre>\n<p> \tNow we will repeat the clean up and do a new execution: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the SQL Server cache<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Stop\/start the logFlush XE session<\/li>\n<\/ol>\n<p> \t<strong><em>Sixth Execution Time: 3:12m<\/em><\/strong> <\/p>\n<h1>Sixth Problem: Balance the amount of log flushes<\/h1>\n<p> \tWe still haven&#8217;t achieved a good balance in the number of log flushes. This problem is very difficult to identify, but a few wait stats point to it: WRITELOG was reduced but still occurs; SOS_SCHEDULER_YIELD is high even though our batch query plan doesn&#8217;t use parallelism; and DIRTY_PAGE_POOL and LOGMGR_QUEUE, two wait types generated by background processes, are also high. 
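<\/p>\n<p> \tThese wait statistics can be read, and compared between test runs, with a query such as the following (a sketch; remember to clear the stats between runs, as in the checklist above): <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">select wait_type, waiting_tasks_count, wait_time_ms\n  from sys.dm_os_wait_stats\n where wait_type in ('WRITELOG', 'LOGMGR_QUEUE', 'DIRTY_PAGE_POOL',\n                     'PAGELATCH_EX', 'SOS_SCHEDULER_YIELD')\n order by wait_time_ms desc\n\n-- to reset the statistics between test runs:\n-- dbcc sqlperf('sys.dm_os_wait_stats', clear)\n<\/pre>\n<p> \tComparing <strong>waiting_tasks_count<\/strong> before and after each change shows whether the change really reduced the contention.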
<\/p>\n<p> \tThe key question is: how often will SQL Server flush the log? <\/p>\n<p> \tNow that we have only a single transaction, the log flushes will happen when the log buffer becomes full. The log buffer is a structure with a dynamic size between 512 bytes and 60 KB; SQL Server decides the log buffer size according to the batch and the log record size. <\/p>\n<p> \tWe need to adjust our batch to maximize the log buffer size and have as few log flushes as possible. To achieve this we need to adjust our chunk size, currently one thousand records, to a bigger value. <\/p>\n<p> \tThere isn&#8217;t an exact calculation for this. We can do a few tests, using our Extended Events session to capture the log flushes, to find the best value. <\/p>\n<p> \tFirst, let&#8217;s use the following insert for our tests: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">with qry as\n\t\t(SELECT top 1000   -- the size of the chunk\n\t\t\tp.ProductID,\n\t\t\tp.ProductSubcategoryId,\n\t\t\t10 as unitprice            \n\t\tFROM Adventureworks2012.production.Product AS p\n\t\tCROSS APPLY master..spt_values AS n\n\t\t\t--- This table is used to create test data\n\t\tWHERE\n\t\t\tn.type = 'p')\n\t\tINSERT INTO Produtos \n\t\t\tselect ProductID,ProductSubcategoryID, unitprice\n\t\t\t from qry \n<\/pre>\n<p> \tWe can run the insert above, query our XE session to find the number of log flushes, then clear the XE session (stop\/start) and execute the insert again with a different chunk size. 
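<\/p>\n<p> \tThe <strong>logFlush<\/strong> XE session used in these tests isn&#8217;t shown here; a minimal version that simply counts log flushes could look like the following. This is a sketch: it assumes the <strong>sqlserver.log_flush_start<\/strong> event (available from SQL Server 2012), and you must substitute the database_id of <strong>TestBatch<\/strong>: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">create event session logFlush on server\nadd event sqlserver.log_flush_start\n    (where (database_id = 5))    -- replace 5 with db_id('TestBatch')\nadd target package0.event_counter\ngo\nalter event session logFlush on server state = start\ngo\n-- read the counter target\nselect cast(t.target_data as xml) as flush_count\n  from sys.dm_xe_sessions as s\n inner join sys.dm_xe_session_targets as t\n    on s.address = t.event_session_address\n where s.name = 'logFlush'\n<\/pre>\n<p> \tStopping and starting the session resets the counter, which is why the checklist includes a stop\/start step before each run.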
<\/p>\n<p> \tHere are the results I found: <\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Chunk Size<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Log Flushes<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>1000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>3<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>5000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>11<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>10,000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>22<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>20,000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>44<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>30,000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>66<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p> \tNotice the change in the proportion of log flushes. For one thousand records we have three, but for five thousand we have only eleven, not fifteen. After that, the proportion stays the same: eleven log flushes for every five thousand records. <\/p>\n<p> \tThis doesn&#8217;t mean five thousand is the perfect value: that now depends on our IO system. The next test is to run the batch with a few different chunk sizes and check the wait stats. 
<\/p>\n<p> \tHere are the results of my tests: <\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Wait Stats<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>10,000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>20,000 records<\/p>\n<\/td>\n<td valign=\"top\">\n<p>30,000 records<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>WRITELOG<\/p>\n<\/td>\n<td valign=\"top\">\n<p>244<\/p>\n<\/td>\n<td valign=\"top\">\n<p>193<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5<\/p>\n<\/td>\n<td valign=\"top\">\n<p>73<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>LOGMGR_QUEUE<\/p>\n<\/td>\n<td valign=\"top\">\n<p>10,522<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5,031<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5,010<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4,888<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>DIRTY_PAGE_POOL<\/p>\n<\/td>\n<td valign=\"top\">\n<p>7,174<\/p>\n<\/td>\n<td valign=\"top\">\n<p>917<\/p>\n<\/td>\n<td valign=\"top\">\n<p>484<\/p>\n<\/td>\n<td valign=\"top\">\n<p>606<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>PAGELATCH_EX<\/p>\n<\/td>\n<td valign=\"top\">\n<p>119<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>51<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>SOS_SCHEDULER_YIELD<\/p>\n<\/td>\n<td valign=\"top\">\n<p>56,857<\/p>\n<\/td>\n<td valign=\"top\">\n<p>11,546<\/p>\n<\/td>\n<td valign=\"top\">\n<p>6,602<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5,344<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p> \tYou can see in the table above that the biggest improvement comes between one thousand and ten thousand records, and that between twenty and thirty thousand records some wait stats increase again, which is not good. Therefore, for my test environment, the best chunk size is twenty thousand records. <\/p>\n<p> \tThere are some calculations we can use together with our test results. 
We can use the log IO limit: the maximum amount of log IO that SQL Server will allow to be &#8220;in-flight&#8221;, meaning that SQL Server has requested the write operation and is still waiting for the acknowledgement. Beyond this limit, all further log IOs must wait. <\/p>\n<p> \tThe log IO limit is 3840 KB. We already checked our record size in the transaction log: 128 bytes (the insert into the clustered index). 128 * 20,000 = 2.44 MB, and it&#8217;s important to notice that the real value will be bigger than this, because other operations, such as extent allocations, happen in the log during the transaction. Calculating with 30,000 * 128 = 3.66 MB, we reach the log IO limit, which is why the wait stats increase. <\/p>\n<div class=\"indented tips\">\n<p> <strong>Side Note &#8211; Why did we leave this optimization for last?<\/strong><\/p>\n<p> Finding the balance of log flushes depends on the size of our transaction in the log. The size of our transaction in the log, in turn, depends on whether or not we drop the indexes on our table.  
<\/p><\/div>\n<h1>Seventh Execution<\/h1>\n<p> \tNow that we have found the chunk size we should use, let&#8217;s change our batch to use this value: <\/p>\n<pre class=\"lang:tsql theme:ssms2012\">declare @i bigint\ndeclare @icnt int\nselect @icnt=1\nbegin transaction\n\twhile @icnt&lt;=3\n\tbegin\n\t\tselect @i=0\n\t\twhile @i&lt;1032192\n\t\tbegin\n\t\t\twith qry as\n\t\t\t(SELECT\n\t\t\t\tp.ProductID,\n\t\t\t\tp.ProductSubcategoryId,\n\t\t\t\tROW_NUMBER() OVER \n\t\t\t\t(\n\t\t\t\t\tORDER BY\n\t\t\t\t\t\tn.number\n\t\t\t\t) as unitprice\n\t\t\tFROM Adventureworks2012.production.Product AS p with (tablockx)\n\t\t\tCROSS APPLY master..spt_values AS n with (tablockx)\n\t\t\tWHERE\n\t\t\t\tn.type = 'p')\n\t\t\tINSERT Produtos with (tablockx) \n\t\t\t\t\tselect ProductID,ProductSubCategoryId,UnitPrice\n\t\t\t\t\t from qry where \n\t\t\t\tunitprice &gt;=@i and unitprice &lt;@i+20000 -- chunk size\n\n\t\tselect @i=@i+20000 -- chunk size\n\t\tend \n\t\tselect @icnt=@icnt+1\n\tend\ncommit transaction\n<\/pre>\n<p> \tLet&#8217;s do the cleanup again and execute our batch: <\/p>\n<ol>\n<li>Drop the database if it already exists<\/li>\n<li>Run the script to create the environment<\/li>\n<li>Clear the wait stats<\/li>\n<li>Clear the SQL Server cache<\/li>\n<li>Clear the performance monitor window<\/li>\n<li>Stop\/start the logFlush XE session<\/li>\n<\/ol>\n<p> \t<strong><em>Seventh execution time: 00:22s<\/em><\/strong> <\/p>\n<h1>Further Improvements<\/h1>\n<p> \tThere are a few more improvements you should test in your environment: <\/p>\n<ul>\n<li>If you are sure your data is already correct, you can disable constraints and re-enable them after the batch.<\/li>\n<li>If your table has triggers, should they really be executed during the batch? 
You can disable the triggers.<\/li>\n<li>Keep track of the wait stats on your server to identify what&#8217;s slowing it down.<\/li>\n<li>You can evaluate using the Bulk-Logged recovery model with minimally logged operations, like INSERT\/SELECT. However, the conditions under which an operation like INSERT\/SELECT is minimally logged are very restrictive, so this will help in only a few cases. You can read more about this at <a href=\"https:\/\/technet.microsoft.com\/en-us\/library\/ms191244(v=sql.105).aspx\">https:\/\/technet.microsoft.com\/en-us\/library\/ms191244(v=sql.105).aspx<\/a><\/li>\n<li>You can use delayed durability, a new feature in SQL Server 2014, to improve batch performance.<\/li>\n<\/ul>\n<h1>Conclusion<\/h1>\n<p> \tYou have seen several ways that we can improve the performance and efficiency of a batch process. If you are developing a regularly scheduled batch process, get the help of a DBA, because DBAs and developers need to work together to make well-performing batches. It is also worth discussing with the DBAs who will be maintaining the database in production how best to keep a logged check of the performance of batch jobs, in case circumstances change and batches start to perform poorly in production. <\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Step-by-step performance tuning of a SQL Server batch process &#8211; identifying and fixing auto-growth events, lock contention, WRITELOG waits, index fragmentation, and log-flush imbalance. 
Each problem includes the diagnostic technique, the fix, and measured before\/after results.&hellip;<\/p>\n","protected":false},"author":50808,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[5529,6041,4178,4824,4783,6040,4206,4179,4150,4151,4306,6042,4252],"coauthors":[],"class_list":["post-1992","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-auto-growth","tag-batch-processes","tag-bi","tag-etl","tag-execution-plans","tag-log-size","tag-performance","tag-source-control","tag-sql","tag-sql-server","tag-ssis","tag-t-ql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1992","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/50808"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1992"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1992\/revisions"}],"predecessor-version":[{"id":92517,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1992\/revisions\/92517"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1992"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1992"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1992"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.c
om\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1992"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}