{"id":77356,"date":"2018-03-06T17:26:10","date_gmt":"2018-03-06T17:26:10","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=77356"},"modified":"2022-05-06T17:31:13","modified_gmt":"2022-05-06T17:31:13","slug":"read-committed-snapshot-isolation-high-version_ghost_record_count","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/performance-sql-server\/read-committed-snapshot-isolation-high-version_ghost_record_count\/","title":{"rendered":"READ COMMITTED SNAPSHOT ISOLATION and High version_ghost_record_count"},"content":{"rendered":"<p>I\u2019ve now experienced three occasions in which I\u2019ve been asked to help to fix a rather mysterious problem with a production Microsoft SQL Server database that was suffering from incidents of performance degradation. The problem was mysterious because it wasn\u2019t caused by one of the typical reasons such as Parameter\u00a0Sniffing or outdated statistics. In these three incidents, it turned out that the problem of performance degradation was due to transactions being held open unnecessarily when the database was running in READ COMMITTED SNAPSHOT ISOLATION (RCSI) mode.<\/p>\n<h2>What is RCSI<\/h2>\n<p>The problem that every database developer must deal with is the handling of concurrency: of ensuring that the many tasks in a Microsoft SQL Server instance do not impede each other. A database in Microsoft SQL Server can handle thousands of simultaneous requests for selecting data and doing data modifications. For the protection of this data, Microsoft SQL Server is using locks to arbitrate the requests for the requested resources. This is especially important where data is changed by one process while it is being read by another. When, for example, a user is editing a product stock level in a warehouse management system, then Microsoft SQL Server must make sure that nobody can read this record at the same time: otherwise the stock level could be out-of-date. Figure 1 illustrates this concept.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"360\" class=\"wp-image-77358\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-19.png\" \/><\/p>\n<p class=\"caption\">Figure 1: Protecting the record while it gets changed<\/p>\n<p>Transaction 1 (T1) is changing the record with ID = 10 while Transaction 2 (T2) is trying to read the same record. T1 is using an exclusive lock on the resource to avoid access from T2, which wants to read the uncommitted record data. This behavior is called pessimistic locking and is the expected behavior of Microsoft SQL Server in the default read committed transaction isolation level. As a workaround, Microsoft introduced for the first time, with Microsoft SQL Server 2005, the READ COMMITTED SNAPSHOT ISOLATION (RCSI) level. The word \u2018snapshot\u2019 describes the workaround that allows other requests to read data even if it is locked exclusively.<\/p>\n<p>The other processes access a copy of the locked data which is stored in the Version Store within TEMPDB. The copied data is called a <strong>version_ghost_record<\/strong> and can be monitored by querying the system view <strong>[sys].[dm_db_index_physical_stats]<\/strong>. The value in <strong>version_ghost_record_count<\/strong> returns the number of ghost records retained by an outstanding snapshot isolation transaction in an allocation unit. The copy will be created before T1 starts changing data. The copy represents a valid record as it was before the transaction started as shown in Figure 2.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"360\" class=\"wp-image-77359\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-20.png\" \/><\/p>\n<p class=\"caption\">Figure 2: Move a copy of the locked record into the version store in TEMPDB<\/p>\n<p>Many developers are now using RCSI to avoid deadlock situations and long waits for a blocked resource. These problems are generally caused by either implicit or explicit transactions being held open for too long for the workload. Unfortunately, RCSI can be a bad choice for an isolation mechanism in these very circumstances. The following example will demonstrate the strange consequences of using RCSI to fix a locking\/blocking problem that I\u2019ve seen now for the third time in an application.<\/p>\n<h2>Test Environment<\/h2>\n<p>All systems where the problem occurred were using a similar workload. The applications were using small tables for storing new status records and deleting outdated status records. The content of these \u2018rotating data\u2019 tables is used for tracking the flow of work. I\u2019ll use an example from the automotive industry. When a product needs to be assembled on a succession of different workbenches, it will be transported with a conveyor belt to the different operating steps. When the carrier passes a measure point, a new record will be inserted into the table. When the assembly process is finished, it moves to the next workbench and the record is deleted. The following test scenario will simulate this type of workflow as shown in Figure 3.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"360\" class=\"wp-image-77360\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-21.png\" \/><\/p>\n<p class=\"caption\">Figure 3: Rotating data content in a status table<\/p>\n<p>The picture shows the single steps of the workload. In a starting stock of records (~1,000 records) a process will insert new records and will then delete the old records from the table.<\/p>\n<h2>Activating RCSI for the demo database<\/h2>\n<p>When a new database is created, RCSI is not activated by default. It must be enabled, and the activation needs to have exclusive access to the database.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">USE Master;\r\n  GO\r\n  ALTER DATABASE demo_db SET READ_COMMITTED_SNAPSHOT ON;\r\n  GO\r\n  USE demo_db;\r\n  GO<\/pre>\n<h2>Creating a demo table with a few records<\/h2>\n<p>When RCSI has been activated, it is time to create the demo table. You can use the following script to quickly insert a few records into the test table, \u00a0\u00a0<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE dbo.demo_table\r\n  (\r\n      ID    INT       NOT NULL    IDENTITY (1, 1),\r\n      C1    CHAR(100) NOT NULL\r\n  );\r\n  GO\r\n   \r\n  INSERT INTO dbo.demo_table (C1)\r\n  SELECT TOP (1000)\r\n         CAST(TEXT AS CHAR(100)) AS C1\r\n  FROM   sys.messages\r\n  WHERE  language_id = 1031;\r\n  GO\r\n    \r\n  CREATE UNIQUE CLUSTERED INDEX cuix_demo_table_Id\r\n  ON dbo.demo_table (Id);\r\n  GO<\/pre>\n<p>You can check the efficient access pattern with the following query It will show an INDEX SCAN which uses 2 I\/Os. Don\u2019t worry about the SCAN; the query is using a TOP operator to limit the output to one (last) record.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SET STATISTICS IO ON;\r\n  GO\r\n  SELECT MAX(Id) FROM dbo.demo_table;<\/pre>\n<p>Figure 4 shows the plan.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"811\" height=\"311\" class=\"wp-image-77361\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-22.png\" \/><\/p>\n<p class=\"caption\">Figure 4: Efficient data access by usage of an index<\/p>\n<p>Both the number of data pages that are used and the number of records are easily understandable, and so, the experiment can start.<\/p>\n<h2>Prerequisites<\/h2>\n<p>Before the workload starts, a few prerequisites are required to measure the increasing value of <strong>version_ghost_records<\/strong> while the transaction is up and running. You can use different approaches to fulfill this requirement:<\/p>\n<ul>\n<li>Create your own routine within the running transaction which saves each n seconds the number value in a temporary table<\/li>\n<li>Monitor the increasing value with PERFMON<\/li>\n<li>Use a professional monitoring tools like Redgate SQL Monitor\u00ae<\/li>\n<\/ul>\n<p>In this demo I\u2019ll use PERFMON to track the number of version records in the table. SQL Server has not implemented counters for this requirement, so you have to define your own counters for PERFMON.<\/p>\n<h2>User Settable Objects<\/h2>\n<p>Microsoft SQL Server provides a User Settable object which allows you to create custom counter instances. Custom counter instances can be used to monitor aspects of the server not monitored by existing counters, such as components unique to your SQL Server database.<\/p>\n<p>The User Settable object contains 10 instances of the query counter: User counter 1 through User counter 10. These counters map to the SQL Server stored procedures <strong>sp_user_counter1<\/strong> through <strong>sp_user_counter10<\/strong>. As these stored procedures are executed by user applications, the values set by the stored procedures are displayed in System Monitor.<\/p>\n<div class=\"note\">\n<p>Note: The user counters are not polled automatically by Performance Monitor. They must be explicitly executed by a user application for the counter values to be updated (e.g. a SQL Server Agent Job)!<\/p>\n<\/div>\n<h2>User Settable Counter for version_ghost_record_count<\/h2>\n<p>The basic information about the number of <strong>[version_ghost_record_count]<\/strong> is in the system function <strong>[sys].[dm_db_index_physical_stats]<\/strong>. Unfortunately, this DMF has to run in <strong>DETAILED<\/strong> modus. This means that Microsoft SQL Server has to go through ALL allocated pages of the index to evaluate the required information!<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT\tversion_ghost_record_count\r\n  FROM\tsys.dm_db_index_physical_stats\r\n  \t(\r\n  \t\tDB_ID(),\r\n  \t\tOBJECT_ID(N'dbo.demo_table', N'U'),\r\n  \t\t1,\r\n  \t\tNULL,\r\n  \t\tN'DETAILED'\r\n  \t)\r\n  WHERE\tindex_level = 0<\/pre>\n<p>To bind this information to a User Settable Object it needs to be stored in a variable which then will be pushed to the PERFMON application.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">-- Add a user definied counter for the monitoring in PERFMON\r\n  DECLARE\t@cache_size BIGINT;\r\n  SET\t\t@cache_size =\r\n  \t\t(\r\n  \t\t\tSELECT\tversion_ghost_record_count\r\n  \t\t\tFROM\tsys.dm_db_index_physical_stats\r\n  \t\t\t\t(\r\n  \t\t\t\t\tDB_ID(),\r\n  \t\t\t\t\tOBJECT_ID(N'dbo.demo_table', N'U'),\r\n  \t\t\t\t\t1,\r\n  \t\t\t\t\tNULL,\r\n  \t\t\t\t\tN'DETAILED'\r\n  \t\t\t\t)\r\n  \t\t\tWHERE\tindex_level = 0\r\n  \t\t);\r\n  EXEC\tsp_user_counter1 @cache_size;<\/pre>\n<h2>SQL Server Agent Job for Population of Counter Value<\/h2>\n<p>Due to the requirement that PERFMON cannot actively pull the information from SQL Server an application process must provide the data. Here a SQL Server Agent job which runs every 10 seconds will help to populate the number of <strong>version_ghost_records<\/strong>. Figures 5, 6, and 7 show the job properties.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1380\" height=\"1250\" class=\"wp-image-77362\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-23.png\" \/><\/p>\n<p class=\"caption\">Figure 5: Create a SQL Server Agent Job for population of the counter value<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1380\" height=\"1250\" class=\"wp-image-77363\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-24.png\" \/><\/p>\n<p class=\"caption\">Figure 6: The job executes a query against sys.dm_db_index_physical_stats<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1324\" height=\"1164\" class=\"wp-image-77364\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-25.png\" \/><\/p>\n<p class=\"caption\">Figure 7: The scheduler will update the counter value every 10 seconds<\/p>\n<h2>Setup PERFMON<\/h2>\n<p>To monitor the increasing value of version ghost records the next step is the preparation of PERFMON to show the development of version ghost records in a graph. Therefore, the User Settable counter must be added to the main Performance Monitor graph.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1420\" height=\"1142\" class=\"wp-image-77365\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-26.png\" \/><\/p>\n<p class=\"caption\">Figure 8: Add the user counter for monitoring<\/p>\n<h2>Starting the workload<\/h2>\n<p>The next script will start and simulates the workload that I\u2019ve suggested as an example. Please note that an explicit transaction is opened, and then it starts a continuous loop in which it will first insert a new record at the end of the table. After waiting for 10ms, it then deletes the oldest record from the table. The special feature is that open transaction. It never gets closed! That was exactly the problem in all 3 observed scenarios!<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">-- We start our workload \r\n  SET NOCOUNT ON;\r\n  GO\r\n  BEGIN TRANSACTION;\r\n  GO\r\n  \t-- Insert new record into dbo.demo_table\r\n  \tDECLARE\t@finish_date DATETIME2(0) = DATEADD(MINUTE, 5, GETDATE());\r\n  \tWHILE @finish_date &gt;= GETDATE()\r\n  \tBEGIN\r\n  \t\t-- wait 10 ms before each new process\r\n  \t\tINSERT INTO dbo.demo_table(C1)\r\n  \t\tSELECT C1\r\n  \t\tFROM   dbo.demo_table\r\n  \t\tWHERE  Id = (SELECT MIN(Id) FROM dbo.demo_table);\r\n    \r\n  \t\t-- Wait 10 ms to delete the first record from the table\r\n  \t\tWAITFOR DELAY '00:00:00:010';\r\n    \r\n  \t\t-- Now select the min record from the table\r\n  \t\tDELETE dbo.demo_table WHERE Id = (SELECT MIN(Id) FROM dbo.demo_table);\r\n  \tEND\r\n  ROLLBACK TRAN;\r\n  GO<\/pre>\n<p>The above workload creates a rotating system. The workload is running for five minutes in my demo. While it is running I reproduce, in a different query window and process, the SELECT-statements. I do these in two different ways. The first statement is using RCSI while the second query is reading data with READ UNCOMMITTED isolation level.<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SET STATISTICS IO ON;\r\n  SET NOCOUNT ON;\r\n  GO\r\n  PRINT 'I\/O with RCSI...'\r\n  SELECT MAX(ID) FROM dbo.demo_table;\r\n  GO\r\n    \r\n  PRINT 'I\/O with READ UNCOMMITTED...'\r\n  SELECT MAX(ID) FROM dbo.demo_table WITH (NOLOCK);\r\n  GO\r\n  SET STATISTICS IO OFF;\r\n  GO<\/pre>\n<p>Figure 9 shows the logical reads of the two statements.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1487\" height=\"332\" class=\"wp-image-77366\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-27.png\" \/><\/p>\n<p class=\"caption\">Figure 9: Different IO with READ COMMITTED SNAPSHOT ISOLATION<\/p>\n<h2>Analysis of the workload<\/h2>\n<p>The first output will show a max value of 1,000 (the value before the transaction started) while the second one will show a much higher max value (from the uncommitted data). Furthermore, the simple aggregation query produces 164 page reads while the second query only read 2 data pages! Within five minutes of tests the number of allocated data pages have changed dramatically.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2673\" height=\"1046\" class=\"wp-image-77367\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/02\/word-image-28.png\" \/><\/p>\n<p class=\"caption\">Figure 10: Number of version_ghost_record_count which have been produces in 5 minutes<\/p>\n<p>The screenshot demonstrates the quick rise of ghost versioned records in the affected table over the period of five minutes. The maximum number of ghost versioned records in this example were ~14,000 rows; one of the affected systems was running in an open transaction for more than three months!<\/p>\n<h2>The Reason for this Behavior<\/h2>\n<p>The reason for this strange behavior is quick and easy to explain. The workload for a \u2018rotating system\u2019 is permanently creating new records that are used within the open transaction. The aggregation function must consider every single change in the version store to return a valid value.<\/p>\n<p>The second query is running in <strong>READ UNCOMMITTED<\/strong> isolation level and will not use RCSI (which is only usable in <strong>READ COMMITTED<\/strong> isolation level) and will use dirty reads.<\/p>\n<h2>Solution for this Problem<\/h2>\n<p>In all the three incidents of this problem that I\u2019ve looked at, the bug was not in Microsoft SQL Server but in the development of the implemented workload. Because the transaction has been started explicitly, it stood open for the whole time. The version store cannot be cleared out, and the number of \u2018copies\u2019 of the data are increasing.<\/p>\n<p>As a developer, you must make sure that explicit transactions will be short and will be closed immediately after the process is finished. At the point when the transaction is committed \/ closed, all data pages from the version store will be released immediately, and the performance degradation disappears.<\/p>\n<h2>Advice<\/h2>\n<p>It is good to set up a permanent way of monitoring the number of <strong>version_ghost_records<\/strong> in a database, especially if RCSI for a database is activated. A high count for these records is just the sort of stress condition to be alerted about, because it indicates a potential slow-down in database performance due to the version store in <strong>TEMPDB<\/strong> getting too large, and causing queries against the table to suffer from performance degradation. You might think PERFMON is way too complex for handling and implementation of such a solution? If you want to store and analyze the evolution of counters over longer periods it might be useful to use professional tools like <a href=\"https:\/\/www.red-gate.com\/products\/dba\/sql-monitor\/?utm_source=bing&amp;utm_term=redgate%20sql%20monitor&amp;utm_campaign=BS+%7C+US+%7C+Brand+Product+%7C+SQL+Monitor+%7C+Exact&amp;utm_medium=cpc&amp;utm_content=r4wpwR2M|pcrid|16478111668|pkw|redgate%20sql%20monitor|pmt|be|pdv|c|\">Redgate SQL Monitor<\/a> for these observations. To learn how to add an alert to SQL Monitor for this issue, check out this <a href=\"https:\/\/www.red-gate.com\/hub\/product-learning\/sql-monitor\/using-sql-monitor-detect-problems-databases-use-snapshot-based-transaction-isolation\">article<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Developer shops may decide to use Read Committed Snapshot Isolation (RCSI) to reduce contention, and possibly improve performance, but it is not without its own pitfalls. In this article, Uwe Ricken describes a situation where long running transactions caused a severe performance degradation when RCSI was in use.&hellip;<\/p>\n","protected":false},"author":205038,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143529],"tags":[5842],"coauthors":[19794],"class_list":["post-77356","post","type-post","status-publish","format-standard","hentry","category-performance-sql-server","tag-sql-monitor"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77356","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/205038"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=77356"}],"version-history":[{"count":6,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77356\/revisions"}],"predecessor-version":[{"id":77768,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/77356\/revisions\/77768"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=77356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=77356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=77356"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=77356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}