{"id":1339,"date":"2012-05-21T00:00:00","date_gmt":"2012-05-21T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/a-first-look-at-sql-server-2012-availability-group-wait-statistics\/"},"modified":"2021-08-24T13:40:10","modified_gmt":"2021-08-24T13:40:10","slug":"a-first-look-at-sql-server-2012-availability-group-wait-statistics","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/database-administration-sql-server\/a-first-look-at-sql-server-2012-availability-group-wait-statistics\/","title":{"rendered":"A first look at SQL Server 2012 Availability Group Wait Statistics"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">Now that SQL Server 2012 is released, we can expect to start seeing new and unfamiliar wait types appear in the output of <b>sys.dm_os_wait_stats<\/b>, <b>sys.dm_os_waiting_tasks<\/b> and extended event sessions. For those unfamiliar with wait types and wait statistics, the key concept to understand is that the SQL Server engine was designed to track areas where a task may need to wait for &#8220;something&#8221;. That &#8220;something&#8221; may be a resource, synchronization activity or queue. For example, a task may have to wait to acquire an Update lock on a data page because another task has already acquired it and is still holding the resource. When this occurs, that waiting task is moved off the scheduler and set to a waiting state along with an associated wait type, which in this example would be &#8220;LCK_M_U&#8221;. Once the resource becomes available, the task is signaled that the resource is available and the task then gets back in line on the scheduler. Once it has its scheduler time, it acquires the resource and continues with its work. A single task may encounter multiple waits during its lifetime, cycling repeatedly through a running, waiting and runnable state until complete.<\/p>\n<div class=\"note\">\n<p class=\"note\"><i><strong>Note<\/strong>: For an in depth overview of wait statistics, see Tom Davidson&#8217;s white paper, &#8220;<a href=\"http:\/\/sqlcat.com\/sqlcat\/b\/whitepapers\/archive\/2007\/11\/19\/sql-server-2005-waits-and-queues.aspx\">SQL Server 2005 Waits and Queues<\/a>&#8220;. While this paper was written for SQL Server 2005, it is still very relevant to SQL Server 2012.<\/i><\/p>\n<\/div>\n<p>As of SQL Server 2012 CU1, SQL Server tracks 649 different wait types. Not all wait types are relevant &#8211; with many wait types representing the accumulation of normal background tasks. While some wait types can be ignored, other wait types should indeed be paid attention to in your overall performance troubleshooting efforts. It&#8217;s best to treat wait statistics as a <i>pointer<\/i> for further research. Think of wait statistics as the initial symptoms that help you efficiently direct your next steps instead of blindly searching for possible culprits.<\/p>\n<p>Of the new wait types, there are at least 50 related to AlwaysOn Availability Groups. AlwaysOn Availability Groups are one of the more prominent new feature areas added to SQL Server 2012, adding new capabilities for both high availability and disaster recovery. If there was a performance issue related to availability groups, what wait types would we expect to see? For example, if transactions on the primary replica were slow to commit due to disk latency issues on the secondary synchronous replica, what wait type would accumulate? This article will explore the answers to this question.<\/p>\n<p>With so many variations and potential deployment configurations, consider this to be a first look, rather than a complete detailing of every possible pattern. For this article, I&#8217;m starting off with the following configuration:<\/p>\n<ul>\n<li>There will be one availability group with three availability databases  <\/li>\n<li>There are two availability replicas (primary, secondary)  <\/li>\n<li>I&#8217;m starting off with synchronous commit on both replicas and a manual failover mode between the two  <\/li>\n<li>All connections are allowed in the primary role  <\/li>\n<li>Both replicas are configured to be readable secondaries<\/li>\n<\/ul>\n<p>In order to show the application&#8217;s perspective, I&#8217;ll be measuring wait statistics for a specific load-driving session. In this case, I&#8217;m tracking wait statistics for session id 56:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE EVENT SESSION App_AG_Waits \nON SERVER \nADD EVENT sqlos.wait_info( \n&#160;&#160;&#160;&#160;&#160; ACTION (sqlserver.session_id) \n&#160;&#160;&#160; WHERE ([package0].[equal_uint64]([sqlserver].[session_id],(56)))) \nADD TARGET package0.asynchronous_file_target \n&#160;&#160;&#160;&#160;&#160; (SET FILENAME = N'C:\\temp\\App_AG_Stats.xel', \n&#160;&#160;&#160;&#160;&#160; &#160;METADATAFILE = N'C:\\temp\\App_AG_Stats.xem'); \nGO\n<\/pre>\n<p>Prior to measuring the wait type distribution for each test, I started off the extended events session as follows:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">ALTER EVENT SESSION App_AG_Waits  ON SERVER STATE = START; \nGO\n<\/pre>\n<p>After running my test, I stop the extended event session and then parse it out into a summarized wait stats accumulation result set for the specific session id:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">ALTER EVENT SESSION App_AG_Waits \nON SERVER STATE = STOP; \nGO\n\nSELECT CAST(event_data as XML) event_data \nINTO #App_AG_Stage_1 \nFROM sys.fn_xe_file_target_read_file \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; ('C:\\temp\\*.xel', NULL, NULL, NULL); \n\nSELECT \n&#160;&#160;&#160;&#160;&#160; event_data.value \n&#160;&#160;&#160;&#160;&#160; ('(\/event\/action[@name=''session_id'']\/value)[1]', 'smallint') as session_id, \n&#160;&#160;&#160;&#160;&#160; event_data.value \n&#160;&#160;&#160;&#160;&#160; ('(\/event\/data[@name=''wait_type'']\/text)[1]', 'varchar(100)') as wait_type, \n&#160;&#160;&#160;&#160;&#160; event_data.value \n&#160;&#160;&#160;&#160;&#160; ('(\/event\/data[@name=''duration'']\/value)[1]', 'bigint') as duration, \n&#160;&#160;&#160;&#160;&#160; event_data.value \n&#160;&#160;&#160;&#160;&#160; ('(\/event\/data[@name=''signal_duration'']\/value)[1]', 'bigint') as signal_duration \nINTO #App_AG_Stage_2 \nFROM #App_AG_Stage_1; \n\n-- Final result set\nSELECT&#160;&#160;&#160;&#160;&#160; session_id, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; wait_type, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; SUM(duration) total_duration, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; SUM(signal_duration) total_signal_duration\nFROM #App_AG_Stage_2 \nGROUP BY session_id, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; wait_type \nORDER BY session_id, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; SUM(duration) DESC; \nGO\n<\/pre>\n<p>I was also careful to delete the XEL files after each test, so that I don&#8217;t accidently accumulate the combined set of current and past tests.<\/p>\n<p>Additionally, I measured the aggregated wait statistics on the primary and secondary, both before and after each test: Nothing fancy &#8211; just a delta to see what wait stats accumulated across <i>all <\/i>sessions on both sides. Before each test, I executed (in SQLCMD mode) the following:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Primary replica\n:CONNECT SQLFCIPRIMARY\\DC1 \nSELECT wait_type, wait_time_ms, signal_wait_time_ms \nINTO tempdb.dbo.before_waits \nFROM sys.dm_os_wait_stats; \nGO\n\n-- Secondary replica\n:CONNECT SQLFCIDR\\DC2 \nSELECT wait_type, wait_time_ms, signal_wait_time_ms \nINTO tempdb.dbo.before_waits \nFROM sys.dm_os_wait_stats; \nGO\n<\/pre>\n<p>Then after each test I executed the following:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Primary replica\n:CONNECT SQLFCIPRIMARY\\DC1 \nSELECT wait_type, wait_time_ms, signal_wait_time_ms \nINTO tempdb.dbo.after_waits \nFROM sys.dm_os_wait_stats; \n\nSELECT 'Primary replica:' ReportNM; \n\nSELECT&#160;&#160;&#160;&#160;&#160; b.wait_type, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; a.wait_time_ms - b.wait_time_ms wait_time_ms, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; a.signal_wait_time_ms - b.signal_wait_time_ms signal_time_ms \nFROM tempdb.dbo.before_waits b \nINNER JOIN tempdb.dbo.after_waits a ON \n&#160;&#160;&#160;&#160;&#160; b.wait_type = a.wait_type \nWHERE a.wait_time_ms &gt; b.wait_time_ms \nORDER BY a.wait_time_ms - b.wait_time_ms DESC; \n\nDROP TABLE tempdb.dbo.before_waits; \nDROP TABLE tempdb.dbo.after_waits; \nGO\n\n:CONNECT SQLFCIDR\\DC2 \n-- Secondary replica\nSELECT wait_type, wait_time_ms, signal_wait_time_ms \nINTO tempdb.dbo.after_waits \nFROM sys.dm_os_wait_stats; \n\nSELECT 'Secondary replica:' ReportNM; \n\nSELECT&#160;&#160;&#160;&#160;&#160; b.wait_type, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; a.wait_time_ms - b.wait_time_ms wait_time_ms, \n&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; a.signal_wait_time_ms - b.signal_wait_time_ms signal_time_ms \nFROM tempdb.dbo.before_waits b \nINNER JOIN tempdb.dbo.after_waits a ON \n&#160;&#160;&#160;&#160;&#160; b.wait_type = a.wait_type \nWHERE a.wait_time_ms &gt; b.wait_time_ms \nORDER BY a.wait_time_ms - b.wait_time_ms DESC; \n\nDROP TABLE tempdb.dbo.before_waits; \nDROP TABLE tempdb.dbo.after_waits; \nGO\n<\/pre>\n<p>Keep in mind that I&#8217;m not going into much detail on the hardware specifications, network setup, I\/O subsystem, etc. The goal of these initial tests is to smoke out new wait types and begin formulating patterns based on specific scenarios. I fully intend to expand out the list of patterns over time. Now, without further ado, here are a few test scenarios and the associated wait statistics. <\/p>\n<h2>Test: No User Activity, Synchronous Mode<\/h2>\n<p>In the first test, I&#8217;m just interested in seeing what background processes are waiting for something to do. I had synchronous mode configured for the availability group replicas.<\/p>\n<p>What wait stats are associated with just regular availability group <i>background<\/i> processes? I ran a simple WAITFOR DELAY of 30 seconds.<\/p>\n<p>My session accumulated waits, as expected, showed no waits related to availability groups:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>WAITFOR<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30008<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>WRITELOG<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>XE_BUFFERMGR_ALLPROCESSED_EVENT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Regarding the primary replica itself, here was what accumulated during the 30 seconds (I filtered out some of the known background wait types as well &#8211; in order to keep the noise down on the results):<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_WORK_QUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">137533<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">139<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_FILESTREAM_IOMGR_IOCOMPLETION<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30653<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">7<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>LOGMGR_QUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30644<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_TIMER_TASK<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30522<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">39<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_CLUSAPI_CALL<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30498<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_LOGCAPTURE_WAIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">15060<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNC_COMMIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">5<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>So this gives us an idea of background, queue related wait types that increment independently of activity in the various availability databases within an availability group.<\/p>\n<p>What about the secondary replica?<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_WORK_QUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">122073<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">49<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_CLUSAPI_CALL<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30994<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>LOGMGR_QUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30820<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_FILESTREAM_IOMGR_IOCOMPLETION<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30775<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">4<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_TIMER_TASK<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">30433<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">7<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We see the same background accumulation, but without the HADR_LOGCAPTURE_WAIT and HADR_SYNC_COMMIT.<\/p>\n<h2>Test: Synchronous Mode and Singleton INSERT Operations<\/h2>\n<p>In the first test, I&#8217;m performing a repeating INSERT operation against the AppDB database that participates in the availability group and is in synchronous mode. The test was a simple, singleton INSERT performed 10,000 times on the primary replica:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">USE AppDB; \nGO\n\nCREATE TABLE [dbo].[category]( \n&#160;&#160;&#160;&#160;&#160; [category_no] int IDENTITY(1,1) NOT NULL, \n&#160;&#160;&#160;&#160;&#160; [category_desc] nvarchar(200) NOT NULL, \n&#160;&#160;&#160;&#160;&#160; [category_code] nchar(10) NOT NULL, \n&#160;CONSTRAINT [category_ident] PRIMARY KEY CLUSTERED \n(\n&#160;&#160;&#160;&#160;&#160; [category_no] ASC \n)) ON [PRIMARY]; \n\nGO\n\nINSERT [dbo].[category] \n(category_desc, category_code) \nVALUES\n(CAST(NEWID() as nvarchar(200)), \n&#160;LEFT(CAST(NEWID() as nvarchar(36)),10)); \nGO 10000\n<\/pre>\n<p>After stopping the session, the aggregated primary replica wait stats for the session were as follows:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNC_COMMIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">17408<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">18<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>XE_BUFFERMGR_ALLPROCESSED_EVENT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">18<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>WRITELOG<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>SOS_SCHEDULER_YIELD<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In this case, our obvious standout is HADR_SYNC_COMMIT. In this guess, I&#8217;d assume this is the accumulated wait time waiting for the synchronous replica acknowledgement before committing the transaction. Books Online defines it as &#8220;<i>Waiting for transaction commit processing for the synchronized secondary databases to harden the log. This wait is also reflected by the Transaction Delay performance counter. This wait type is expected for synchronized availability groups and indicates the time to send, write, and acknowledge log to the secondary databases<\/i>.&#8221;<\/p>\n<p>What are the top waits (by duration) on the primary replica that were not background-based?<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_NOTIFICATION_DEQUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">563205<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNC_COMMIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">19971<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">97<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We see the HADR_SYNC_COMMIT as expected &#8211; but also a new wait type, the HADR_NOTIFICATION_DEQUEUE. I wondered if I would see this for asynchronous mode. Books Online defines HADR_NOTIFICATION_DEQUEUE as a &#8220;<i>background task that processes Windows Server Failover Clustering notifications is waiting for the next notification. Internal use only.<\/i>&#8220;<\/p>\n<p>I saw the following on the secondary replica:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_NOTIFICATION_DEQUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">563139<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>HADR_NOTIFICATION_DEQUEUE has near-identical duration to the primary &#8211; and no wait time for HADR_SYNC_COMMIT, so it seems to be a wait type for synchronous mode based on a primary replica.<\/p>\n<h2>Test: Asynchronous Mode and Singleton INSERT Operations<\/h2>\n<p>This dovetails in to the next test, where, basically, I just repeated the first test, deleting all rows from the table first, but this time using asynchronous mode between the replicas.<\/p>\n<p>The results on the primary replica were as follows:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Session ID<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p class=\"MsoNormal\"><b>56<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p>WRITELOG<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">5029<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">96<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p class=\"MsoNormal\"><b>56<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p>XE_BUFFERMGR_ALLPROCESSED_EVENT<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">7<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There were no wait stats generated that were related to the availability group, only those related to the transaction log and extended event session buffer flushes to targets.<\/p>\n<p>As for the primary and secondary replicas, I saw non-background wait values for HADR_NOTIFICATION_DEQUEUE, both hovering around the same wait time value as with the previous test.<\/p>\n<h2>Test: Asynchronous Mode and Large Transaction Operations<\/h2>\n<p>This test arose from the question &#8211; &#8220;can I push enough volume in asynchronous mode to get an availability group non-background related wait stat to accumulate?&#8221; To attempt this, I ran the following test:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">INSERT [dbo].[category] \n(category_desc, category_code) \nVALUES\n(CAST(NEWID() as nvarchar(200)), \n&#160;LEFT(CAST(NEWID() as nvarchar(36)),10)); \nGO 1000\n\nINSERT [dbo].[category] \n(category_desc, category_code) \nSELECT category_desc, category_code \nFROM [dbo].[category]; \nGO 10\n<\/pre>\n<p>This test did an INSERT SELECT based on the rows in category, executing 10 times total and thereby doubling the result set on each execute. <\/p>\n<p>The results on the primary replica were as follows (top 4):<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Session ID<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p class=\"MsoNormal\"><b>56<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p>PAGEIOLATCH_EX<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">7913<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">11<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p class=\"MsoNormal\"><b>56<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p>SOS_SCHEDULER_YIELD<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">3888<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">3875<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p class=\"MsoNormal\"><b>56<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p>WRITELOG<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">596<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">114<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p class=\"MsoNormal\"><b>56<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p>WRITE_COMPLETION<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">373<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As for the primary and secondary replicas, I saw non-background wait values for HADR_NOTIFICATION_DEQUEUE, both around the same wait time value. There were only familiar faces here, so I moved on to some other tests, but I&#8217;ll be returning to this at another time.<\/p>\n<h2>Test: Synchronous Mode and Querying the Readable Secondary<\/h2>\n<p>I was interested to see what wait statistics might be generated around read-only queries on the secondary. To prepare this test, I loaded the table with 100,000 rows, failed-over the replica so that it switched to being secondary and then executed a simple SELECT:<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">-- Failover to the secondary\n:CONNECT SQLFCIDR\\DC2 \nALTER AVAILABILITY GROUP [AG1] FAILOVER; \nGO\n\n:CONNECT SQLFCIPRIMARY\\DC1 \nUSE AppDB; \n\nSELECT category_no, category_desc, category_code \nFROM [dbo].[category]; \nGO 10\n<\/pre>\n<p>The results were not particularly interesting. The session executing the SELECT had associated PAGELATCH_SH and PAGELATCH_EX accumulated wait time, but there was no wait statistics for the availability group; background related wait statistics.<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Total Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Total Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>PAGELATCH_SH<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">33<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">33<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>PAGELATCH_EX<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">16<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">16<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>What if we had concurrent writes on the primary? Would we see different SELECT wait statistics related to availability groups?<\/p>\n<h2>Test: Synchronous Mode and Querying the Readable Secondary with Concurrent Writes<\/h2>\n<p>I repeated the prior test, but this time with concurrent INSERT operations running on the primary replica (this ran concurrently to the SELECT statement used in the previous test):<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">INSERT [dbo].[category] \n(category_desc, category_code) \nVALUES\n(CAST(NEWID() as nvarchar(200)), \n&#160;LEFT(CAST(NEWID() as nvarchar(36)),10)); \nGO 1000\n\nINSERT [dbo].[category] \n(category_desc, category_code) \nSELECT category_desc, category_code \nFROM [dbo].[category]; \nGO 10000\n<\/pre>\n<p>After running the SELECT workload, here is what was returned. As I expected, the query runtime increased compared to the previous test due to the significant concurrent data modification activity on the primary replica:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Total Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Total Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>SOS_SCHEDULER_YIELD<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">23966<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">23874<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>NETWORK_IO<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">8389<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1068<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>LCK_M_S<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">210<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">6<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>PAGEIOLATCH_SH<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">134<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">2<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The wait types associated with the SELECT operation on the secondary replica were ones we&#8217;ve had in previous versions of SQL Server. No additional wait stats accumulated on the primary that weren&#8217;t related to the background tasks &#8211; however the secondary replica did show the following:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Total Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Total Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNC_COMMIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1870<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Test: Asynchronous Mode and Singleton INSERT Operations with an Offline Secondary<\/h2>\n<p>I next wanted to test my earlier singleton INSERT test in asynchronous mode with a disconnected replica. I took the secondary replica SQL Server service offline. The below figure just shows the Availability Dashboard &#8211; and shows a &#8220;not synchronizing&#8221; for the secondary:<\/p>\n<p class=\"illustration\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/imported\/1493-JoeSack1.jpg\" alt=\"1493-JoeSack1.jpg\" \/><\/p>\n<p>The results of the test were as follows:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>PAGEIOLATCH_EX<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">554<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>WRITELOG<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">202<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>SOS_SCHEDULER_YIELD<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">192<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">190<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>LOGBUFFER<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">36<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>IO_COMPLETION<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">19<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>WRITE_COMPLETION<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">17<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There were no types associated with the availability groups &#8211; and that was instance-wide (no accumulation outside of the availability group background related tasks).<\/p>\n<h2>Test: Synchronous Mode and Singleton INSERT Operations with an Offline Secondary<\/h2>\n<p>The next test was for another singleton INSERT test in synchronous mode, this time with a disconnected replica. I took the secondary replica SQL Server service offline, just as I did with the previous test. <\/p>\n<p>The results were as follows:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNC_COMMIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">11057<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">4<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>SOS_SCHEDULER_YIELD<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1043<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1037<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNCHRONIZING_THROTTLE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">827<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">36<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>WRITELOG<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">583<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">31<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>SLEEP_TASK<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">12<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>One interesting type emerged, HADR_SYNCHRONIZING_THROTTLE. Books Online defines this as &#8220;Waiting for transaction commit processing to allow a synchronizing secondary database to catch up to the primary end of log in order to transition to the synchronized state. This is an expected wait when a secondary database is catching up.&#8221; <\/p>\n<p>And looking at the non-background tasks for the primary replica itself, I saw the following:<\/p>\n<table class=\"MsoTableLightShading\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p><b>Wait Type<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Duration<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p><b>Signal Duration<\/b><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_NOTIFICATION_DEQUEUE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">49903<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNC_COMMIT<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">11075<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">5<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_SYNCHRONIZING_THROTTLE<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">889<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">59<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_REPLICAINFO_SYNC<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">465<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_AG_MUTEX<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">82<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p><b>HADR_LOGCAPTURE_SYNC<\/b><\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">73<\/p>\n<\/td>\n<td valign=\"top\">\n<p class=\"MsoNormal\">72<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>New wait types included HADR_REPLICAINFO_SYNC, HADR_LOGCAPTURE_SYNC and HADR_AG_MUTEX. I had not seen them before in my testing. I wondered if HADR_AG_MUTEX was similar to DBMIRROR_DBM_MUTEX &#8211; indicating contention for the send buffer contention. Looking up the definition in Books Online, I see that this wait state was defined as occurring &#8220;when an AlwaysOn DDL statement or Windows Server Failover Clustering command is waiting for exclusive read\/write access to the configuration of an availability group.&#8221;<\/p>\n<p>HADR_REPLICAINFO_SYNC was defined as &#8220;waiting for concurrency control to update the current replica state&#8221; and HADR_LOGCAPTURE_SYNC was defined as &#8220;concurrency control around the log capture or apply object when creating or destroying scans. This is an expected wait when partners change state or connection status.&#8221;<\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>There is much more to be explored in this area and other wait types not yet discussed. If you are encountering performance issues in your availability group topology, or if you suspect your availability group topology is causing issues with other workloads, I suggest you evaluate the accumulated wait statistics and use them to help prioritize your troubleshooting efforts.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>If you are trouble-shooting an AlwaysOn Availability Group topology, a study of the wait statistics will give a pointer to many of the causes of problems. Although several wait types are documented, there is nothing like practical experiment to familiarize yourself with new wait stats, and Joe Sack demonstrates a way of testing the sort of waits generated by an availability group under various circumstances.&hellip;<\/p>\n","protected":false},"author":221918,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143527],"tags":[4294,4178,4170,5339,4150,4151,5534],"coauthors":[],"class_list":["post-1339","post","type-post","status-publish","format-standard","hentry","category-database-administration-sql-server","tag-availability","tag-bi","tag-database-administration","tag-high-availability","tag-sql","tag-sql-server","tag-sql-server-2012"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1339","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221918"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=1339"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1339\/revisions"}],"predecessor-version":[{"id":92255,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/1339\/revisions\/92255"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=1339"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=1339"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=1339"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=1339"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}