A Use Case for Memory-Optimized Tempdb Metadata

Simple Talk - Redgate Software Blog
Comments 0

Share to social media

I recently had an interesting production SQL Server issue that turned out to be very easy to fix. The fix doesn’t fit every workload or server, but for the limited use cases described below, it’s a simple configuration change. The general best practice for server level configurations is to leave things at default, unless there is a specific reason for changing the settings. This will outline a use case for using memory-optimized tempdb metadata.

This covers a very specific scenario and workload. You won’t use this on a typical server, which is why it isn’t a default setting. You will see this for very specific server workloads, those with many transactions and high temp table usage. Most systems can better use the memory for the regular workload instead of optimizing tempdb metadata, so don’t use this as a default setting for your servers.

Symptoms

Symptoms for the issue started with reports of a very poor performing system. Digging into the details showed 100% CPU utilization on the SQL Server and queries performing much slower than normal. After eliminating deadlocks and excessive blocking, current queries were examined and their wait types. Wait types are an important health indicator and will be one of the first things you want to check when troubleshooting generalized issues.

Changing server configuration options presupposes that a lot of additional database and server tuning has already happened. You want to be sure the general database health is good, indexes are getting rebuilt on a regular basis, statistics are updated, recommended / necessary indexes are in place, and queries are optimized. You also need to check for deadlocks and blocking.

If you’ve done your standard troubleshooting and notice a high number of page latch waits, your server might be a candidate for memory-optimized tempdb metadata. Latches are light weight locks used internally by the system. You don’t need to know all of the details to diagnose this issue, but they are described in the first link in the references. The confirmation steps to diagnose this issue will follow the general pattern:

  1. Performance is impacted and standard troubleshooting is performed
  2. Large number of page latch waits are found
  3. The page latch waits are in tempdb
  4. Other page latch wait types are ruled out – leaving only metadata latches
    1. GAM
    2. SGAM
    3. PFS
  5. Enable server setting ‘tempdb metadata memory-optimized’ if appropriate and confirm improvement
    1. If you know this is a high transaction system that extensively uses temp tables, it’s an even stronger indication to make the change

Performance

You might be lucky enough to catch this before you start having performance issues, but you will likely notice this first due to poor performance. It may be high CPU usage, or just very poor SQL Server performance. Conduct your standard troubleshooting here, including server diagnostics. Once you have confirmed the server is performing poorly and you don’t see an obvious cause, examine current queries and wait types on the server.

Wait types for current queries

Part of performance checks include looking at current queries and their wait types. The following shows current processes and their associated wait types. If there are a significant number of page latch waits, it’s an indication to continue investigating possible latch contention. Other wait types don’t immediately rule out latch contention, but it’s a strong sign that it’s a different issue. Keep an eye on wait types as you continue troubleshooting and monitoring, but it may not be the cause of your performance problems if you don’t see it here.

If you see page latch wait types when looking at current processes, you need to get additional details. You can look at waiting tasks and the current sessions to get more information. The resource description in the query shows the details for the resource, including the database ID. The number in the resource_description column before the colon indicates the database ID. Tempdb is always 2, as shown in the example below. If you see latch waits concentrated in tempdb, continue investigating. If the page latches are not in tempdb, you can use the Microsoft troubleshooting guide listed in the references.

Page latch waits in tempdb

After you have confirmed you are experiencing latch contention in tempdb, you need to confirm that the latches are not GAM, SGAM or PFS latches. Metadata latches don’t show up directly, instead the other latch types are eliminated.

After you have confirmed you have tempdb metadata latch contention, and the issue persists, consider enabling the memory optimized configuration for tempdb. If you want additional confirmation, support can be engaged at this time. Be sure to include your troubleshooting steps to expedite your ticket.

Implementation

Since this impacts tempdb, the change is implemented at the server level using sp_configure. The setting is ‘tempdb metadata memory-optimized’. If you haven’t already enabled viewing advanced options, you’ll need to do it for this. If it returns a config value of 0, it needs to be changed to 1.

If you see the following, change the value to 1. It doesn’t require a service restart and is safe to change even under load.

Advanced options are now available, but this is just the preamble to making the actual change. First, confirm the current setting.

If your server configuration is using the default, tempdb is not memory optimized, as shown here.

To enable tempdb to be metadata memory-optimized, change the value to 1, run the reconfigure command, and restart the service.

The SQL Server service now needs to be restarted. Ideally, changes like this happen during your regular maintenance window, but you may not realize you need this change until it is a production issue and the server is already struggling. You don’t have to restart the server, but the SQL Server service does need to restart using your usual method. The restart will result in a disruption of service. The disruption will be limited if you have a high-availability setup, but it will impact current connections.

Reminder – This will disrupt your service and drop existing connections. If possible, do this during a maintenance window.

Once the service is restarted, confirm the setting is changed.

You now need to check your server health. Look at your wait types to confirm that you no longer have latch contention in tempdb metadata using the same methods described above. You also need to confirm that query and server performance have improved. It’s a simple change that can have a big impact on systems experiencing this particular issue.

Reverting the Configuration

If server performance doesn’t improve or you have additional issues with this setting change, the fix is to revert the setting back to 0 and restart the service. If you confirmed that the server is having metadata latch contention in tempdb (and you did the other troubleshooting steps), I wouldn’t expect that you would need to revert this change, but it’s always good to have a contingency plan.

Reproducing the use case

This use case can be demonstrated by creating as many temp tables as possible, as quickly as possible. If you run a script like the following, in multiple connections in SSMS simultaneously, you will be able to see the problem. I was able to produce the issue locally with 5 scripts running. As noted in the script, don’t run this in a production environment. I wouldn’t even run it in a shared environment – it will cause performance problems, including high CPU usage. The good news is, there is no cleanup needed when you are done testing. Performance will return to normal once the scripts are stopped or once they finish creating temp tables.

CPU spikes when the script is run in multiple connections. If you run the troubleshooting queries from the previous sections, you will see page latch contention and you will also see that the contention is in tempdb. This query is how I reproduced the issue for my examples.

If you enable the setting ‘tempdb metadata memory-optimized’ on your test server and run the test scripts again, you will see improved performance and that the page latch contention is gone.

Summary

This is one use case for enabling the memory optimized tempdb metadata server setting. If your system has a high number of concurrent transactions and uses temp tables extensively, it may be a good fit. It worked well in the scenario I have described above, but remember to always test changes to the system settings in SQL Server and confirm improvement after making changes. The general best practice is to leave things at the installation default unless you have a specific reason to make the change. Check your usage patterns, look at your wait types, and be ready to revert this setting if it degrades performance. Opening a support ticket is always an option if you need additional confirmation before making a server configuration change in a high value production environment.

References

Article tags

Load comments

About the author

Ben Johnston

See Profile

Ben is a data architect from Iowa and has been working with SQL Server since version 6.5 in the late 90's. Ben focuses on performance tuning, warehouse implementations and optimizations, database security, and system integrations.