Mastering TempDB: The basics

The series so far:

I regularly deliver a presentation called “Mastering TempDB”. I wrote it based on what I wish I knew about TempDB “growing up” in my career as a SQL Server DBA. It really is a breakdown of what TempDB is, how it is used, what problems actually occur within it, and how to best configure it. With advice from my colleagues, I have decided to break that down into this article to educate you.

What is TempDB?

TempDB is a global resource used by everyone and everything inside SQL Server. Think of it as a dumping ground for anything that doesn’t fit in memory. It is always database_id number two in sys.databases.

It’s non-durable, meaning that the database is recreated every time SQL Service restarts. A new set of data and log files are recreated each time. So, what does this mean to you? First of all, you shouldn’t put any objects in the TempDB database that you need to be persisted. Don’t use TempDB as a development database (except for maybe a quick throwaway test), because once services restart, you will lose your work.

TempDB works in a round-robin fashion known as proportional fill; it writes data in all files based on free space in each file and alternates between files to find the best place to do its work. I’ll talk more about this later in the article.

TempDB is just like other user databases for the most part except for logging and durability. It is minimally logged. There is still a transaction log, but that log is used for rollback only, not recovery. TempDB is non-durable because there is no need for recovery since a new one is created each time.

TempDB needs to be on fast storage with low latency. There is a very high concurrency rate of transactions that write data to TempDB that are happening all the time, so you need to ensure it is pinned to the fastest possible disk. In addition, as a best practice, I recommend isolating TempDB data files and log file on their own disks.

What’s stored in TempDB?

I find this easier to break down in bullet points. When looking at this, keep in mind that each time you write code, it can cause data to take up space in TempDB. When determining what type of operation to use, take a minute to determine if it is needed, such as a sort. In addition, watch for spills to TempDB—this is similar to an operating system paging memory to disk. Spilling to disk means there was not enough memory granted, and the operation was forced to “spill” or dump to TempDB to do its work. I also recommend when performance tuning stored procedures you use SET STATISTICS IO on to see the amount of data being processed in your work files and worktables. These too are processed inside TempDB as you will see.

Temporary user objects that are explicitly created

Global or local temporary tables and indexes,
Temporary stored procedures
Table variables
Tables returned in table-valued functions
Tables used in cursors.

Internal objects

These are objects that are created by the database engine.

Worktables to store intermediate results

Spools
Cursors
Sorts
Temporary large object (LOB) storage.

Work files for hash join or hash aggregate operations

Hash joins or hash aggregate operations

Intermediate sort results

Rebuilding indexes (if SORT_IN_TEMPDB is specified)
Certain GROUP BY, ORDER BY, or UNION queries.

Version Stores

RCSI- Read-Committed Snapshot Isolation using row versioning isolation or snapshot isolation- This is a big user of TempDB resources!
Online index operations
AFTER triggers

Availability Groups

Availability Group replicas use Snapshot Isolation by default which uses row versions of each transaction. These transactions are stored in the Version Store within TempDB. They do this on secondary replicas to avoid blocking, which could prevent transactions from the primary replica from being applied. However blocking can still be an issue for long-running transactions that lead to the version store clean up process being block and possibly causing your TempDB to fillup. When this happens, you will not be able to failover. Be sure you educate yourself on how this all works. Here is a great reference link.

When using a secondary read-only replica, the query optimizer also creates temporary statistics within TempDB. The statistics from the primary database are replicated to the secondary. SQL Server also needs statistics on read-only data based on queries that run on the secondary. You will find these inside TempDB with the suffix of _readonly_database_statistics. The query optimizer uses these statistics to optimize the workload on your secondary.

What causes issues within TempDB?

Many think it is the I/O that is the main cause of issues within TempDB, but it is not. I/O can be an issue when it comes to the disk and latency, but it is really not the big bottleneck. The cause is is latching and contention. All the tasks coming through TempDB fight for the same resources and working space. That work requires latches to the three special pages that manage this: GAM (Global Allocation Map), SGAM (Shared Global Allocation Map), and PFS (Page Free Space). These pages are responsible for determining where work takes place within the TempDB data files. Operations waiting for this determination, especially when a lot of processes come in at one time, can quickly queue up resulting in “contention”. Below are three kinds of contention that can happen:

Object Allocation Contention– This is where the system is looking for space to work. It is contention for metadata pages used to manage space allocation.

Metadata Contention – This is contention for pages that belong to system object used to track metadata. The data relates to the creation and destruction of temp tables and deallocating them.

Temp Table Cache Contention-Temp table cache helps with metadata and object allocation. It allows for reuse of temp tables.

Pam Lahoud (B|T) has a fantastic blog and session on this where she deep dives into contention; be sure to take a look.

How do you fix contention?

Now that you understand the causes of Tempdb issues, here are some things to consider when trying to fix them.

Proper configuration

To take advantage of the round-robin processing of TempDB, it is essential to have multiple data files so that the workload can be spread proportionally across them. To ensure this happens, you must make sure your files are also equal sizes and autogrowths are the same. If you have one file larger than the other, you have contention. The system will place all the work in the biggest file, thinking it has the most free space. In addition, when adding a new file plan to do a restart. Though you do not need a restart to add a file, the engine will end up using the new file only because it now has the freest space. You want to reset TempDB with a restart to ensure the proportional file usage is maintained. I make it a practice to check in on my TempDB regularly to make sure they are all aligned.

Besides enabling the proper use of round-robin logic, adding more files gives you more of these special pages work with. With each data file you add, you get one GAM page and one SGAM page for every 4 GB of space allocated to those files. The question I always get is how I know what size to make them to ensure I have the perfect amount of these pages. Unfortunately, there is no concrete answer to that. There is no script or algorithm I can offer to determine a magic number for your environment. Like with any other setting, you need to know your workloads to make that determination. I encourage you to monitor your TempDB usage and growth rates as a starting point. Another great tool I like to use to keep an eye out on your Virtual Log Files (vlf’s) as an indicator of these events. Here is a blog to learn more about these.

The new versions of SQL Server help simplify this for us. During the install, it helps you determine the number of files you need and lets you create them as part of the setup process.

Finally, how do you configure your disk? TempDB processing happens on the actual disk, not memory for most operations. Since TempDB is highly concurrent and is used by so much inside the engine, it is critical to make sure it is placed on the fastest possible disks available. Use flash drives if you can. These are high-speed storage based on non-volatile memory; you may see it referred to as NVMe, Non-volatile Memory Express. These are SSD, solid-state drives. One thing to keep in mind is that NVMe’s are SSDs, but not all SSDs are NVMe’s there are different types of SSDs. No matter what type of enterprise grade SSD it is, these are great for TempDB workloads. If you want to learn more about the types of storage and what questions you should ask your storage admin as you try to determine the best configuration for your SQL Server TempDB, I have a beginners storage blog for you to read up on it.

Trace flags

Prior to SQL 2016 ,using trace flag 1118 and 1117 helped to reduce contention by ensuring that when one file grows, they all grow at the same time as well as changing the mixed extents algorithm which reduces contention on the SGAM pages. There is no downside to enabling 1118. For 1117, keep in mind this trace flag applies to all file groups, not just TempDB. When enabling 1117, if you have other file groups in your user databases and one of those files grows, it will trigger a growth event for the other files within that group and that database. If you are using an older version of SQL Server, I HIGHLY recommend these are added. It is so important, that these have now been defaulted.

Cumulative Updates

Metadata is out of the hands of the database admin, but how it is handled is critical to our workloads. It is in the hands of the SQL Engine team to identify issues and make improvements to how it works. However, you as a DBA can ensure you have the best code base for this simply by applying the most recent CUs (cumulative updates). You can take advantage of the fixes Microsoft provides to help alleviate this issue by applying the CUs. Over the years, the metadata tracking and object allocation methods have evolved to ease contention issues; the only way to take advantage of these changes is to upgrade and patch. There are several notable CUs where metadata contention was fixed: SQL 2016 SP1 CU8, SQL 2017 CU5, just to name a couple.

SQL Server 2019

TempDB got some real attention in SQL Server 2019. They introduced improvements to Temp Table Caching, Concurrent updates to PFS pages and provided two exciting new things In-Memory OLTP TempDB Metadata (Opt- in) and Table Variable Deferred Compilation. Like with all new features introduced, there are some growing pains with the In-Memory system table, so make sure you are on 2019 CU2 as they fix some contention that arose in this feature. You can read about it here.

To take advantage of the new In-Memory OLTP TempDB Metadata feature, you must enable it:

1 2	ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

How do I see what is happening in TempDB?

You can take a peek inside your TempDB database to see what is happening. As you now know, TempDB is about allocating spaces on pages for tasks to happens. You need to be able to view those space allocations and identify those queries taking up that space. The below DMVs (dynamic management views) give you a way to just that. Note I have also included DMVs for looking inside the version stores which, if you recall, are also stored inside TempDB.

sys.dm_db_file_space_usage -Space usage of files in the databases

sys.dm_db_session_space_usage -Exclusive to the TempDB database, it returns the number of pages allocated and deallocated by each.

sys.dm_db_task_space_usage -Exclusive to the TempDB database, it returns information about the number of pages allocated and deallocated by each task.

sys.dm_tran_active_snapshot_database_transactions – Returns active transactions that generate and may access row versions.

sys.dm_tran_version_store- Version records in the version store