17 January 2019

How to Monitor IDENTITY columns to prevent unplanned downtime

If a table runs out of IDENTITY values then it, and any dependent services and applications, will be "read-only" until the problem is fixed. Steve Jones explains how to set up a custom monitor to detect and prevent such problems.

We all try hard to plan for data growth and changing workloads, as we build and adapt our SQL Server systems, and to anticipate potential problems. Some issues, however, take a long time to appear. They lay in wait, unnoticed but not dormant, and if they strike without warning, they can cause severe disruption and extended downtime.

In November 2018, Basecamp, a software as a service provider, were caught off guard when one of their events tables ran out of IDENTITY values. It put their service into read-only mode for almost 5 hours. If you detect this looming problem early enough, you’ll have the time you need to plan and implement a cure, such as changing the datatype of the IDENTITY column from an INT to a BIGINT, as described by Danny Kruge in his Simple-Talk article. However, this can take time and planning, and may also require a scheduled maintenance window.

Here, I’ll should you how to set up a custom metric in Redgate Monitor that will alert you in good time, if an IDENTITY column is starting to run out of available numbers.

Planning Ahead

Many database designers will design their tables to use the IDENTITY property on an integer-based column, for a primary key. This is a common design and provides a surrogate key that is independent of any other data values in the table that may change. These values auto increment and are handy when you may not have another column (or set of columns) that you can guarantee to be unique across your data set.

In most cases, the IDENTITY is created on a column with an INT data type, which gives you 4 billion possible values. However, many of us prefer only positive values and start our keys at 1, so we have 2 billion values (2,147,483,647 to be exact).

In systems subject to low or moderate workloads, having over 2 billion values to play with is more than ample. Even if your table loads in 100,000 new rows a day every day for ten years, you’d still have plenty of positive IDENTITY values left to use, and on almost all databases I’ve managed, the workload is much lower than this.

This leads many administrators and developers to assume that IDENTITY values are unlimited. In a practical sense they are for most tables, but that doesn’t excuse any lack of monitoring. If my workload dramatically changed and I suddenly loaded in 5,000,000 rows a day, I’d run out of space in less than two years. If my system has already been running for 4 years when the workload changes, it is likely no one has thought this far ahead.

Installing a custom metric to detect ‘nearly depleted’ IDENTITY columns

Redgate Monitor is a comprehensive monitoring and alerting system that watches your instances and databases for you. It constantly examines many pre-configured and custom metrics, sending alerts when thresholds are exceeded and ensuring data is retained to allow for trend analysis and extrapolation.

Redgate Monitor collects all the metrics automatically that are critical for every SQL Server database, and then allows to customize your monitoring stagy, for a SQL Server by adding your own custom metrics. There are always certain values that one DBA might find useless, while another feels these are critical.

Go to sqlmonitormetrics.red-gate.com, and you’ll find that it’s full of preconfigured custom metrics that you can add to your Redgate Monitor installation with a single click. These metrics are often written by the Redgate team, and cover a range of areas that are important to many system administrators, and there are also custom metrics written by other Redgate Monitor customers, including many Microsoft Data Platform MVPs.

One of the available metrics is Identity columns near limit. This metric is designed to scan your database and let you know if any of the IDENTITY columns are approaching their limit. It checks IDENTITY columns for all numeric data types (tinyint, smallint, int, and bigint), and has associated custom alerts, which will fire when they are within some percentage of the limit. The default is within 10% of the limit, but this can be easily configured.

Configuring the metric and alerts

You should install the metric so that it collects its data for every database on every instance that will experience problems if an IDENTITY column runs out of values.

Having installed the metric, you need to consider how to set up effective alerting. This will depend both on the rate at which the values are being used up, and on the pace of development in your organization. In other words, if I told you that a smallint IDENTITY column was running out of space, how long does would it take to change the datatype of the column to an int? Or change an int to a bigint? This might take longer than initially would have thought. Making the change in a database table might not be enough to ensure your application continues to work. Data types map differently in C#, Java, Python, and other application development languages. You might incur substantial work to change data types in your software.

It doesn’t matter how long it takes, just that you know how long, so that you can design the alerts to ensure you can make the change before you run out of values.

In my Redgate Monitor installation, I’ve added the metric, and set my alerts as follows:

My rule-of-thumb is that I want a low-level alert well in-advance of any potential problems. I would probably set an initial low-level alert to fire when 50% of the values have been used.

Since I rarely use anything other than int or bigint for an IDENTITY column, this will allow me plenty of time to plan and make a change. I would use the early alert to start a conversation in the development groups about how and when to make this change.

When I receive this alert, I would start to examine the data growth in the table(s) that are running low on values. I would then estimate how long it will take to make a data type change, based on my ongoing conversations with developers and system administrators.

My medium-level, or second alert, is ideally set at a threshold level where a project needs to be started immediately, but we aren’t at a critical level. I would usually set this to fire when the IDENTITY is 70% depleted, though depending on the data growth, the number of columns that are potentially approaching a limit and on my conversations with the teams, I might need to adjust my second alert, moving it to a lower threshold if I feel it doesn’t give the team enough time to respond, or to a higher threshold if it seems like an unnecessary alert.

If I have received a second alert, I would also ensure I set a reminder in my calendar to monitor this on a monthly basis. I don’t want alerts, but I’d run the query in Redgate Monitor to keep an eye on how quickly I’m using IDENTITY values, and save the results.

My high-level alert is at 90% depletion. Even in most fast-growing tables I’ve managed, I’m not going to use the remaining 10% of my IDENTITY values in a few weeks. This is, however, a critical time. We need a project that is underway with plans to make this change in the next few weeks. This is truly a high-level alert, and a place where I need myself and others to be building a solution right now.

Dealing with the worst case

If the worst happens, despite my planning and monitoring, and a table runs out of IDENTITY values, it’s possible that my application becomes a read only system, but it’s just as likely that core functionality is affected, and users receive constant errors. In this case, I like to have an emergency plan.

I always to set IDENTITY values to start at 1 with an int datatype (or bigint). So, my emergency plan, if we haven’t been able to make a datatype change in time, is to reseed the IDENTITY property to start from the lowest negative value or start from -1 and count backwards. The choice would depend on what would be least disruptive to users and administrators, but both offer me an emergency option.

Conclusion

It is very easy to lose track of IDENTITY values that are approaching the limit for a datatype. In most systems, it will take many years to use of all possible values (or even all possible positive values) However, without having monitoring set up to watch for the approaching problem, and alerting system administrators, software can grind to a halt.

Redgate Monitor has a custom metric you can easily install to help you avoid these difficulties. You can customize the alerts and choose when those are sent to admins. If you set thresholds to give you time to respond, and have an emergency backstop plan, then your users should never be affected by this issue.

Product articles SQL Toolbelt Server availability and health
How to Monitor IDENTITY columns to…

How to Monitor IDENTITY columns to prevent unplanned downtime

Planning Ahead

Installing a custom metric to detect ‘nearly depleted’ IDENTITY columns

Configuring the metric and alerts

Dealing with the worst case

Conclusion

Tools in this post

Redgate Monitor

SQL Toolbelt

Redgate Monitor

Product articles SQL Toolbelt Server availability and health How to Monitor IDENTITY columns to…

How to Monitor IDENTITY columns to prevent unplanned downtime

Planning Ahead

Installing a custom metric to detect ‘nearly depleted’ IDENTITY columns

Configuring the metric and alerts

Dealing with the worst case

Conclusion

Tools in this post

Redgate Monitor

SQL Toolbelt

Redgate Monitor

You may also like

Basic Tips for using SQL Monitor Successfully

Monitoring an estate and insights for your success: fireside chat with Kevin Davis, Manager of Database Administration

Exclusive SQL Monitor Customer User Group

Understanding PostgreSQL's Cache Hit Ratio

Tracking the number of active sessions on a database using Redgate Monitor

Product articles SQL Toolbelt Server availability and health
How to Monitor IDENTITY columns to…