21 April 2020

7 steps to effective database monitoring

Updated in 2024.

There are a whole range of reasons you should monitor your SQL Server estate, from the essentials that support firefighting through to advanced modelling to help with proactive future planning.

However, there are a few steps you need to take in order to ensure the monitoring is effective. In this post, we’ll show you what they are and provide some tips on how Redgate Monitor can make those steps easier.

If you’d prefer to watch this as a webinar instead, you can also see the original presentation of this content, which was part of Redgate Streamed.

But first …

Before you get started, you need to ensure you understand your systems and your business. The business goals of the company you work for will affect your priorities, so you need to be fully aware of the activities and processes that will have an impact on your estate. For example, are there seasonal upturns or downturns? Are there particular privacy issues in play? How does demand fluctuate during the day?

You also need to decide on buy-versus-build early on in your decision-making process, as this will influence where you invest your money and which resources are required. Your decision to purchase a monitoring tool will require a different planning approach to one that you build in-house.

One important point to bear in mind whichever route you go down is that you need to separate your monitoring tool from what is being monitored. If you put your monitoring tool on one of the servers you want to monitor you’ll add to the load, and exponentially increase it as you monitor the increase. Also, if that server goes down, so will your monitoring, and you won’t be able to use your tool to fix it or even alert you to the issue.

Finally, you must do capacity planning on your monitoring server to ensure it’s powerful enough to run your tool and deal with the volume of data coming in from your estate – an underpowered machine will prevent you from efficient monitoring. You should also plan for your monitoring to grow with your business, including upgrading the machine it’s hosted on as your estate grows.

With that said, let’s get into the 7 steps.

1. Define the servers under management

The first step to effective SQL Server monitoring is to find all your servers. Make sure you’re aware of all of the SQL Server instances in use at your organization, which means physically looking for machines in different departments as well as those hosted in the cloud.

When you know what servers you have, where they’re located, and who owns them you can begin to define the purposes they serve. Through this, you can group servers by usage and define service level agreements (SLAs) with the business to help prioritize monitoring requirements.

To help do this you can use the Microsoft Assessment and Planning Toolkit (MAP), which will give you a report on the servers that are on your environment to help you track down their location and who is responsible for them.

You can also use DBATools and PowerShell to explore your system and identify SQL Servers on the network.

Tip: Let your network team know before you use these tools, so they know you’re doing the work and don’t think there’s an attack on the network.

2. Document your servers

Next you need to ensure you have your server properties documented to help with your capacity planning, deployment, and management of them. You can use the information you document to ensure the servers are set up correctly and being used efficiently.

Also document the database properties on your servers, such as who owns them, how they’re set up, and who has access, to make sure the permissions for sensitive data are correctly set.

Finally, document the location and types of server you’re running, such as if there are virtual machines running in Azure or AWS, or if they’re in a specific data center. This matters because there are different monitoring requirements depending on how and where the servers are hosted.

To document your servers you can use PowerShell and DBATools to connect to your servers and run queries to pull this data together. You can also run T-SQL scripts against your system tables to pull out the information you need from your servers.

With Redgate Monitor, you can use your documentation to categorize your servers to ensure you’re prioritizing your monitoring for those containing the most sensitive or important data, such as production. Redgate Monitor will then automatically pull through key server information, such as processor speed, CPU limit, log space, and permissions so you can gain instant access to the information you need to manage that server. You can also drill down to the database level for specific information, which is automatically pulled through, removing the need to track down and document the information manually.

3. Track SQL Agent jobs

At a basic level you need to understand whether SQL Agent jobs are succeeding and failing, and you’re likely to be more interested in the jobs that have failed to understand what the problem was.

Monitoring is really important for SQL Agent jobs. You can use a tool to automatically track job successes and failures, or you can manually run queries to look at job history tables.

Beyond successes and failures you can also track jobs that are running longer than usual. For example, if a re-indexing job normally takes 46 minutes but is still running after 65 minutes, you should set up an alert so you can investigate what’s taking so long.

This is important, because if jobs overrun a ripple effect can cause additional problems such as maintenance tasks running into core business hours.

At a higher level, understanding the metrics about your SQL Agent jobs, such as average run time or frequency of failures, helps you take a more proactive approach to improving performance.

Within Redgate Monitor you get automatic monitoring of your SQL Agent jobs across your estate. You get a summary of all succeeded and failed executions at a high level, and you can then drill down to jobs per server, followed by individual execution metrics for each job.

4. Identify disk space, capacity, and growth rates

Disk space always causes headaches for DBAs, who often find themselves firefighting when unexpected growth causes disks to fill up at inconvenient times.

When you know how and why disks are growing, and specifically which files on the disk are growing, you can take action to identify the cause and make the changes required to manage the issue.

By efficiently tracking your disks, files, and file groups you can ensure you get alerted to changes as early as possible to give you time to react.

For example, you can use the Windows Management Instrumentation service (WMI) to run disk space and file monitoring queries from the Windows API. Through T-SQL queries you can monitor SQL Server to understand what space is being allocated to ensure it matches requirements. PowerShell can then bring these two pieces of information together into a single report.

With Redgate Monitor you can see live disk usage on your server alongside performance data for each disk. Furthermore, you can see projected disk usage that will take historical data to predict when you are likely to run out of disk space, enabling you to schedule maintenance well in advance of it becoming an issue.

You can also set up alerts to let you know ahead of time when a disk is likely to run out of space, customized based on your procurement and management needs.

But it’s not just about exceeding capacity: with Redgate Monitor you can see which disks are being under-utilized to help optimize your resources, or potentially reduce costs.

5. Understand system resources: CPU, I/O and memory

As well as disk capacity, monitoring resource usage is also key to maintaining a healthy SQL Server estate.

Simply monitoring your system metrics isn’t enough. It’s important to understand what’s normal for your servers and to identify where abnormal behaviour is occurring, which could be a sign of a deeper problem.

To manually monitor system resources you can use Windows Performance Monitor (Perfmon), through which you can customize collection frequency and set what information you want to gather from the server.

You can also use T-SQL to run Dynamic Management Views (DMV) and collect specific pieces of information about your server, enhanced with specific stored procedures, such as sp_whoisactive for activity monitoring.

Extended Events in SQL Server will help you identify and troubleshoot performance issues on your servers, while Query Store can aggregate information about your queries at a database level to help you catch problems.

With Redgate Monitor you get continuous monitoring of CPU, I/O and memory, alongside other metrics, for each server on your estate, including on-premises and cloud instances. You can set thresholds for this data to trigger alerts at certain levels, plus you can view historical data to help you understand what’s normal and what’s abnormal for a specific server.

With Redgate Monitor’s waits graph you can filter top waits to see those that are causing the biggest issues, and then visualize the historical data on any specific wait to help you identify anomalies and get advice on fixing problems.

You can also put Extended Events to work within Redgate Monitor as a way to add custom metrics for any special needs you have within your environment.

6. Find errors

Errors clearly matter when managing SQL Server. Deadlocks, blocking processes and long-running queries are the most common performance issues that arise, and you’ll need to deal with them regularly. These can slow down or completely stop your databases from running, so keeping on top of them is a priority.

You can use a series of tools to monitor errors, such as SQL Agent to capture errors and send alerts, which can be configured using PowerShell. You can also use DBMail to receive error notifications via email, but be aware that a high volume of alerts can crash your mail server.

DMVs, as previously mentioned, can help when you use the queries in the right order, as can Extended Events, which can give you a rich amount of information as long as you ask the right questions.

Redgate Monitor can help you easily manage your error handling and get the right alerts when you need them. You can choose how you get alerted and through which channel, such as Slack, email, or text message, and you can suppress alerts for specific periods like maintenance windows.

Redgate Monitor comes pre-installed with over 60 alerts out-of-the-box, but also has a powerful custom metrics tool that lets you create your own and install metrics written by other SQL Server professionals.

When you get alerted to an error you can dive in and understand what has happened, plus see the impact it has had.

For example, you can drill down into blocking processes to see what’s going on, when and where they occurred, and visualize the chains to find out where problems happened and how you can fix them.

7. Track and tune your monitoring

Finally, you need to monitor your monitoring to make sure you’ve got the right tools for the right jobs, and you’re getting actionable information out of your monitoring tool.

For example, if you’re getting more alerts than you can read, you’ll probably end up ignoring them all. Instead, you need to be able to accurately tune your monitoring to get the right alerts, when you need them, rather than drowning in them and risk missing something important.

Furthermore, you should be able to take action from the alert, whether that’s getting details through email or other notification channels, or getting linked directly to the problem so you can see what’s wrong and how you can fix it.

With Redgate Monitor you can customize and tune your alerts, group them by type to reduce the noise, and take immediate actions directly from the alerts to fix any issues.

Next steps

Now you know what you need to do to achieve successful monitoring of your SQL Servers, it’s time to grab a tool to automate as much of the process as possible.

With Redgate Monitor you get a single pane of glass to manage your entire estate. It scales as you add more servers and can be adapted to meet your specific monitoring needs. You can diagnose issues and find the root cause in seconds, and you can understand the biggest performance issues to help make improvements. With alerts that can be tuned and customized to your needs, you can find and fix issues before your users even notice.

Download your free trial today to get started.

Tools in this post

Redgate Monitor

Real-time multi-platform performance monitoring, with alerts and diagnostics

Find out more

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Redgate Blog

7 steps to effective database monitoring

But first …

1. Define the servers under management

2. Document your servers

3. Track SQL Agent jobs

4. Identify disk space, capacity, and growth rates

5. Understand system resources: CPU, I/O and memory

6. Find errors

7. Track and tune your monitoring

Next steps

Tools in this post

Redgate Monitor

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

But first …

1. Define the servers under management

2. Document your servers

3. Track SQL Agent jobs

4. Identify disk space, capacity, and growth rates

5. Understand system resources: CPU, I/O and memory

6. Find errors

7. Track and tune your monitoring

Next steps

Tools in this post

Redgate Monitor

You may also like