SQL Server Performance Troubleshooting with Redgate Monitor

By Tony Davis & Grant Fritchey

Any SQL Server monitoring tool must gather the metrics that will allow a DBA to diagnose CPU, memory or I/O issues on their SQL Servers. It should also provide a set of accurate, reliable, configurable alerts that will inform the DBA of any abnormal or undesirable conditions and properties, as well as specific errors, on any of the monitored servers.

This article provides an in-depth guide to the monitoring and alerting functionality available in one such tool, Redgate Monitor.

It explains how the tool works, reviews its features and architecture, shows how to configure it, and offers practical examples of how to use it to monitor your SQL Server database servers and instances.

The tool provides core monitoring functionality, such as built-in and customizable alerting, performance diagnosis using wait statistics, and the ability to capture and compare performance baselines, as well as features such as Availability Group monitoring, integration with Active Directory, distributed estate monitoring, data file growth projections and much more.

Why you need a SQL Server monitoring solution

When I first started out as a SQL Server DBA, I was in charge of about 20 instances. My morning ritual was to log remotely into each instance, one at a time, and work through a series of checklists, verifying each instance was running as expected, checking disk space, confirming that all overnight jobs ran successfully, examining error logs in case of issues, checking key performance counters, and much more.

Even with only 20 instances, this manual checking was very time consuming. In fact, on some very busy days, I didn't even have time to check every instance. Also, the process got boring and tedious very quickly.

Brad McGehee, SQL Server DBA

If Brad's description of his early days as a DBA bears any similarity to a typical day for you, or if you spend a lot of time gathering performance and other metrics from your SQL Server instances and databases, manually, in response to various reported issues, then it's likely that a SQL Server-specific monitoring solution will save you a lot of time.

The monitoring solution will gather the metrics you need to diagnose any CPU, memory or I/O issues on your SQL Servers. It will provide a set of configurable alerts that will inform you when an error occurs, when jobs fail or fail to run, when database or server properties deviate from the norm, when there are sustained spikes in resource usage or abnormal trends, and so on.

How Redgate Monitor works

Redgate Monitor is both a performance monitoring tool and an alerting tool. It continuously monitors a set of target SQL Servers, collecting a wide range of server-, instance-, and database-level diagnostic data that will help a DBA to evaluate current activity levels on the server, and to identify resource bottlenecks, query performance problems, and other common errors and issues on their database systems.

Redgate Monitor raises an alert whenever a particular event occurs (such as a SQL Server error or a job failure) or whenever some pre-defined, but configurable, threshold value for some metric (such as disk space usage, or query duration) is exceeded. Via the Redgate Monitor web interface, users can drill into the list of all the alerts raised on the monitored servers, along with various overviews and summaries of the collected diagnostic data.

Therefore, for every alert raised, we can see various snapshots, aggregations and summaries of resource usage and query activity on the server around the time of the alert, helping us to determine quickly what might have caused it. We can also compare resource metrics at the time a certain issue occurred to their baseline values at a previous time, such as the same period on a previous day, in order to see exactly "what changed".

Alongside the pre-set summaries of the data that Redgate Monitor automatically displays in the web interface, we can also create analysis graphs displaying the values of multiple performance counters over a desired time range. Outside the web interface, we can even run custom SSRS reports directly against the Redgate Monitor database.

Lightweight implementation

Users access the monitoring data and alerts via the Redgate Monitor web interface, meaning the data is accessible to authorized users on any device that has a web browser.

Redgate Monitor collects all its data using a lightweight, agentless, 3-tier architecture that minimizes the danger of the observer effect (where the act of collecting performance data from a target server affects the performance of that server), and stores all the historical data in a SQL Server database, the Redgate Monitor database.

Pre-configured and customizable

Perhaps the biggest strength of Redgate Monitor is that it is both pre-configured and highly customizable. It comes with a pre-defined set of base metrics that will allow it to start collecting server-, instance- and database-level metrics as soon as we add the first SQL Server we wish to monitor. It also comes with a pre-configured set of alerts for common resource usage, query performance and common errors and issues.

In other words, out-of-the-box, with minimal set up and zero configuration, Redgate Monitor satisfies the Pareto Principle (also known as the 80-20 rule); it will not identify and raise an alert for every possible SQL Server-related problem, but it will catch the most common ones.

From this pre-configured base, we can start monitoring and alerting immediately and, as we learn more about our servers and the level of alerting, we can start manually adjusting alert thresholds so that they are best suited to the level of activity on our servers. Redgate Monitor tries to configure sensible threshold values for alerts, but there are very few threshold values that aren't heavily dependent on the specific environment, and on the applications and patterns of activity on your servers.

Tuning, grouping and, where necessary, disabling alerts is a vital part of maintaining a strong signal-to-noise ratio, and therefore a healthy monitoring system that the team actually uses rather than simply tries to ignore. Over time, you need to tune and adjust your monitoring system so that all alerts are meaningful, and allow the team to take fast, corrective action.

The rules for effective SQL Server monitoring

I recommend reading The Ten Commandments of SQL Server Monitoring by Adam Machanic, as well as Eight Steps to Effective SQL Server Monitoring by Tony Davis. Make sure your monitoring system, Redgate Monitor or otherwise, obeys the rules and guidelines these articles propose. If you find there is some aspect of how it works that bends or flouts the rules, report it (in the case of Redgate Monitor, via the dedicated forum).

Once comfortable with the level of alerting, and confident that the most important alerts don't get lost in a "sea of noise", we can start to build out our monitoring and alerting capabilities. Redgate Monitor lets us write our own Transact-SQL code to create custom metrics that will capture the specific monitoring data we need from our servers, instances and databases, and create alerts on these custom metrics, as required. We'll cover the creation of custom metrics and alerts later in the article.

Pre-defined alerts

Redgate Monitor's key mode of operation is to raise alerts based on abnormal or undesirable conditions and properties, as well as specific errors, on any of the monitored servers, instances and databases. For example, it might raise an alert if the monitoring data reveals:

  • High resource usage – memory or CPU usage for a server exceeds a certain threshold value
  • Disk space running low – find out that the drive housing the transaction log is nearly full while there is still time to act
  • Query performance issues – for example, a query against one of the databases runs longer than a specified duration
  • Specific SQL Server errors and problems – such as a deadlock, or some other error that SQL Server writes to the error log
  • Failed or long running jobs – such as database backup jobs
  • Abnormal values for instance, database, or object-level properties – such as a database's page verification option set to NONE, or an index's fragmentation level exceeding a certain value
  • Availability Groups problems – relating to the availability or performance of availability databases in the primary and secondary replicas

You can find a full list of the pre-defined alerts at monitor.red-gate.com/Configuration/Alerts. Out of the box, Redgate Monitor will raise over 40 alerts directly related to our SQL Server instances, ranging from alerts on long-running queries, overdue backups, and blocking and deadlocks, to alerts on the host machine (disk space, memory and CPU usage, and so on) and on Availability Groups. Each alert is pre-configured with default thresholds, but it is very easy to adjust the threshold settings.

The monitor.red-gate.com website is a live copy of Redgate Monitor that allows anyone on the Internet to view the alerts and performance of our SQL Server demo instances.

Later in this article, we'll review some specific examples of these alerts, and the supporting diagnostic data that Redgate Monitor presents for each one.

Pre-defined metrics

Redgate Monitor comes with a pre-defined set of base metrics that means it can start collecting server-, instance- and database-level metrics as soon as we add a SQL Server. It also collects various server, instance and database properties, all of which are useful in identifying the root cause of any alerts raised on the monitored servers.

It interrogates the Windows server for metrics such as CPU and memory use, as well as disk space usage. At the SQL Server instance and database levels, it uses various built-in functions, system tables, views and performance counters to collect a wide range of SQL Server diagnostic data and properties that can expose resource usage issues (for example, high memory or CPU use), performance issues (such as long-running queries, or blocking), as well as specific errors and problems (such as deadlocks, index fragmentation, SQL Server errors). It also collects other important information, such as database configuration properties.

The following sections summarize, briefly, some of the key predefined metrics that Redgate Monitor collects. As noted earlier, and as I'll demonstrate later, we can expand this list by defining our own custom metrics.

Performance Monitor counters

Redgate Monitor tracks a wide variety of Performance Monitor counters to identify bottlenecks in CPU, memory, disk I/O, or network resources. To make this easier, Performance Monitor counters are divided into four categories:

  • Ten counters related specifically to the host machine on which the instance is running
  • Eighteen counters related specifically to each instance
  • Nine counters related to each individual database on each instance
  • Six counters related to any Availability Group of which the instance is a member

For a full list of all the Performance Counters collected by Redgate Monitor, visit monitor.red-gate.com/Analysis/Graphs

As well as viewing strategic snapshots of this data, at various places in the web interface, we can also perform custom analysis of these metrics, on the Analysis tab in the Redgate Monitor web interface (see later).
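
To give a sense of the raw counter data involved, the following is a minimal sketch (not Redgate Monitor's own collection code) that queries a few instance-level counters directly from SQL Server. Note that the cumulative '/sec' counters need two samples, and a delta over the elapsed time, to yield a rate, which is part of the work the tool does for us.

-- A minimal sketch, for illustration only: read a few instance-level counters directly.
-- The '/sec' counters are cumulative since instance start, so a rate requires sampling
-- twice and dividing the difference by the elapsed time.
SELECT  object_name,
        counter_name,
        instance_name,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   counter_name IN (N'Batch Requests/sec',
                         N'Page Splits/sec',
                         N'Page life expectancy');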

Diagnostic data from Dynamic Management Views

Alongside PerfMon metrics, Redgate Monitor tracks data held in various Dynamic Management Views (DMVs), offering deeper insight into the workload distribution across an instance, and into databases or specific 'hot' tables that are the source of long running queries and resource waits.

Top queries

Redgate Monitor sets alerts on current, long-running queries, based on data from the sys.dm_exec_requests DMV, and also queries information from the sys.dm_exec_sql_text and sys.dm_exec_query_plan DMVs.

Redgate Monitor uses this information to present a list of the most expensive queries, by instance and by database, plus the SQL text of the executing batch and associated query plan handle, at the current time or at the time of an alert. You can sort these by execution time, count and resources used as necessary.

Redgate Monitor also captures which resource waits have affected that query, if any.
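
Redgate Monitor's exact internal queries aren't reproduced here, but the following sketch, assuming VIEW SERVER STATE permission, shows how the same DMVs can be combined by hand to list currently executing requests alongside their statement text and cached plans.

-- A hand-rolled sketch combining the DMVs mentioned above: currently executing
-- requests, the statement being run, and the cached plan (where available).
SELECT  r.session_id,
        r.start_time,
        r.total_elapsed_time,
        r.cpu_time,
        r.logical_reads,
        r.wait_type,
        DB_NAME(r.database_id) AS database_name,
        SUBSTRING(t.text, (r.statement_start_offset / 2) + 1,
                  ((CASE r.statement_end_offset
                        WHEN -1 THEN DATALENGTH(t.text)
                        ELSE r.statement_end_offset
                    END - r.statement_start_offset) / 2) + 1) AS statement_text,
        p.query_plan
FROM    sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
OUTER APPLY sys.dm_exec_query_plan(r.plan_handle) AS p
WHERE   r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;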

Top resource waits

Every time a session has to wait for some reason before the requested work can continue, SQL Server records the length of time waited, and the resource on which the session is waiting. For currently-waiting tasks, it exposes the waits through the sys.dm_os_waiting_tasks DMV. Redgate Monitor mines this data to present a list of the top ten resources waits on the server over a given time (such as for the previous 15 minutes), so we can see if the most significant waits are, for example, related to log writes, or waits to acquire locks and latches, or waits on some other resource. Clicking on a significant wait type reveals any queries that were delayed by this wait, query plan details, target database and the duration of the wait.

For top-level, system-wide waits, Redgate Monitor also mines data from the sys.dm_os_wait_stats DMV, which exposes wait statistics aggregated across all session IDs, to provide a summary review of where the major waits are on a given instance.
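
As a rough, manual approximation of the 'currently waiting' side of this picture (not the tool's internal query), the following sketch joins the waiting tasks to their requests, showing which queries are delayed and on what resource.

-- Currently waiting tasks, the wait type, and the query being delayed.
SELECT  wt.session_id,
        wt.wait_type,
        wt.wait_duration_ms,
        wt.blocking_session_id,
        DB_NAME(r.database_id) AS database_name,
        t.text AS sql_text
FROM    sys.dm_os_waiting_tasks AS wt
JOIN    sys.dm_exec_requests AS r
        ON r.session_id = wt.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE   wt.session_id > 50   -- ignore most system sessions
ORDER BY wt.wait_duration_ms DESC;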

Index fragmentation

Redgate Monitor collects diagnostic data that allows it to report fragmented indexes in any of the monitored databases, and will raise an alert at a pre-defined fragmentation threshold.

Currently, for SQL Server 2000 compatibility, it collects this data by periodically running the deprecated DBCC SHOWCONTIG command, although it will likely switch at some point soon to using the sys.dm_db_index_physical_stats function.

By either route, collecting this data is potentially resource intensive, because SQL Server needs to scan index pages and collect statistics each time. For this reason, Redgate Monitor only collects this data once a week, at 2 AM on Sunday morning, and uses the FAST collection mode (roughly equivalent to using LIMITED mode with the sys.dm_db_index_physical_stats function).
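
For comparison, here is a sketch of how we might gather equivalent fragmentation data ourselves, using the newer function in LIMITED mode for the current database (still potentially I/O-intensive on large databases, which is why the tool collects it so infrequently).

-- Fragmentation per index in the current database, in LIMITED mode.
SELECT  OBJECT_NAME(ips.object_id)   AS table_name,
        i.name                       AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id = ips.index_id
WHERE   ips.page_count > 100         -- ignore very small indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;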

Profiler data

While PerfMon and DMV data is very useful when troubleshooting problems, sometimes we need even more detailed information. Redgate Monitor has the ability to collect selected Profiler data, as needed. Normally, this feature is left off because it can be resource intensive to collect lots of Profiler data, even though a server-side trace is used. Generally, it is only turned on to help provide more extensive data about specific alerts. For example, if tracing has been turned on, and a specific alert fires, then you will be able to see the Profiler events that occurred before and after the alert.

The Configuration | Trace page on Redgate Monitor displays a warning icon to make it easier to see which instances have trace enabled.

Expensive system processes

As well as knowing about expensive queries running against SQL Server instances, it is often important to know what other processes are running on your SQL Server box. This feature keeps track of the most resource-intensive processes running on your server, in terms of CPU time, which helps to determine whether it is a SQL Server instance, or perhaps another application, that is using up valuable resources.

Basic server, instance, and database properties

For each server, instance, and database, essential properties are collected, such as the OS version, SQL Server version, Service Pack levels, and much more. This information helps the DBA stay informed about the basic configuration of the environment.

Metric Overviews and Analysis

Redgate Monitor presents the data it collects in various ways across the Overviews, Alerts, and Analysis screens that comprise the three main sections of the Redgate Monitor web interface.

Overviews

The Overviews screen presents various predefined views into the monitoring data, offering summaries, from the machine level right down to the database level, of what's happening on the system right now, or at a specified time:

  • Server / operating system level, including:

    • CPU, Memory and Network usage
    • Disk metrics – drive space used, average read/write time
    • OS properties
    • Top system processes
  • SQL Server instance level, including:

    • Instance-wide PerfMon metrics, such as batch requests/sec, page splits etc.
    • Host machine resource usage summaries
    • Instance properties – SQL Server edition, version, collation
    • Performance Diagnostics:
      • Top waits at current or specific time
      • Top expensive queries at current or specified time
      • Top SQL user processes, by CPU, in the 30 seconds prior to the current or specified time
    • Recent errors – last 10 error log entries
  • Database level, including:

    • Database file properties – data and log file size and location, plus log file usage
    • Database-level configuration settings – recovery model, collation, Auto create statistic status etc.
    • Performance metrics such as transactions/sec
    • Performance diagnostics – most expensive queries
  • The Availability Group level, including:

    • Status and health of the AG service, the instances and the databases
    • Rate at which logs are being transferred to the secondary replicas
    • Various statuses including the settings for Failover

On the Overviews screen, at any level, we can use the back in time feature to see a summary of the metrics at a previous point in time.

Alerts

The Alerts screen displays details of all current, un-cleared alerts for all monitored SQL Servers. Accompanying the details of each alert, Redgate Monitor provides a summary of diagnostic data collected at the time the alert was raised.

We'll look at alerts in more detail later in the article, but the supporting data includes summaries of:

  • Host machine resource usage levels
  • SQL Server activity levels
  • System processes that were running concurrently
  • SQL Processes that were running concurrently (for SQL Server instance or database level alerts)
  • Queries that used the most resources during a 15 minute period, starting 10 minutes before the alert was triggered
  • The SQL Profiler trace stack (if enabled)

Analysis

On the Analysis screen, we can plot specific metrics, namely performance counter metrics plus custom metrics, over time.

We can also display and overlay multiple metrics for a given time period, and we can create metric baselines, and compare them to usage levels for the same metric, at the time of the alert. Again, we'll discuss how this works a little later.

Redgate Monitor architecture

Before we dive into the details of how to install, configure and use Redgate Monitor, we need to take a high-level look at its design. Redgate Monitor is divided into three key components:

  • Monitoring Service (Base Monitor) – the main workhorse of Redgate Monitor. It runs as a normal Windows service, continuously monitoring our SQL Servers, collecting the alerts and performance data and storing it in the Redgate Monitor database.
  • Web Server – serves up the Redgate Monitor user interface via a web browser. We can choose to install a dedicated, standalone Redgate Monitor Webserver, or use an existing IIS web server (IIS 7 and up).
  • Redgate Monitor Database – a SQL Server database that stores all the configuration and historical data collected by the base monitor.

Redgate Monitor has an agentless architecture; it collects diagnostic data without the need to install an agent on each SQL Server instance. Instead, the base monitor connects remotely to each server, collecting the data using a combination of WMI, T-SQL, remote registry calls, remote file access, and ping.

While an agentless environment minimizes the monitoring overhead, it is still unwise to install the Redgate Monitor components on the same machine as the SQL Server instances we wish to monitor. The overhead of running the base monitor and web server, as well as writing all the data to the repository, which can grow large on a busy server, could affect the performance of the monitored server (the Observer effect). Equally, it's of little help if the monitoring service, responsible for telling us that something's failed on the server, goes down at the same time as the server.

Therefore, at the very least, we need one additional machine to host the web server, base monitor and Redgate Monitor database, although the recommended configuration, generally, is two separate servers, one to host the base monitor and web server and one to host the Redgate Monitor database. Ultimately, if we can install each of the three components on a dedicated server, it will allow us to scale Redgate Monitor to monitor the largest number of SQL Server instances.

The Redgate Monitor documentation provides an overview of the Redgate Monitor components, reproduced in Figure 1.

Figure 1 – Redgate Monitor architecture

The Supported platforms and hardware and performance guidelines section of the documentation provides full details of supported platforms for the three Redgate Monitor components, as well as supported SQL Server versions for monitoring, and supported browser versions for the web clients. It also offers hardware guidelines for the machines hosting the base monitor and Redgate Monitor database.

Redgate Monitor installation and start up

The Redgate Monitor installation process is straightforward; simply run SQLMonitor.exe on the server on which you wish to install the web server component (and possibly also monitoring service) and complete the installation steps in the wizard. The installation documentation covers these steps in detail, so here we'll just review briefly the main considerations.

You can either install the web server and monitoring service on the same machine, or on separate machines. If you choose the latter option, you'll install the web server first, and then a separate installer allows you to install only the monitoring service, on a separate server.

For the examples in this article, we used a 2-server architecture for Redgate Monitor, and so installed the web server and base monitor on the same virtual machine.

The web server

The default option in the installation wizard is to install the dedicated Redgate Monitor web server, which uses the .NET 4.6.1 runtime. If you chose this option, Redgate Monitor will install and configure the Redgate Monitor Web Service. See the documentation for the required permissions for the Web Service account.

However, the installation process will also detect if the server hosts a supported version of IIS (v7 or later) and, if so, the alternative is to add Redgate Monitor to the existing IIS server. If you choose to use IIS, the Redgate Monitor Web Service is not installed.

If IIS isn't installed, it's worth installing the dedicated Redgate Monitor web server, and you may still opt to do so even if the machine already hosts IIS v7 or later. The dedicated web server is much easier to set up, as you don't have to juggle any IIS permissions, and it is very lightweight.

Using Redgate Monitor with IIS

If you do need to use IIS, to support a 'heavyweight' Redgate Monitor deployment, then the documentation provides details on installing Redgate Monitor with IIS 7 and later, and on how to move hosting from the Redgate Monitor Webserver to IIS, if required.

The examples in this article use the dedicated Redgate Monitor web server. By default, it uses TCP port 8080 for incoming connections; you should check the port assignment to ensure no other program is already using it. For this reason, we assigned the Redgate Monitor web server to port 8081 here.

If you want to access the Redgate Monitor web server outside your organization's firewall, you will need to configure the web server so that it can be accessed through the firewall.

Installing the Monitoring service

In this example, we simply install the monitoring service (a.k.a. the base monitor) on the same machine as the web server. The Base Monitor service uses default TCP port 7399 to communicate with the Web Server. The base monitor will need an account that has access to the Redgate Monitor database, as well as an account, which may be the same account or a different one, to access each of the SQL Server instances and databases that we wish to monitor. The former we specify during installation and the latter post-installation, during initial configuration. The Requirements section of the documentation specifies the minimum permissions required for each of these accounts.

The base monitor can connect to the Redgate Monitor database using Windows (the default, and used here) or SQL Server authentication.

If you run Services.msc on the machine where you installed the base monitor, the name of the base monitor service is Redgate Monitor Base Monitor. If you ever see the message "Unable to detect base monitor" when starting up Redgate Monitor, check to see that the service is still running (it will stop working if, for example, we change the account password and forget to update the service to use the new one).

Set up the Redgate Monitor database

The final major step in the installation is to set up the Redgate Monitor database. This will store all the data collected from each of the monitored SQL servers. Depending on the size of your SQL Server databases, and activity on your servers, the Redgate Monitor database can grow large, quickly, and it's essential to pre-plan for this during set up.

The documentation estimates that monitoring one server, hosting a single SQL Server instance, will require between 150 and 450 MB of storage per day. Therefore, for 10 servers, we can expect the Redgate Monitor database to grow by 10 to 30 GB per week.

Decide how many weeks' worth of monitoring data you wish to retain (covered in more detail in the Configuring Redgate Monitor section, later) and then create a custom Redgate Monitor database, with initial size and filegrowth for the data and log files that will avoid excessive autogrowth events, which can cause file fragmentation.
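
As an illustration only, a pre-sized repository database might be created along the following lines. The file sizes, growth increments and paths shown are assumptions to adjust for your own retention settings and storage layout; the database name and SIMPLE recovery model mirror the defaults described below.

-- Illustrative sizing only; adjust sizes, growth and file paths for your environment.
CREATE DATABASE RedGateMonitor
ON PRIMARY
(
    NAME = RedGateMonitor_data,
    FILENAME = 'D:\SQLData\RedGateMonitor.mdf',   -- assumed path
    SIZE = 50GB,
    FILEGROWTH = 5GB
)
LOG ON
(
    NAME = RedGateMonitor_log,
    FILENAME = 'E:\SQLLogs\RedGateMonitor.ldf',   -- assumed path
    SIZE = 5GB,
    FILEGROWTH = 1GB
);

ALTER DATABASE RedGateMonitor SET RECOVERY SIMPLE;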

If, instead, you decide to let Redgate Monitor create the Redgate Monitor database for you, it will create a repository database called RedGateMonitor using the defaults supplied by the model database for that instance. It then alters the database, first to set the initial size, and growth increment, of the data file to 512MB, and then to set its recovery model to SIMPLE.

For the examples in this article, we created a new SQL Server database, sized appropriately, and supplied a Windows account for the base monitor to connect to the database, as discussed earlier.

Redgate Monitor database architecture

If you would like to know a little more about the schema of the Redgate Monitor database, and how it organizes and stores the monitoring data, and alerts, try Chris Lambrou's short series of posts (Chris was one of the Redgate Monitor developers).

Initial start up

With the installation process complete, all we need to do is launch Redgate Monitor. The first time we do so, we create the administrative role for the Redgate Monitor web interface, log in, and start adding the SQL Server machines and instances that we wish to monitor.

The Administrator user has full access to all Redgate Monitor features. Subsequently, the administrator can create less-privileged user roles (covered in the Configuration section, a little later in the article) for users who only need to view and analyze the monitoring data.

Licensing and activation

After installing Redgate Monitor, you will also need to enter licensing information and activate it. See the Help file for instructions on how to do this. You don't have to enter the licensing information immediately, as Redgate Monitor will work as a free trial for 14 days.

Accessing Redgate Monitor

We can access Redgate Monitor from any web client, simply by supplying the URL with the address and port number for the web server component:

http://<your-web-server>:8080

On the local machine hosting the web server, we can also launch Redgate Monitor from the Start menu.

Adding an instance to Redgate Monitor

In order to start using Redgate Monitor, we just need to add some SQL Servers for it to monitor. Having done so, it starts monitoring immediately, based on a pre-defined set of base performance metrics and alerts.

We add new servers and instances in the Configuration | Monitored Servers section of Redgate Monitor (we can also reach this page from the left-hand menu of the Global Overview page).

To add a new server, simply click on Add SQL Server.

Figure 2 – Adding a SQL Server instance

We need to supply the following information:

  • The path to the SQL Server host machine or cluster we wish to monitor – the fully-qualified name
  • A Host machine account – used by the base monitor service to connect to, and gather server-level data from, the machine hosting the SQL Server we wish to monitor
  • A SQL Server instance account – used by the base monitor service to connect to, and gather data from, the SQL Server instance

By default, Redgate Monitor will use the Base Monitor service account, which we set up during installation to allow it to connect to the Redgate Monitor database, for both the host machine account and the SQL Server instance account.

However, it is a best practice to create separate, dedicated domain accounts for each credential. The host machine account will generally have local admin rights to the server where the SQL Server instance is located, and the SQL Server instance account will generally have SQL Server sysadmin rights, although in neither case is the use of such privileged access mandatory (see Requirements).

As we add new servers over time, the Monitored servers page will provide a full list of all monitored servers and instances, reporting their current status and allowing us to perform actions such as edit credentials, suspend monitoring, retry a connection, and set a maintenance window.

Active Directory integration

Redgate Monitor is integrated with Active Directory to help manage and control access to the monitored SQL Servers. If certain users/groups are not allowed to view a certain server then that server won't appear in Redgate Monitor when viewed by those users.

On the Configuration page, click Authentication Settings and you'll see options to log in to Redgate Monitor using either Redgate Monitor credentials, or Active Directory credentials. If you choose the latter, you'll need to supply at minimum a domain name and credentials for a service account in Active Directory.

Having done so, tested the connection and saved the settings, you'll see the message in Figure 3, where we need to confirm the AD user or group that will have administrative access to Redgate Monitor.

Figure 3 – Adding an AD user or group as an Administrator

We can then use the Configuration | Manage Users page to create lower-privileged AD users and groups that have access only to specific servers.

Figure 4 – Adding a standard user or group with access only to specific servers

If permissions conflict, the least restrictive is taken. For example, if a user is a member of an AD group that doesn't have access to a certain server but also a member of an AD group that does have access, then Redgate Monitor will give that user access.

Working with Redgate Monitor

Redgate Monitor allows us to track alerts and analyze associated monitoring data across three different screens, as described previously:

  • Overviews – snapshots and summaries of monitoring data, describing what is currently occurring on the system, or what the system looked like at a specified moment in time.
  • Alerts – detailed information about alerts that Redgate Monitor has generated. We can also manage those alerts, marking them as Read (or Cleared) in order to keep the alert list clean, and so maintain a strong signal-to-noise ratio.
  • Analysis – tools to analyze the gathered metrics over time, establish baselines, and compare different metrics.

Combined, these tools offer a degree of proactive control over the performance of our database systems, as well as the ability to respond to emergencies as they arise.

Tracking alerts

Once we have installed Redgate Monitor and added some SQL Server instances, it automatically begins to collect data and raise alerts.

Redgate Monitor raises an alert in the following situations:

  • Every time the event related to the alert occurs – such as a deadlock occurring, a job failing, or an instance becoming unavailable
  • When the defined threshold level for the alert firing is breached – for example, a query runs longer than the threshold value, or available disk space drops below a threshold value, or CPU usage exceeds a threshold value for longer than a set duration

We can configure each alert, setting the desired threshold levels appropriate for our environment, or even disabling it, if required. We'll cover alert configuration in detail later in the article.

When Redgate Monitor raises an alert, it stays raised until someone with proper permission clears it. It is very important that the team are diligent in clearing alerts that they have investigated and resolved, or that simply don't require immediate attention. In this way, the list of alerts Redgate Monitor displays will always provide an excellent place to begin troubleshooting.

Global Overview

The first screen that comes up upon starting Redgate Monitor is the "Global Overview" screen, as shown in Figure 5.

Figure 5 – The "Global Overview" screen in Redgate Monitor

Alerts screen

We can drill down into the details of the alerts either by clicking on the number of un-cleared alerts listed for a server or instance, or by switching to the Alerts screen. In this example, I clicked on the five un-cleared alerts listed for the local instance, as shown in Figure 6.

Figure 6 – The Alerts screen

The severity of the alert runs from Low to Medium to High, with the latter being alerts that should be of immediate concern, as they are the ones most likely to lead to outages or major performance problems.

Troubleshooting alerts (Deadlock example)

Simply by clicking on each grouped alert, we can see the list of alerts in that group and, by clicking again on an individual alert, see its full details: when it occurred, and on which database or instance. Alongside this, we see snapshots and summaries of resource usage and activity levels on the host machine and SQL Server instance at the time of the alert, as well as details of the SQL Server queries and processes, or other server-level processes, that were running concurrently.

To give you a feel for the troubleshooting process, let's drill deeper into one of the three alerts shown in Figure 6, namely the Deadlock alert.

Clicking on any alert takes us to the Details tab for the alert, as shown in Figure 7.

Figure 7 – The Details tab for a Deadlock alert

At the top, for any alert, we see the time the alert was raised (and ended, if relevant). Below that, we can see the current status of the alert. In this case, the alert is unread and has not yet been cleared. We can simply click those boxes to change its status. We can also add useful comments, perhaps regarding resolution of the problem.

In the body of the Details tab, we see the details of the alert. Obviously, the information presented here is specific to the alert. In this case, we see details of the interactions of the sessions and objects involved in the deadlock in a handy flow chart.

Troubleshooting Deadlocks

Alongside the Details tab are five other tabs that provide further useful information about the alert. The Output tab is alert-specific, but the others are common to all alerts.

  • Output – displays the entire deadlock graph
  • Comments – we can add comments regarding alert resolution
  • Alert History – the lifecycle of the current occurrence of the alert
  • Occurrences – how many times Redgate Monitor has raised this alert, for the current object
  • Description – useful background information on troubleshooting the alert type

Scrolling down further, we reach the Performance Data section. On the Host machine tab, Redgate Monitor provides a snapshot of the values for various server-level Performance Counters, captured around the time that the alert fired.

Figure 8 – Performance metrics captured at the time the alert fired

The gray bar in each graph indicates the time that Redgate Monitor fired the alert. A green bar, if there is one, indicates when the alert ended. Note that these bars represent the time that the alert fired and ended, not the exact time that the underlying queries started and ended, although they are a close approximation. An alert won't fire until its threshold is reached, and this will be some time after the query began. Similarly, the SQL Server tab provides snapshots of performance counter metrics for the SQL Server instance.

For even more information, the System processes tab lists concurrent system processes, and SQL processes / Profiler trace identifies the processes running at the time of the alert, plus the trace stack, if you have trace enabled. In this case, this tab will identify who and what was running the session that caused the deadlock (OK, I confess, it was me, in SSMS, armed with a nasty, long-running UPDATE query).

Finally, we can see the Top queries from around the time of the alert. By default, Redgate Monitor lists them in order of duration, but by clicking on the desired column (such as execution count), we can reorder the results by that column.

Clicking on the long-running query will reveal the full text of the query, its plan handle, the target database, and the query duration. Clicking on the arrow below the query text will show the query in the context of its containing batch. This is especially useful when the query is run as part of a stored procedure, as it will give you the name of that procedure.

Figure 9 – Top queries running at the time an alert fired

Armed with knowledge of the queries involved in the deadlock, we can take steps to resolve the problem, to ensure it does not recur. This might involve query tuning, adding a new index, or modifying an existing one, or some other standard mitigation for deadlocks.

Clearing alerts

When we start to work with an alert, Redgate Monitor marks it automatically as Read. This means that other DBAs or data professionals looking at the alert won't also start to troubleshoot the issue. Having resolved the problem, it is very important to mark the alert as Cleared.

A full list of pre-configured alerts

The documentation provides a complete list of alerts, along with the default threshold settings for each one and the frequency at which Redgate Monitor collects data for the related metric.

The basic troubleshooting process is similar for every alert, in that Redgate Monitor will reveal both the relevant alert details, and performance data captured around the time the alert was raised, which the DBA can use to diagnose the cause. For example, a Fragmented Indexes alert will identify the indexes, their size and degree of fragmentation, a Disk Space alert will identify the space-challenged disk and summarize the size of the databases that store data on that disk, and so on.

Set up email notifications for alerts

Immediately after an instance is added to Redgate Monitor, it begins to collect data and report any alerts, on the Alerts screen. For important alerts, it's advisable to set up email notifications so that the relevant DBA receives immediate notification. Each email contains much of the information on the alert details page. We can configure alert emails to be sent when alerts are first raised, when they are escalated (from low to medium or high severity, for example), and when they are closed.

First, on the Email Settings page of the Configuration section, enter the default email address to which to send notifications along with the mail server settings. Alternatively, we can disable all email notifications, if they are not required.

At Configuration | Alert Settings, we can override the default notification address for individual alerts, as required, or simply disable notifications for that alert by selecting Specify an email recipient, and then unchecking the Send Email notification to: box.

Redgate Monitor's alerting feature works with PagerDuty, which allows email messages to be sent via phone and SMS. This is a third-party cloud-based application subscription service.

Using Overviews

On the Overviews page, Redgate Monitor presents various aggregations and summaries of its collected diagnostic data, offering a snapshot of the system activity and resource usage at the current time, or at a specific point in time. Redgate Monitor presents relevant data at each level, as we can drill down from the server level, to the SQL Server instance and database levels, as summarized earlier in the article.

At the SQL Server instance level, Redgate Monitor displays in the Performance Diagnostics section various PerfMon counter metrics for the instance, and also snapshots of data mined from Dynamic Management Views, presented as the Top queries that were executing at the specified time, and the Top waits captured over the past, say, 15 minutes.

We're going to focus on the wait statistics, because they offer a great way to understand what is causing the system to slow down, right now.

Wait statistics

A frequent occurrence for a DBA is to receive a phone call informing him or her that the server is "running slow" now, or was running slow "just before I went to lunch". With no more specifically-defined issue on which to base their investigation, a good starting point is to review the wait statistics for the instance, over the reported period of slowdown.

In order to do this, generally speaking, the DBA would need to be capturing and saving regular snapshots of the sys.dm_os_wait_stats DMV, and then comparing them to find out 'what changed' over the slow period. A tool like Redgate Monitor takes care of this "heavy lifting" for us, and presents a "top waits" view of the data, as shown in Figure 10. By default, Redgate Monitor sorts the waits by duration (i.e. wait time).
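
For a sense of the heavy lifting this saves, a manual snapshot-and-compare approach might look something like the following sketch (the table name and capture times are hypothetical).

-- Snapshot the wait stats on a schedule (e.g. via an Agent job), then diff two snapshots.
CREATE TABLE dbo.WaitStatsSnapshot
(
    capture_time        DATETIME2(0) NOT NULL DEFAULT SYSDATETIME(),
    wait_type           NVARCHAR(60) NOT NULL,
    waiting_tasks_count BIGINT       NOT NULL,
    wait_time_ms        BIGINT       NOT NULL,
    signal_wait_time_ms BIGINT       NOT NULL
);

INSERT INTO dbo.WaitStatsSnapshot (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM   sys.dm_os_wait_stats;

-- Compare two capture times to see which waits grew during the reported slow period.
SELECT  later.wait_type,
        later.wait_time_ms - earlier.wait_time_ms AS wait_time_ms_delta
FROM    dbo.WaitStatsSnapshot AS later
JOIN    dbo.WaitStatsSnapshot AS earlier
        ON earlier.wait_type = later.wait_type
WHERE   later.capture_time   = '2024-01-01 13:00'   -- hypothetical times
  AND   earlier.capture_time = '2024-01-01 11:00'
ORDER BY wait_time_ms_delta DESC;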

Figure 10 – Current Top waits

On my instance, the top four entries show a number of tasks waiting for a long cumulative period, and only using up a certain amount of CPU (as evidenced by the comparatively low Signal wait time).

For all of the common waits, Redgate Monitor will display helpful descriptive text. In Figure 10, the more interesting waits are those after the first, and in particular the PAGELATCH_SH wait type. The data shows that 6,598 tasks waited for this resource (i.e. waited to acquire a shared latch on a data page in the buffer pool), over the previous 2-hour period.

This suggests a possible issue, so let's drill deeper by clicking on this wait type. The first point to note is that, on clicking a specific wait type, detailed information appears below the list of waits, describing that wait type, what commonly causes it, and possible diagnostic routes.

Figure 11 – Redgate Monitor provides detailed descriptions and troubleshooting advice for each wait type

In the main part of the screen, for the selected wait type, Redgate Monitor displays any queries that experienced this type of wait over that period.

We can click on the query to reveal more information about the source of the problem, the query involved, the plan handle to pull the plan from cache, and other waits associated with the query.
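
Incidentally, given the plan handle that Redgate Monitor displays, we can also retrieve the cached plan ourselves in SSMS. The sketch below simply grabs an arbitrary handle from the plan cache as a stand-in for the one shown in the tool.

-- Retrieve the cached plan and batch text for a plan handle (here, an arbitrary one
-- from the cache; in practice, substitute the handle shown in Redgate Monitor).
DECLARE @plan_handle VARBINARY(64);

SELECT TOP (1) @plan_handle = plan_handle
FROM   sys.dm_exec_cached_plans;

SELECT  qp.query_plan,        -- XML plan; click it in SSMS to open it graphically
        st.text AS batch_text
FROM    sys.dm_exec_query_plan(@plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(@plan_handle) AS st;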

The Rewind Time feature

Sometimes the phone call you get is about a problem that occurred yesterday. While Redgate Monitor maintains and displays real-time data, it also stores past data in a database, which means that we can roll back to see what was happening at a particular point in time. This is great for diagnosing problems that occurred when you were not around. By default, data is stored for 1-2 months, but this is configurable (see later).

To rewind to a previous point in time we use the settings available at the top of any Overviews page.

Figure 12 – The Rewind Time feature

Use of the control is largely self-explanatory. We can click the calendar to choose a specific date and time, or simply type in a date and time (using a 24 hour clock). Alternatively you can drag the markers on the timeline below to set the exact period you want to analyze.

Further down the screen, we'll also find the top waits, expensive queries, and so on, for the chosen time. We can go as far back in time as we have data stored in the Redgate Monitor database. Having finished analyzing the data for the previous point in time, click the 'skip forward' button to return to the current time.

Analyzing performance metrics

We use the Analysis screen to build and display graphs of various metrics over different time periods, and to perform baseline testing. This is the go-to location for long term analysis based on the performance counter metrics collected over time by Redgate Monitor. We can use this information for trending, historical comparisons, and other types of troubleshooting and reporting.

By default, the Analysis screen will display a graph showing a single metric, Machine processor time (total CPU utilization), for the first server in the list of monitored servers, over the last hour.

Figure 13 – The default Analysis Graph

The x-axis displays the time range and the y-axis the metric value; of course, the y-axis will display a variable range based on the types of metrics selected.

We can adjust the time range using one of the pre-set options, such as "Today" or "Last 24h", or we can simply use the time range and calendar options. Using the forward and back buttons will display the metric values for the same interval on the previous or next day.

Hovering the mouse pointer over the graph will reveal the exact metric value for a specific point in time.

Figure 14 – Viewing the metric value for a specific point in time

Below the graph is a selection box that allows us to add further metrics to the graph, and so compare different types of activity over the same time period. For server-level metrics, such as processor time, we can choose the server for which we wish to display the metric. For SQL Server instance- and database-level metrics, we can display them for all databases on a specific instance, or for a specific database.

Let's see how that works in more detail.

Multiple metric graphs

Consider a situation where the wait statistics are showing high WRITELOG waits, at certain times, for the tempdb database, and we want to find out the root cause.

Redgate Monitor collects several PerfMon counters related to log activity for a database, so we start by displaying the Log flush waits/sec metric just for the tempdb database, over the last two hours.

Figure 15 – Viewing a SQL Server metric for a specific database

Next, click the Add another metric button and add the Log flushes/sec metric. Then, double-click (As Above) for the server selection, and Redgate Monitor will automatically display the new metric for the same server, instance and database as the previous Log flush waits/sec metric.

Figure 16 – Plotting two metrics over the same period

It's worth noting that, if we change the server, instance or database selection in the first metric, then it will also change for all subsequent metrics for which we selected As Above. It makes it very easy to compare the same set of metrics for different databases.

On the current graph, we can see that the y-axis now displays no scale, since the scale for each of the two metrics is different. Redgate Monitor simply scales them proportionally within the selected time range. Remember that we can still hover over each graph to see absolute values for each metric, at a specific point in time.

In any event, what we're interested in here is not so much absolute values as corresponding patterns in behavior between different metrics. A log flush occurs, writing log records from the log cache to the physical log file, every time a transaction commits and during CHECKPOINT operations. By plotting the two metrics together, we should see if there is any correlation, i.e. whether 'spikes' in the number of log flushes per second correlate with increased numbers of log records waiting to be written to disk. In this case, we see no correlation between spikes in the number of log flushes and spikes in the log flush waits.
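
For reference, the two underlying counters in this analysis can also be read directly from SQL Server, as in the sketch below; they are cumulative values, so the per-second rates plotted on the graph are derived from successive samples.

-- The raw, cumulative counters behind the tempdb log metrics used above.
SELECT  counter_name,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   object_name LIKE N'%:Databases%'
  AND   instance_name = N'tempdb'
  AND   counter_name IN (N'Log Flushes/sec',
                         N'Log Flush Waits/sec');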

Let's add a third metric to the graph, Log space used.

Figure 17 – A three-metric graph for the tempdb database

Now, we see a correlation between the behavior in the Log space used and Log flush waits/sec metrics. For the tempdb database, periodic drops in the log space used correspond closely to spikes in log flush waits.

As you can see, combining metrics in this fashion allows us to spot correlations in metric patterns.

Using Baselines

While comparing various metrics over a given time period is useful, often what a DBA really wants to know, in response to hearing that a server was "running slow" at some random time, is "what changed?" What was happening on the system at that time that was different from "normal"? That's where baselines come in.

To see baselines in action, I'm going to display some metrics for a different, busier SQL Server. Once again, we start with an analysis graph showing a single metric, Machine: processor time, for the selected server, this time for a 24 hour period.

Comparing to a baseline

Clicking the Compare baseline button automatically adds to the graph a second set of values for the same metric, over the previous, equivalent time range, therefore providing a baseline, or comparison point. In Figure 18, the dark lines represent the current time range of 1 hour of data and the lighter lines represent the previous hour's worth of data.

Figure 18 – Comparing one hour to the previous hour

Extending Baselines

The Extend Baseline button allows us to extend the baseline comparison a set number of previous time ranges. So, for example, if we are initially comparing a 1 hour time range to the previous hour, then extending the baseline by 7 time ranges will add to the graph the same 1-hour time range for the previous 7 hours.

The example in Figure 19 compares the current hour with the preceding hour, and extends the baseline by 7 time ranges, with each line being fainter the further back in history.

Figure 19 – Extending the baseline to cover 7 hours

We can change the time ranges, or we can modify the data selected, based on the information stored in the Redgate Monitor database. Obviously, we can't request data from counters that weren't collected, or for time ranges that have been purged from the Redgate Monitor database.

For an even clearer picture of what "normal" looks like, we can display averaged regions, for the baselines, rather than separate lines. To do this, simply click the Regions button at the top of the graph.

Figure 20 – Comparing behavior to averaged baseline regions

Redgate Monitor still displays the current data as a line, so we know what happened in the current period. The rest of the data has been averaged out and smoothed down.

Essentially, the Regions view is based on the same data, but shows the 0th, 1st, 3rd and 4th quartiles, with quite strong Loess smoothing applied and the area between 1st and 3rd quartiles (the 'mid-range' values) shaded dark blue, and the area between 0th and 1st, and 3rd and 4th (the low- and high-range values) shaded a lighter blue.

Baselines provide an excellent mechanism for making comparisons over time so that we can understand how our system is currently behaving compared to how it behaved previously. The knowledge generated by the baselines allows us to spot trends and identify outliers in order to assist in both administration and troubleshooting of our servers.

It's possible, with a little patience, to write T-SQL or PowerShell scripts to collect some or all of this data "by hand", for example from the various system views, performance counters, Dynamic Management Views, and so on. We would need to collect baseline measurements for each key metric and then capture the data into a table at regular intervals. Erin Stellato in her Capturing Baselines series on SQLServerCentral.com explains how to start collecting this data manually.

The problem, however, is that writing and maintaining the scripts to capture the required data, the scheduled jobs to run them, and the tables to store the data becomes, itself, a big task, and one that can detract from the more important task of analyzing the data to spot problems on our SQL Server instances. A tool like Redgate Monitor removes a lot of this heavy lifting.
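
To make that concrete, even a minimal manual baseline needs a storage table plus a scheduled collection step, along the lines of the following sketch (the table name and counter choices are purely illustrative).

-- Capture a few counters on a schedule (e.g. an Agent job every 5 minutes) so that
-- different periods can be compared later.
CREATE TABLE dbo.CounterBaseline
(
    capture_time  DATETIME2(0)  NOT NULL DEFAULT SYSDATETIME(),
    object_name   NVARCHAR(128) NOT NULL,
    counter_name  NVARCHAR(128) NOT NULL,
    instance_name NVARCHAR(128) NULL,
    cntr_value    BIGINT        NOT NULL
);

INSERT INTO dbo.CounterBaseline (object_name, counter_name, instance_name, cntr_value)
SELECT RTRIM(object_name), RTRIM(counter_name), RTRIM(instance_name), cntr_value
FROM   sys.dm_os_performance_counters
WHERE  counter_name IN (N'Batch Requests/sec', N'Page life expectancy');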

Monitoring availability groups

Availability Groups (AG) are now a fundamental aspect of the High Availability and Disaster Recovery (HA/DR) functionality within SQL Server.

They are maintained at the database level, not at the server level, so they bring with them different types of monitoring functionality. Further, the HA/DR nature of AGs means that you have to monitor the Primary node (the currently active instance maintaining your database) and any Secondary nodes (the available backup instances) differently.

Redgate Monitor provides all the additional functionality necessary to monitor your AGs for uptime, availability, current status and performance, with appropriate monitoring, alerts and metrics, on both the Primary and Secondary nodes.

When you add in an Availability Group, the Global Overview screen lists the AG along with the servers and instances contained within that AG.

If you click on any of the servers or instances, you'll see the standard overview screen for that server or instance. If you click on the AG itself, you'll see an overview, specific to the AG, as shown in Figure 21.

Figure 21 – Overview of metrics and alerts for a specific Availability Group

There's quite a lot of information displayed on this screen, so we'll break it down in more detail. On the left, you have the basic status and health of the AG service itself, the instances and the databases. You can also see any AG-related alerts that have been fired, as well as the current status of synchronization.

You can also see the rate at which logs are being transferred to the secondary replicas, and you can see various statuses including the settings for Failover including Synchronized and Automatic, Synchronized with No Data Loss, and Asynchronous allowing Data Loss.

We can drill down into each of the items on this screen. Clicking on the servers and databases will open the standard overview pages for those object types. Clicking on an individual alert will open the Alerts page. The spark lines for the Redo Queue, and others, will open the Analysis page.

For an AG service, you're going to want to ensure that you monitor both the primary and the secondary instances. You can then analyze metrics that show both the load on the primary, and how well the primary is interacting with the secondary instances as it sends across the transactions and validates their safe arrival.

When you add a cluster that's part of an Availability Group, Redgate Monitor automatically detects which instances are primary, and which are secondary. All of the instances are then added.

All the information within the Analysis screen for the AG works exactly the same as for any other set of counters. The Analysis screen is useful here because we can add counters from both the Primary and the Secondary servers, allowing you to see how the different servers are behaving.

Redgate Monitor offers a number of Alerts for Availability Group management, as shown in Figure 22.

Figure 22 – Availability Group-specific alerts

All this functionality will, collectively, allow you to manage your AG instances much more effectively and proactively.

Customizing alerts and metrics

After some time working with a monitoring and alerting tool such as Redgate Monitor, we begin to get a good feel for the level of activity, and subsequent alerting, on our monitored servers, as well as for how much monitoring data we're collecting, and the growth rate of the central repository for storing it.

We'll start to understand that certain alerts are either unnecessary in our environment, or require different threshold settings. At the same time, we may discover that our databases suffer from particular problems for which there are no pre-defined alerts.

As discussed, Redgate Monitor is highly configurable, and it is very easy to change alert settings, and to add new custom metrics and associated alerts.

Tuning alert settings

To customize any alert, either to turn it off or change its threshold settings, navigate to the Configuration page in Redgate Monitor and select Alert settings to reveal the full list of host machine and SQL Server alerts.

To change the settings for any alert, simply click on it. Figure 23 shows the configuration page for the Long-running query alert.

Figure 23 – Customizing alerts

We can turn an alert on or off, set the thresholds at which different severities of alert will fire and, for this particular alert, specify SQL process names or queries that we want to exclude. For example, let's say that once a month a report runs that typically takes 30 minutes. Since we know this is expected behavior, and there is nothing we can do about it, we can exclude that query so that we aren't alerted every time it runs.

At the bottom of the screen we can specify if and where to send email notifications for this alert, as well as whether to send notifications to Slack and SNMP Traps.

Adding custom metrics

Custom metrics are a powerful feature, and it is very easy to add one alongside the default metrics already built into Redgate Monitor.

Simply navigate to Configuration | Custom metrics, click on the Create custom metric button and use the three-page wizard to:

  1. Define the metric.
  2. Optionally create an alert on the metric.
  3. Review a summary of the selected options before the metric is created.

Define the custom metric

To start the Custom metric wizard, click Create custom metric. Figure 24 shows the top half of the screen. In this example, we're creating a custom metric, and an associated alert, that fires when a database data file increases in size. It's good practice not to rely on auto-growth to grow data files. Instead, we should size the data files at the time of database creation to cope with the initial data load plus predicted growth over a reasonable period, and then grow the files manually, as required. With this practice in place, it's wise to set an alert to notify us whenever a data file grows, because it indicates either that the database has grown much faster than predicted, or that someone has grown the file manually.

Figure 24 – Defining a custom metric to monitor data files growth events

We give the metric a name, a description, and then enter the Transact-SQL code that we want to run to collect the metric data. It is a good idea to create and test this Transact-SQL code in SSMS, first, to ensure that it not only works, but that it is as lightweight as possible. We want this code to run as fast as possible, using the least possible amount of server resources.
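The exact query used for this example isn't critical, so the following is just a minimal sketch of one possible collection query: it returns the total size of the current database's data files, in MB, as the single numeric value that a custom metric query needs to produce. Combined with the rate-of-change and alerting options discussed below, any growth event then shows up as a change in this value.

    -- A minimal sketch of a possible collection query for this metric: total size
    -- of the current database's data files, in MB, returned as a single value.
    SELECT  SUM(CAST(size AS bigint)) * 8 / 1024 AS data_file_size_mb  -- size is in 8 KB pages
    FROM    sys.database_files
    WHERE   type_desc = 'ROWS';  -- data files only; excludes the transaction log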

In the bottom half of the same screen, we specify the servers, instances, and databases for which the metric should collect data, and how often.

Figure 25 – Defining instances to which the alert applies, and data collection frequency

In this example, the alert applies to all user databases on selected SQL Server instances, and will collect data once every 30 minutes.

Data collection frequency

The more often a custom metric collects data, the more resources it will need. Collect data only as often as is truly necessary.

On this screen, we also specify whether Redgate Monitor should display the collected value as-is, or calculate the rate of change between the current and previous values. The latter option is needed for cumulative counters, which increase continuously and only reset when SQL Server is restarted.
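For instance, many of the values in sys.dm_os_performance_counters are cumulative (this is a general illustration, not part of the data file example above): the raw figure for "Batch Requests/sec" is a running total since the last restart, so a custom metric built on a query like the sketch below only makes sense with the rate-of-change option selected.

    -- The raw value of "Batch Requests/sec" is a running total of batch requests
    -- since SQL Server last started, not a per-second rate, so collect it with
    -- the rate-of-change option rather than displaying the value directly.
    SELECT  cntr_value
    FROM    sys.dm_os_performance_counters
    WHERE   counter_name = 'Batch Requests/sec'
            AND object_name LIKE '%SQL Statistics%';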

Notice, near the middle of Figure 25, the Test metric collection… button, which we can use to check that the metric works as intended. I highly recommend you perform this test to ensure that the metric returns exactly the data you expect.

Add a custom alert

On the second page of the wizard, we can create an alert associated with our metric and set the threshold values on which it will fire, and the severity of alert to raise at each threshold.

Figure 26 – Defining an alert for a custom metric

Since the purpose of the alert here is simply to notify us of any growth event on any database, Figure 26 specifies a High severity alert if a data file grows by 1 MB/hr. Alternatively, if we were actually interested in the growth rate, we could set alerts of different severities depending on the rate.

Create the custom metric and alert

The final step is simply to review the custom metric before creating it. If everything looks OK, click on Create metric and alert, and they will act just like any of the built-in metrics and alerts, and will begin working immediately.

The SQLMonitorMetrics website

Redgate maintains the sqlmonitormetrics.red-gate.com website, where users can submit, comment on and help improve a community-maintained set of custom metrics, all of which are tested thoroughly by the development team at Redgate before being posted to the site.

On the site, you'll find ready-made and tested custom metrics in categories ranging from auditing to detecting I/O issues to replication and security. If you find a metric that would be useful in your environment, simply click on Install metric to install it automatically in Redgate Monitor. Likewise, if you've developed a metric you think others would find useful, submit it for consideration for inclusion on the website.

Other Maintenance and Configuration options

Although we can start using Redgate Monitor "out of the box", most DBAs will want to perform some minor additional configuration, including:

  • Determining how often to purge historical data
  • Setting Maintenance windows to avoid alerting during these periods
  • Organizing monitored servers into logical groups
  • Assigning users different access permissions depending on their requirements

We do all of this work via the Configuration section of Redgate Monitor's web interface.

Purging historical data

A very important consideration is how much historical data to store because, over time, Redgate Monitor will store a high volume of monitoring data, both host machine data and SQL Server data, as well as basic alert data and SQL trace data, if enabled.

Redgate Monitor offers granular control over how long to retain each category of data, defaulting to one to two months for the various types of machine and SQL Server data, one month for basic alert data, and three days for SQL trace data.

In most cases, you'll want to keep a month's worth of most types of data, which makes it easier to view trends over a typical monthly business cycle.

To customize the default data purging settings, go to Configuration | Data purging.

Configuring groups

We can group monitored SQL Server instances into logical groups, such as "Production" and "Development", or into whatever groups make sense in your environment.

Having done this, Redgate Monitor will:

  • Display the related instances together
  • Allow you to filter and configure alerts for all of the instances in the group at one time, which can save work

To create a group, navigate to Configuration | Groups.

Defining user roles

We can assign users to different user roles depending on their required level of access, as follows:

  • Administrator – Full access to all servers and options.
  • Standard user role – Access limited to the servers you specify; restricted options.
  • Read-only user role – Read-only access to the servers you specify; restricted options.

The administrator role is created on initial startup and, if you're the only DBA who needs to access Redgate Monitor, you won't need to create additional roles. To allow others access, but with restricted privileges, simply navigate to Configuration | Manage user roles, define passwords for the other roles, and supply them to the users who require access. If you opt to use Active Directory authentication, you'll probably only create the administrator role, which will be needed if you ever have to revert to Redgate Monitor authentication; other roles will be managed through Active Directory group permissions.

Setting maintenance windows

Most SQL Server instances will have maintenance windows where a lot of additional server activity will occur, such as backups, rebuilding indexes, and running DBCC CHECKDB, all of which can place an extra burden on the server's resources, and could trigger some performance-related alerts that are essentially "false positives".

To avoid this situation, we can set a maintenance window for each monitored server, which tells Redgate Monitor to suppress performance alerts, but not operational alerts, during that time interval; monitoring itself still continues.

To set a maintenance window for a host machine, navigate to Configuration | Monitored servers, select the blue cog next to the server or database you need, and choose Set maintenance window from the drop-down menu.

Custom Redgate Monitor reporting

In addition to analyzing metrics directly in the Redgate Monitor web interface, we can also build SQL Server Reporting Services (SSRS) reports that access the Redgate Monitor database directly. Redgate provides a free SSRS Pack to help in getting started, which includes ready-made SSRS reports on CPU and memory usage, database growth trends, general server properties, and more.

Summary

The main point of any monitoring effort, whether homegrown or based on a dedicated tool such as Redgate Monitor, is to spot problems as quickly as possible and respond to them before they affect too many people, and to avoid certain problems altogether by predicting when they are likely to occur and taking evasive action.

Our overall goal is to troubleshoot the issues we discover and so improve the performance of the servers and the applications that use them, until we start to see a downward trend in the number of alerts raised and better use of resources across the servers.