Using the SQL Monitor Estate Pages
Kathi Kellenberger introduces the new Estate monitoring pages in SQL Monitor 9, and explains how they will help DBAs monitor and safeguard the security, stability and availability of all servers in their care.
As a DBA responsible for the availability, performance, security and overall health of a set of SQL Servers, you have many plates to keep spinning. You’ll need to respond quickly to avert or resolve query performance and resource contention problems. Each morning as you arrive at your desk, you’ll also need to cast a watchful eye over all your scheduled monitoring tasks. Did all the scheduled jobs run? Are all databases being backed up? Will any storage volumes run out of space soon? Are the latest cumulative updates applied?
SQL Monitor collects all the data you need to answer these questions. It provides easy ways to track a set of metrics, for each server and each database, and will allow you to create metric graphs, compare to baselines, and so on. It has configurable alerts that will warn you if a backup fails, disk space is running low, or a job runs long, or fails to run at all.
However, when you need to manage a large estate of servers and databases, spread across a diverse range of platforms, you need a bit more help. On top of the standard set of metrics and alerts for tracking issues for each server and database, the Estate pages in SQL Monitor 9 and later make it much easier to retain a simple picture of the overall health of the whole estate.
There are currently four Estate pages (Installed Versions, Disk Space, Backups and SQL Agent Jobs) and each page provides a concise, graphical summary of the current landscape for that metric. They will allow you to identify problems, bottlenecks and stress-points quickly. For resource metrics such as disk space, SQL Monitor will not only show current levels of consumption, but also predict the future value, based on the trend. This allows you to spot potential problems early, while you have time to plan the best response, rather than simply firefight alerts as they occur.
Estate-wide summaries, graphs and projections
Database administrators are often responsible for dozens or even hundreds of SQL Server instances, often spread across multiple data centers, Azure VMs, or even Azure SQL Databases. The number of databases can easily be in the thousands. Even with a tool to do the data collection and alerting tasks, keeping on top of all the issues, across all your servers, can be a challenge. To make matters even more complicated, DBAs often support a wide range of SQL Server versions with multiple settings and assorted scheduled processes on each one. At a certain point, your life as a DBA starts to resemble a circus, or at least a juggling act, and you’re likely to start dropping some balls.
If you’re a small team who must monitor a large estate, you need an easy-to-digest picture of resource use, activity and any issues, across all your monitored servers. This is what the Estate pages intend to provide. I’ll demonstrate how to use each of the current four Estate pages, Installed Versions, Disk Usage, Backups, and SQL Agent Jobs as shown in Figure 1.
Figure 1: The Estate pages
Expect more pages to be added over time. If you want to review them yourself, but don’t yet have SQL Monitor running in your company, you can take a peek at ours at https://monitor.red-gate.com. Don’t worry; you can’t break anything!
Installed Versions
The Installed Versions page shows details of the versions of SQL Server installed across your monitored estate, which cumulative updates or service packs are applied and whether any further updates are available, and more.
A fundamental responsibility of the operations team is to protect their systems from any bug or vulnerability that could compromise the data. As a first line of defense, this means applying all available cumulative updates and service packs, promptly, and any containing important vulnerability fixes, immediately.
Ideally, you’ll have a pre-production server, configured as closely as possible to production, where you can apply these patches and test their impact on the applications, to make sure that the patch doesn’t break anything. If you’re aware immediately of an available update, you’ll have time to plan all of this so that any scheduled outage causes minimal disruption.
At the top of the Installed Versions page, you’ll see a simple report showing which versions of SQL Server you have installed and how many of them have available updates.
Figure 2: The Installed Versions summary
You can use the filters above the report, to limit it to certain groups of servers (such as only production servers), SQL Server versions, and statuses. You can also type in a server name.
Below this, you’ll see a full list of all the instances along with the version, edition, current patching level, a link to the updates, and the date that mainstream support ends. You can also filter and sort this list.
Figure 3: The instance list
Sure, you could do all this the old-fashioned way, with an Excel spreadsheet, or with some scripts, but having SQL Monitor gather it all for you, and link to the patch downloads, will save you countless hours.
Disk Usage
Your manager and the storage team do not want to be notified at the last minute that you urgently need more disk space to be allocated to a server, as it could take time to order and configure new storage. As a DBA, this is another task that can keep you busy gathering the information and trying to figure out the current usage and predict future growth. The Disk Usage page should make it a lot easier to monitor disk space, and plan capacity requirements, across all servers.
The top of the Disk Usage page has a graph and a summary for current and projected disk space consumption across all monitored servers. You can filter by group, disk letter, or server name.
The top line on the right of Figure 4 is the total disk capacity, and the bottom line is the projection.
Figure 4: The Disk Usage graph
The new Estate menu in SQL Monitor includes a Disk Usage feature that shows the recent growth in disk space use, across the monitored estate, and projects the growth trend over the coming months.
The summary chart gives you all the figures, and the warning triangle indicates within the growth forecast, some disks volumes will exceed their capacity.
Figure 5: Estate Summary for disk space
Scrolling down, you’ll see a server list with individual disk volumes, space used, capacity, percentage used, projected use over a year, projected change, and time until full.
Figure 6: The Disk Usage report
If you click on a drive letter it brings up a detailed report for that volume, with a chart showing a breakdown of Free space, Allocated SQL (space allocated for SQL Server data or log files) and Used SQL, over time. You’ll also see a chart with the used and free space for all the drives and the current status.
Figure 7: The Disk volume report
Scroll down and you’ll see more information about each database file on the volume, including how much space is used inside each file.
Figure 8: Disk space used per file, on a given disk volume
With a single view over the changes in the free space on the storage, which the Disk Usage page provides, and an understanding of the underlying rate of growth of the databases, the need to increase capacity will become a much more predictable event. Of course, unexpected events like a runaway query filling up tempdb
happen occasionally, but this feature will go a long way in helping you manage capacity.
Backups
SQL Monitor has built-in alerts to warn you if a backup job (or any other job) fails, or fails to start, or performs erratically. However, if you’re managing thousands of databases, it’s easy to miss things. It’s easy to forget to add a new database into a maintenance process, and you don’t want to find that out when it’s time to do a restore.
Each database will have a backup routine, largely dictated by the RPO and RTO objectives established for it. If it’s a development database, it might just need regular full backups, but frontline business databases may require a more complex scheme comprising full and differential database backups and frequent log backups.
The summary graph for the Backups page, shows the current recovery point for all user databases (you can include or filter out system databases, as required). In this example, we have 2 databases that have not been backed up for between 15 minutes and 1 hour, meaning that if these databases went down now, that’s how much data you might lose, on recovery. There are 8 databases that stand to lose between 12 hours and 1 days’ worth of data, and so on.
Figure 9: The Recovery Point chart
After scrolling down, you’ll see quite a bit of information about recent backups for each database, including statistics about the last full, differential, and log backups, the recovery model, and the worst recovery period in the last 30 days. Being able to sort by Recovery Model is quite helpful, because it’s easy to accidently leave a new database in Simple mode when it needs to be in Full.
Figure 10: The backup report
Incidentally, the red icons indicate that the backups were taken using SQL Backup. The report shows just the most recent backups, but by clicking the arrow next to a database name, you’ll see the Backup History as a chart, and a report for that database.
Figure 11: The Backup History
If you scroll over to the far right, you’ll see a little symbol, which you can click to be taken to the server overview corresponding to the time of the backup.
Figure 12: The Server Overview link
If one of your backups took an unusually long time to run, you’ll be able to see if the server performance looked out of the ordinary or if anything unexpected was going on at the time of the backup. Making sure that every database is set in the proper recovery model and being backed up as expected is a big, tedious task. Thankfully, SQL Monitor will do some of the hard work for you.
SQL Agent Jobs
In a diverse SQL Server estate, each server is likely to have various custom jobs, tailored for the business processes supported on that server. The SQL Agent Jobs page keeps track of the success and failure of the scheduled jobs, as well as the duration of the last run, and when the job is next scheduled to run. Like the other areas, this one starts with a filterable summary as shown in Figure 13.
Figure 13: The SQL Agent Jobs summary
After scrolling down, you’ll see two sections. The top is for the jobs which have recently failed as shown in Figure 14.
Figure 14: The Failed Jobs report
Figure 15 shows the successful jobs.
Figure 15: The Successful Jobs report
By clicking the arrow next to a job, you will see an Execution History timeline graph and a report of the latest runs. Figure 16 shows the Execution History.
Figure 16: The Job Execution History
There is also a link next to each line of the history so that you can jump to the server activity graph for the time the job ran, if you need to investigate a long running or failed job. If you are a DBA, you must keep track of the status of scheduled jobs. As a DBA, this page is the first one I look at every single morning!
Conclusion
For a small handful of servers, it might seem easy enough to manually check the scheduled jobs each day or write a script to gather some of this information. Often, important metrics like disk space use are not reviewed until there is a problem. As the number of servers and databases grows, it is even more difficult to collect and manage all the required data, let alone analyze it all, build reports off it and then act.
Fortunately, SQL Monitor collects all this data for you, and its new Estate pages provide summarize it succinctly, making it much easier to maintain the broad view of the overall heath of the SQL Server estate, and spot approaching trouble, early.
Tools in this post
Redgate Monitor
Real-time SQL Server and PostgreSQL performance monitoring, with alerts and diagnostics