Simple Talk is now part of the Redgate Community hub - find out why

Monitor your Database Backups Using Operations Manager

Thomas LaRock shows just how easy it is for a DBAs to monitor any aspect of the databases in their charge by using Operations Manager, just as long as they are prepared to spit on their hands and do some coding. As he says, 'The possibilities are endless'.

Introduction

While presenting my work on Operations Manager at PASS this year, I showed a new custom monitor for Operations Manager that I had recently written and deployed into our Data Server environment. Previously, I had created what I had thought to be an exciting Operations Manager monitor to notify me if any SQL Agent jobs had been created or modified within the past day. Not many people shared my excitement, and I started to realize that people really want Operations Manager monitors that are simple and work reliably.

I could not agree more. Even I want it to work, and I was becoming frustrated that, every now and then, Operations Manager would seem to not work as expected. For example, it is possible to have a job not succeed, but not report a failure, at which point Operations Manager will not notify me of the issue. This is mostly due to poor error handling than anything else, but what if the job happens to be your database backup job inside of SQL Agent? You could be without a recent backup of your database and never know it!

After some internal conversations I gave myself two options; either dig into, and through all, our jobs, ensure we have thorough error handling for every conceivable error, and make certain we have configured Operations Manager to adequately scrub the error logs and event logs to make certain we raise alerts when necessary. Or, preferably I could build a monitor in Operations Manager that looks into the msdb database to determine when the last time a full backup has been completed.

Which one sounds better to you? I was attracted to the second option because it let me build yet another custom monitor that I feel should be part of the standard Management Pack offered by Microsoft (or perhaps a third party vendor). And when I presented this idea to the folks at PASS it really seemed to hit home. I know many people have strung together lots of reports on their database dumps in order to verify that they have happened. My preference is to have real time monitoring rather than daily reports, and I think this custom monitor really made people sit up and take notice of Operations Manager, as it provided one of the ‘Hello World’ examples that people could build upon for themselves.

Create the Monitor

This monitor was created in much the same way as I’ve already described in SQL Agent Jobs.  There are two important differences in the new monitor

  • the target is the database,
  • We will accommodate both SQL 2000 and SQL 2005 Databases

612-image002.jpg

Figure 1

Why two different targets? Because of the second difference, that we will need two different VB scripts, one for each version we are monitoring currently (SQL2000 and SQL2005). And why do we need this? Because Microsoft reserves the right to make changes to their system tables between versions, that’s why.

In Figure 1 you can see that I have named the monitor ‘Verify Backups’, and I have placed it under the ‘Availability’ aggregate rollup monitor for each target. This means that, when we examine the database state view later on, we will be able to drill into the health explorer for a database to see the current health state for a particular database. I chose the Availability rollup simply because it made the most logical sense. If you do not have a current backup of your database available, then the health status for the availability of your database should show that. In other words ‘availability’ does not have to simply mean real-time availability; it can be extended to include the availability of your backups.

Script Overview

Using the same idea as in the previous monitor I created, I knew that I needed to create a PropertyBag, which meant that I needed to return a name-value pair. So, I went about trying to piece together some T-SQL that would return three columns: database name, number of says since the last full backup, and number of days since the last differential backup. I could change the code to include transaction log backups, if desired, but my goal for this example was simply to check that nightly differential backups, and a weekly full backup were being done.

The VB script for the SQL 2000 DB target is nearly identical to the SQL 2005 DB target VB script. The only difference is in the T-SQL being sent. The SQL 2000 version is as follows:

The idea was to return a list of current databases from the master database and join out to the msdb database. That way I would only focus on current databases, as opposed to reporting on databases that no longer exist but still have records inside of the msdb database. We also filter for sample databases as well as tempdb. The SQL 2005 version is as follows:

The difference here is that in SQL2005 we want to avoid looking at database dump information for database snapshots. Although a database snapshot cannot have a backup taken, it does appear in the list of current databases for the instance, so we needed to filter for that possibility in our query.

Also note that we focus on the start time for the backup, as opposed to the finish time. This is because the start time for the backup is more important than the finish time since that is when the LSN checkpoint is marked in the backup file. In other words, you restore to the point in time at which the backup started, not when it finished. So, we want to focus our monitor on the start time.

I have included the VB script as downloads with this article

For the monitor schedule I decided to run every three hours, but you can adjust according to your level of OCD. Some people will choose to run once a day, some might choose once an hour.

The last items of interest would be defining ‘health state’ for the monitor. I decided to define the state of this monitor as ‘Unhealthy’ should I find a database that has not been dumped in more than seven days as shown in Figure2.

612-image004.jpg

Figure2

The full parameter name being used in Figure2 is:

This is how we tie to the property bag elements defined in the monitor scripts provided with this article. Next, we define a degraded state to be one where we have a full backup within the past week, but the differential backup is more than one day old (Figure3).

612-image006.jpg

Figure3

The parameter name for the differential variable is as follows:

Finally, a healthy state is defined to be having a full backup in the past week and a differential backup in the past day (Figure4).

612-image008.jpg

Figure4

And that is all there is to it, a custom monitor very similar to the one I described previously, with just a few alterations. But building it is one thing, how is it to be used? That is a good question and one that, I think, helped me win a few people over at PASS this year.

Database State View

For me, the one item that always wins me over in Operations Manager is the ability to quickly see the health of your entire enterprise in one screen, the Database State view. What we have done above is to create a new monitor that rolls naturally into this view. So going forward, should we have a database that does not have a current backup, we can have a way to visualize the problem. To me this is better than relying on a series of email alerts or reports. When I was still studying mathematics we were always encouraged to visualize the problem, because visual images are more powerful than anything else. Perhaps that is why I enjoy working with Operations Manager as opposed to emails and reports.

Figure5 shows a portion of the database state view. So, with a quick glance we can focus our attention on the specific instance with an issue at this precise moment. That is better than reviewing reports that were run hours ago, or responding to email alerts hours after they were sent in the middle of the night. You can configure the frequency of the monitor to get as close to real-time status as you desire (remember in the above example we schedule our monitor to run every three hours).

612-image010.jpg

Figure5

By selecting one of the icons you are given a detail view of the current databases for that instance as shown in Figure6.

612-image012.jpg

Figure6

If you were to double-click on one of those rows you would bring up the Health Explorer for that particular database as shown in Figure7.

612-image014.jpg

Figure7

Figure7 has quite a bit of information for us to review. First, on the left, you can see the ‘Verify Backups’ monitor is in place as part of the ‘Availability aggregate rollup’ monitor for the target, in this case the SQL 2005 DB target. On the right, we have selected the State Change Events tab which then lists in detail the times that the monitor ran and updated the health of the status. So on 11/19/2008 at 1:16AM, the script ran and detected that the LiteSpeedLocal database had not had a differential backup in two days. Then, when the monitor ran again at 4:16AM, it detected that a differential had been done, and reset the health of the monitor accordingly.

Summary

If this monitor does not get you excited about customizing and using Operations Manager as a way to help monitor your database instances, then I am not certain that anything ever will. The first responsibility of any DBA is to be able to recover in the event of a disaster, no matter how big or small. To recover, you had best have a backup. To recover well, you had best have a recent backup. Being able to use Operations Manager for this, being able to visually inspect your instances quickly, is something that I find lacking in the native tools.

And what is the second responsibility of a DBA? Well, it depends, right? It depends on the nature of your shop, but whatever you believe that to be, I am certain we can find a way to get Operations Manager to assist you. Even in the above example, we could go one step further. Why not configure a recovery task to have the monitor do a backup of the database should the monitor detect a differential or full backup is missing? In other words, we could configure Operations Manager to assist us in maintaining the health of our enterprise, and not just sit back and report on the health of our enterprise.

The possibilities are endless.

How you log in to Simple Talk has changed

We now use Redgate ID (RGID). If you already have an RGID, we’ll try to match it to your account. If not, we’ll create one for you and connect it.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue

Simple Talk now uses Redgate ID

If you already have a Redgate ID (RGID), sign in using your existing RGID credentials. If not, you can create one on the next screen.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue