Checklists: A DBA’s Best Friend

Because so much of the work of the DBA involved doing things in the right order without missing steps out, checklists are a great way of ensuring that the chance of mistakes is minimised. DBAs generally create the sort of checklist style that suits their way of working, while making their work easier for them and as reliable as possible.

As DBAs, we pride ourselves on knowing our trade. Unfortunately that pride often leads us to believe that we don’t need to have written instructions for common tasks, and that our memories are infallible. Well, sadly, no matter how often you practice your work you will, sooner or later, forget something and make a mistake, unless you use checklists.

I can appreciate that the idea of using a checklist may seem, well, insulting in some way: It’s as if we’re admitting that we’re not good enough to fly by the seat of our pants, and instead need someone holding our hands as we step down the list. But what’s more important, preserving our egos or getting our jobs done correctly and quickly?

Professionals of all trades use checklists without fail. Airline pilots and crews famously have thick binders containing detailed steps for almost every situation possible. Astronauts have the same, only likely even thicker. And thanks to the Affordable Care Act, some hospitals are even making their doctors use one (I don’t know about you but the thought of someone cutting me open without using a checklist is doggone near terrifying!). In fact, a Harvard trained physician wrote a New York Times bestseller on the very subject of checklists. The very best DBAs use them, and always have done. So let’s leave our egos on the sidelines, shall we?

Checklists standardize our work. I previously wrote about the advantages of standardizing our work, some of which include:

  • Ensuring repeatable (and successful) outcomes in daily work.
  • Allowing delegation of work to less experienced resources.
  • Ease the transition to automation of common tasks, by clearly defining steps and logic.

Finally, you can, by using checklists, ease the level of concentration we need to apply to routine tasks of  IT professionals. Our work can be stressful enough, without constantly having to make micro-decisions about how to proceed in every eventuality while executing a task. Working at this pace  can lead to poor decision making, so if we can avoid it, we should.

The work of the DBA is ripe with opportunities for checklists. Here are just a few examples, along with suggestions or trigger-questions to help you create your own.

Installing SQL Server

There’s a couple of very good reasons why you should use a checklist when installing SQL Server. First, it’s (hopefully) not something that you do very often, so the steps aren’t likely to be fresh in your mind. Second, it’s a complex process, at least if done correctly (in other words, not “Next, next, next, install”). For example, are you going to remember to grant the SQL service account the “Perform Volume Maintenance” right every time (because you really should)? Finally, it’s in your best interest to make sure that all your servers are as standardized and similar as possible. Do you really want to be trying to figure out where that script is that you keep on servers for emergency restores when it’s 2AM and your users in London are calling?

When installing SQL we must consider things like:

  • What components of SQL do we install (i.e. just the database engine, or do we also install replication, client tools, etc)?
  • Do we install the components in the default location, or is there a specific layout we want to follow (beware the defaults, unless you enjoy running databases off your C: drive)?
  • Are there any pre-checks required before starting the installation process, such as:
    • Physical drive configuration and layout (e.g. standard mount points, naming of folders, sizing, etc)
    • OS version, edition, and patch level
    • Any operating system level settings, such as page file size and location
  • How are the accounts for running SQL Server and the SQL Server Agent acquired?
    • If we create them ourselves, is there a specific OU they must be created in?
    • Is there a naming convention we need to follow?
    • Do accounts need to be placed into any security groups?
    • Are there any operating system level rights that need to be granted?
  • Do we install named instances or the default? If named, is there a standard for them?
  • What patches or cumulative updates should be applied?
  • Any standard jobs or routines to install, such as a utility database?
  • Any third party tools to install, such as monitoring agents?
  • Any other post installation steps to complete, such as:
    • Configuring number and sizes of tempdb files
    • Applying standard configurations around things like Max Degree of Parallelism or other server settings

Restoring Databases

Restoring a database is easy, right? Right click on the icon in Management Studio, select Restore, next, next, next… Oops! I just tried to restore a database to the “C:” drive and crashed my server when it took up all the space!

While a checklist is essential for restoring databases (because again, it’s hopefully not something you have to do often), it’s also true that it’s something that should be routinely practiced and drilled. This allows you to complete the steps as quickly as possible, even while you are still following the checklist to ensure that nothing is missed.

Often times, restoring databases is done in a live-fire mode, where critical processes may be offline until the task is complete. But it’s also true that restores are sometimes part of routine processes such as refreshing test environments. Since the procedures may vary from instance to instance based on the situation, you may want to consider multiple, specifically-tailored checklists.

Here’s some good questions to think about when making this checklist:

  • Do we take a backup of the tail of the log before trying anything? How do we do that?
  • How do we find the latest backups? Because of course, they are all in the same logical structure since you used a checklist when setting things up, right?
  • Do we need to move the database files to a different location on restore (common when restoring between environments)?
  • What are the relevant Books OnLine entries for the commands you’ll need? Because you don’t want to try and remember them, right? Better yet, point to where there’s a script to do this.
  • Are there any post-restore actions that need to be taken, such as re-granting rights or enabling Service Broker (better yet, do that during the restore itself)?

Performance Troubleshooting

This is one of those tasks that I see done all too often in a  haphazard and unplanned manner. So often, in fact, that I wrote an article describing in detail how to apply the rigors of the scientific method to performance tuning in SQL Server. Too often, data isn’t collected and analyzed in a methodical manner, nor are there a list of common options to resolve known problems and how to execute them.

With that in mind, here are some examples of steps that you may want to include:

  • Data to collect before taking any action:
    • Is there any blocking occurring? If so, capture the block chain (including the commands being executed and the locked resources).
    • What sessions are consuming the most resources over a short window of time?
    • What wait types are accumulating on the instance the fastest?
  • Possible causes and remedies:
    • If the performance is traceable to one particular statement, capture an actual plan outside of the calling process (an excellent tool to do this is SQL Sentry Plan Explorer.
      • If the plan shows any signs of poor cardinality estimates:
        • Are statistics up to date? If not, update them (and automate it, pronto!).
        • Are statistics based on very small estimates? Increase the sampling rate and try the query again.
        • Are the compiled parameter values different from the runtime values, and is this causing problems? If so, force a recompile using sp_recompile (and document the issue as it may be necessary to make code changes)
      • Are there any indexes that may help? If yes, evaluate in a test system and plan for deployment if possible.
    • If certain wait types show excessive growth, it may indicate a system issue. For example:
      • PAGEIOLATCH_XXX – May indicate IO subsystem performance (but may also indicate a poorly performing query, check Perfmon counters for IO susbsystem performance).
      • CXPACKET – If accompanied by high signal percentage waits, may indicate CPU pressure, possibly caused by too high of a max degree of parallelism setting. Also check your cost threshold for parallelism setting. (Note, this is a notoriously hard wait to pin down, and it could be totally innocuous. There is a ton of terrible information out there around doing things like setting MAXDOP to 1 to “resolve” this. Don’t listen to it; instead, look at this excellent post from Brent Ozar & Co, it’s much better advice.)
      • HADR_SYNC_COMMIT / DBMIRROR_SEND – May indicate bottlenecks in remote sending / hardening of transactions in database mirroring or AlwaysOn.
      • LCK_XXX – indicates resource contention which is casuing blocking.
      • Look at what objects the blocked code is trying to access (Adam Machanic’s sp_whoisactive is excellent for this when used with the “@get_locks” parameter).
      • Are the queries at the head of the blocking chain performing poorly? If so, why?
    • If blocking is occurring, take these steps

This checklist could go on for quite a while (and probably should be the subject of another article), so I’ll stop there for now.

Setup of a Cluster

This goes hand in hand with the installation of SQL, in that it’s something you probably don’t have to do very frequently, but when you do it’d better be right.

Things to think about here:

  • Request two network connections setup (one for the public network, one for the private cluster communication.
  • Do you want to create the cluster objects in Active Directory yourself, or have your domain administrator pre-stage them? (Here is a good checklist to use for either approach.)
  • What OU should the cluster objects be created in?
  • Is there a naming convention used for cluster objects, including the failover cluster virtual name?
  • Has the cluster verification wizard been run to ensure that the cluster is in a supportable configuration?

Addition of Someone to the DBA Team

While this one isn’t technical in nature, I’d say it’s no less important than any of the others. If you think about it, there’s probably quite a number of steps required just to ensure that a new DBA obtains all the requisite rights and privileges to operate, let alone things like training or tools.

Here’s some of the likely things that need to happen to get someone up and running on your team.

  • Are there any Active Directory groups that the employee’s domain account should be added to?
  • Are there any pieces of software that need to be installed on their computer, such as monitoring clients, source control (you do keep your database scripts and tools in source control, right?), or other miscellaneous items?
  • Do any requests need to be sent to other teams? For example, at one of my previous jobs, we had to request that the infrastructure group grant new DBAs access to the VMWare vCenter instance.
  • What documentation should the new employee read or review?
  • Are there any practical exercises or training steps that need to be completed, especially before access is granted?

By making the on-boarding process deliberate and explicit, you’ll be able to get new resources up to speed quicker and in a more standardized way.

Conclusion

Checklists are an essential part of the daily work and life of a DBA. They ensure that we complete our work consistently and successfully, remove the burden of decision making, and ease the on-boarding of new resources. As a final question then, we might ask, how do we implement them?

The answer is that the implementation doesn’t really matter. You could use a Word document with check boxes, a paper document that gets printed out, a Project document, or even a custom web site that tracks their use (there are also plenty of online options available). I would just say that the best systems have these features:

  • They allow for you to stop during the process of completing the checklist, then pick up where you left off.
  • They allow for saving or storage of completed checklists for future review (audits anyone?).
  • They allow for dynamic logic, such that subsequent choices are changed based on current selections (i.e. if I choose one path, the resulting options change to match).

Ultimately, it’s far more important to start using checklists at all then to have some kind of airtight implementation. So what are you waiting for? Start making your checklists and see an immediate improvement in your work.

Some sample DBA Checklists