Book Review: Troubleshooting SQL Server: A Guide for the Accidental DBA by Jonathan Kehayias and Ted Kreuger

Experienced DBAs have a wealth of experience to rely on when it's time to troubleshoot a problem, but it can be harder if you're just starting out. Brad McGehee reviews a book designed to get new and accidental DBAs up to speed on their troubleshooting skills.

1582-troubleshooting.gif

Download the free eBook (PDF) here
Buy the Kindle edition: Coming soon
Buy the Printed Book: UK | US

The phone rings and an end-user complains, “Heck! Something’s wrong with the database. My report, which normally only takes a few seconds to run, is now taking over 10 minutes to run.” If you are an experienced DBA, then several possible explanations as to why this problem occurred will probably pop into your mind, and then you will try one of them at a time until you identify and resolve the problem.

But what if you are less experienced; perhaps even a part-time or accidental DBA? How do you go about troubleshooting SQL Server problems then? Most experienced DBAs will tell you that it will take a lot of time, perhaps years, working as a DBA before your troubleshooting skills are finely honed. Well, sure; up to a point. While I think that learning SQL Server troubleshooting skills does take time, it doesn’t have to take as much time as you think, especially if you read Troubleshooting SQL Server: A Guide for the Accidental DBA. After reading this excellent book on SQL Server troubleshooting, you will be well on your way to becoming a better SQL Server troubleshooter, even if you’re still slightly short on experience.

What makes this book so good is that it distills the knowledge and experience of Jonathan Kehayias and Ted Krueger in a format that any DBA, no matter their skill level, can easily understand and directly apply to their day-to-day work. Don’t let the title fool you, this book is not only for accidental DBAs. I think every DBA who reads this book will learn something new. I know I did.

Jonathan Kehayias is a SQL Server MVP, a Microsoft Certified Master (MCM), and is a principal consultant and trainer for SQLskills.com. Ted Krueger is also a SQL Server MVP and is a Senior SQL Server Consultant. Additional content and technical editing was provided by Gail Shaw, another SQL Server MVP, and a Senior SQL Server Consultant. As you can see, the book was written by experienced professionals with lots of real world experience.

Troubleshooting SQL Server: A Guide for the Accidental DBA is different to most technical books, for two key reasons. Firstly, the book is easy to read and understand. While it is technical in nature, everything is carefully explained, making it easy reading for almost every DBA to follow along. Secondly, it provides specific advice on how to identify and correct common SQL Server problems. Think of it as a recipe book for problem-solving.

When many less experienced DBAs begin to troubleshoot SQL Server problems, they don’t know where to start. This, in itself, is half the battle when troubleshooting. The very first chapter of this book focuses just on this topic: where to begin. It describes a troubleshooting methodology any DBA can follow to help identify performance-related SQL Server problems. According to the authors:

“Nailing down a specific methodology for troubleshooting problems with SQL Server is hard because, of course, the exact route taken to solve the problem will depend on the specific nature of the problem and the environment. One of the keys to accurate troubleshooting is not only collecting and examining all of the relevant pieces of information, but also working out what they are telling you, collectively.”

In the last sentence above, focus on the word “collectively”. This is because the authors’ approach to troubleshooting starts with a broad panoramic view on the problem. In other words, many SQL Server problems can’t be identified and resolved with a single piece of information. For example, a single Performance Monitor counter is usually not enough information to identify and resolve a problem. The authors suggest that you need a wide variety of different data from different sources in order to identify most SQL Server problems.

While the book includes the detail of troubleshooting many different types of SQL Server problems, chapter one focuses on identifying performance-related issues. According to the authors, the troubleshooting methodology that they recommend when troubleshooting performance issues includes gathering information from the following four areas, and in the following order. This way, you will gather all the information you need (from a holistic point of view) so that you can best determine what performance issues your SQL Server instance is experiencing.

The troubleshooting methodology includes:

  1. Looking for problematic wait stats using the sys.dm_os_wait_stats Dynamic Management View (DMV). The book includes scripts you can use to extract the necessary information from the above DMV, along with advice on what they mean.
  2. Looking at the sys.dm_io_virtual_file_stats DMF to help identify potential disk I/O related issues. As before, a script is included that extracts the important data from the DMF. This information, along with advice from the authors, allows you to not only analyze the information on its own, but also to correlate it with the wait stats information gathered in the first step. Often, it is the correlation between many different data points that allows you to correctly diagnose performance-related problem.
  3. The next step, which is often the first step taken by novice DBAs, is to look at a series of Performance Monitor counters. Using the methodology suggested in this book, Performance Monitor counters are an important diagnostic tool, but they are not the most important ones. Collecting counters can be done by using either Performance Monitor or using the sys.dm_os_performance_counters DMV. The code that is included in the book extracts what the authors consider to be the most important counters from the DMV. The book discusses what the results mean, and how they might correlate with data collected in the previous two steps.
  4. The last step may be new to some DBAs, and that is to analyze plan cache usage. This step can be used to identify problematic queries that are currently in the plan cache. Code is included to extract important data from a variety of different DMVs, including sys.dm_exec_query_stats.

Don’t worry if some of this doesn’t sound familiar, because each of the topics is covered in-depth, including specific recommendations, in various chapters throughout the book. By following the above troubleshooting approach, you will begin to narrow down most SQL Server performance-related issues quickly. But this is only chapter one of the book.

The second chapter is on disk I/O configuration; and here, the authors help you to select the right storage hardware, teach you how to properly configure I/O related options, and troubleshoot common disk I/O problems. The chapter covers local disks, DAS, NAS, SAN and solid state technology, explaining the pros and cons of each; suggesting the best way to configure storage; and explaining how a SQL Server’s workload affects your storage hardware selection and configuration. Once you have purchased your hardware, but before you put it into production, the authors explain how to test it for IOPS and throughput (MB/sec) so you can ensure that it will keep up with your workload. Once your storage system is up and running, you will learn about how you can use disk latency (Physical Disk\Disk Reads/sec and Physical Disk\Disk Writes/sec Performance Monitor counters) to help identify problematic disk I/O performance: And once all of this is covered, the authors cover many ways to troubleshoot common disk I/O problems.

Chapter three covers CPU utilization. You wouldn’t think this would be a large chapter, but it is 50 pages long, covering every possible way you can diagnose problems that results in high CPU usage. Some of the techniques include using Performance Monitor counters, SQL Trace, and DMVs (including wait states). A key feature of this chapter is on how to identify CPU-intensive queries and how to potentially modify them for better performance. You get specific advice on how to write high performing queries that is not only useful for mending slow queries, but is also great advice when writing new queries. You might want to share this advice with your developers so they write better queries in the first place. You will also learn how specific server settings can affect query performance. The authors make specific recommendations on when to use, or not to use, Forced Parameterization, Optimize for Ad Hoc Queries, Max Degree of Parallelism, Cost Threshold for Parallelism, and more.

Memory issues often haunt DBAs, and Chapter four covers all aspects of memory management. This chapter starts out with a discussion of SQL Server memory internals, both for 32-bit and 64-bit systems. As you might guess, the authors suggest only using the 64-bit version of SQL Server, as it is much easier to manage memory than 32-bit systems. Of course, many DBAs still have 32-bit instances, so the authors spend a great deal of time on how to configure them for optimal memory performance, along with how to identify and resolve related problems. When discussing 64-bit memory, they cover a variety of settings that can affect how memory is used, including Lock Pages in Memory, Min Server Memory, Max Server Memory, among many others. After the internals discussion, there is a discussion on how to identify memory pressure problems using Performance Counters and DMVs. The chapter ends with a discussion of common memory-related problems and how to deal with them.

Chapter five takes a slightly different approach and talks about missing indexes and the problems this can cause for query performance. The chapter focuses on how to identify, and select, the best indexes for best overall SQL Server performance. This includes using tools such as the Database Engine Tuning Advisor (DTA) and a variety of DMVs. The best part of this chapter is the advice provided on index selection and design. This advice is not only good when tuning indexes, but also very useful when adding indexes to a newly created database. Besides learning how to add missing indexes, you will also learn how to identify both unused and duplicate indexes, both of which create unnecessary overhead, hurting overall performance.

Blocking, a common problem experienced by DBAs, is the topic of Chapter six. The chapter begins with a primer on locks and concurrency, and then jumps into why blocking occurs, describes tools you can use to identify blocking, and includes suggestions on how to overcome blocking problems by maximizing concurrency.

Chapter seven focuses on deadlocks, and starts out with a brief explanation of what deadlocks are, then jumps into how to capture information on deadlocks, along with how to interpret the data you have collected using the various tools discussed in the chapter. The most interesting part of this chapter is a discussion on the most common types of deadlocks, which include: bookmark lookup deadlocks, range scan deadlocks, cascading constraint deadlocks, and intra-query parallelism deadlocks. And to round out the chapter, there are plenty of suggestions on what you can do to prevent deadlocks in the first place.

Chapter eight focuses on how to deal with large or full transaction logs. While this is one of the easiest things to prevent, it is one of the most common problems faced by novice DBAs. The chapter briefly explains how the transaction log works, and describes the three main ways that transactions logs can grow. These include performing index maintenance, running a database in the Full Recovery Mode without taking transaction log backups, and long-running or uncommitted transactions. Each problem is thoroughly discussed, along with how to resolve each.

Chapter nine is the last chapter of the book and focuses on how to deal with the problem of accidently deleting data or database objects, and how they might best be recovered. How you recover lost data or objects depends on what was lost, and how. The main focus is on how to use RESTORE to restore your database to a point in time just before the data or objects were accidently deleted. This is a section every DBA needs to have memorized, or at least documented, so they will be prepared when data or database objects “accidently” disappear. The latter part of the chapter focuses on how to prevent accidental data loss in the first place, by discussing such important topics as planning for data loss, which focuses on backing up your databases, along with backing up your database objects using source control. There are also discussions on using security and triggers to help prevent accidental data loss.

After reading this 358 page book on practical ways to identify and resolve common SQL Server problems, I think you will be surprised how much information you will have learned, whether you are a novice or experienced DBA. In fact, I would guess that most of the readers will want to keep the book handy for quick reference, whether it is in the PDF format on your computer, or in hard copy stored near your computer. While I have read many books on SQL Server, this is one of the most practical ones I have every read. If you read this book, I think you will agree.