When your ‘Best Practices’ are only your ‘Best guess’ with regards to SQL Server configuration
SQL Server configuration plays a vital role in the health of your SQL Estate. Your needs may vary depending on the purpose of the server, or its role within a business, but a healthy configuration ensures availability and performance for your customers.
We’ve been thinking hard about SQL server configuration in Foundry, Redgate’s R&D division. We know misconfiguration leads to estate performance issues that can hit your colleagues and customers hard.
Our definition of “configuration”
The most common settings — TempDb, Collation, File growth settings, Min/Max Memory, MaxDOP, Cost Threshold of Parallelism — are the settings that need to have a keen eye kept on them on if you don’t want issues to strike out of the blue.
Configuration errors can strike when you least expect it.
But keeping your estate healthy through the lens of configuration is a broader topic than simply tweaking dials. Let’s look at each of the scenarios in which configuration is a prominent concern.
-
- Establishing: The definition and documentation of configuration; what’s ‘best practice’ for us.
- Creating: The act of standing up a new server and making it fit for purpose
- Checking: Knowing that the server is running on the correct configuration.
- Modifying: Rolling out improvements and patches across your estate.
- Tracing & Troubleshooting: The diagnosis and resolution of configuration issues
Lets look at these in more detail in turn.
1 ) Establishing
The definition and documentation of configuration; what’s ‘best practice’ for us.
Simple, standardized configuration ensures consistency, and ease of deployment. Developing and documenting ‘ideal’ or ‘Best Practice’ achieves this.
Some go further, developing ‘run books’, ensuring best practice is followed by providing step by step instructions on standing up a new server.
Run books are used to provide a step by step approach to attack the configuration in the correct fashion
Even so, problems arise:
Documents become outdated quickly. The moment anything is changed on the estate, the run book also needs updating. Keeping on top on this is vital, but time consuming.
Here we’re exploring options around ‘living’ documentation, that’s easily updated, enabling you to define and document SQL Server configuration from a central, connected location.
2) Creating
The act of standing up a new server and making it fit for purpose.
Even with best practice and a baseline to work from, bespoke needs — from customers and applications — require bespoke configuration. It’s down to you to decide how to accommodate this, whilst maintaining control.
Your customers’ needs won’t always stick to the straight best practice configuration you’ve decided on.
Deviating from the norm produces opportunities for errors to creep in. Poor service to the customer, reputational damage, or even the cancellation of a contract altogether are real risks.
One way to mitigate this risk is by only making a few ‘flavors’ of configuration available to the customer; This balances flexibility with security. Achieving this takes time, as different flavors need refining and adjusting over time in order to provide that balance of performance and control.
Configuration is like a box of chocolates, you never quite know what you’re going to get.
Not everyone has the luxury of time, so in this area we’re exploring ways to empower you to apply a configuration template, fine tune, and deploy to one or many servers as required.
3) Checking
Knowing that the server is running on the correct configuration.
Often, when configuration is set at provision time, it’s left alone for long periods of time. If nothing’s broken then there’s no problem, right?
Except, is it? What if settings are incorrect, the result of a configuration choice that doesn’t immediately create a problem, but lays dormant.
You’re not always the person responsible for standing up the server in the first place. Many people are responsible for estates built upon over years by contractors and developers until it resembles a labyrinth that it’s down to the DBA to understand and organize.
Here we’re thinking about ways to simplify the identification, diagnosis and resolution, time which you could be spending on more valuable things to ensure the smooth running of your business.
4) Modifying
Rolling out improvements and patches across your estate.
At some point configurations will need to be reviewed. They may require modifying. Patching and preparing for growth or high intensity workloads are common examples of this.
In most cases, this involved working through each server step-by-step, ensuring the changes are rolled out correctly.
Over time, people develop, collect and curate scripts and tools to handle this. Crucially, it takes time and experience to achieve this.
If you’re not the curator of these tools and scripts, then knowing how to ‘drive’ them requires excellent documentation (see ‘run books’ above) and a robust testing process.
In the modification scenario, we’re looking at ways to provide a better connection between ‘how things should be’ (documentation) and how they are in the target environment. Can we learn from the Infrastructure as Code movement?
5) Troubleshooting
The diagnosis and resolution of configuration issues
Everyone strives to be on top of their estate, able to react quickly to issues and ensure applications meet their SLAs.
Managing your estate can be like the Wild West, how do you ensure you keep law and order?
A substantial amount of time is dedicated to ensuring alerts — notifications of issues — are configured correctly, raising the alarm before issues become crises.
But what about dormant misconfigurations? They won’t spark an alert because things are running ‘as expected’. Until they aren’t. The longer these issues remain dormant, the greater the distance between symptom and root cause.
We’ve heard horror stories about restaurants unable to process any payments at peak times due to issues caused by a config setting. It’s also often the customer that notices the performance drop before the DBA due to the lack of reporting or alerts set up to proactively monitor these settings.
Last year Garry Bargsely wrote “A day in the life of a production DBA” and noted how an urgent alert was traced back to a TempDB setting:
This incident was resolved quickly, but a full audit of the whole estate to review configuration was still required as a result. For some, the worst case scenario involves engaging with (expensive) specialist consultants.
In this scenario, we’re exploring ideas around developing faster detection method for dormant symptoms, followed by faster diagnosis.
Let’s talk
We’re in the early stages of researching where the main challenges lie within these 5 stages of a configuration process. To ensure that we continue to work on the right problems and develop the right tools to support our customers both now and in the future.
If you have any thoughts, ideas or experiences you feel would be beneficial to aid us in this journey, please sign up and we’ll arrange a time to talk.