What Counts For a DBA: Failure

My family and I enjoy sitting down together to watch the TV show called “Wipeout”. The contestants try to run wacky obstacle courses filled with various bizarre and fiendish obstacles in the fastest possible time. Failure is a guaranteed occurrence; contestants run into fake walls, get hit by “cartoon” hammers, and generally end up falling unceremoniously off a precarious multi-story perch into a pool of water or, even more humorously, disgusting mud. They then pick themselves up, wipe untold amounts of gack and goo off their faces, and go back for more of the same until they finally get to the end.

My family finds all of this hilarious, but I have to admit I sometimes wonder how stupid these people must be to keep going when they know they are going to fail over and over. Then, of course, because everything I do inevitably gets me thinking of technology, it all reminds me of the daily routine of the average production and development DBA. Then I understand why these people keep going. They become driven to finish in the belief that next time they can dodge the hammer and not get smashed… and the hope of a large sum of cash isn’t a bad motivator either.

DBAs are faced daily with failure in some way, shape or form. The first time we execute any non-trivial SQL query we almost inevitably get some error message like the dreaded “syntax error near…” message. Then we keep going, failing over and over until things work at least once. When everything seemingly works, we give our creations to users and barely a day, sometimes minutes, after our lovingly-crafted solutions get into their hands, they begin reporting various types of failures. The system is running “too slowly” or is failing to satisfy some mysterious “requirement” that we’re fairly sure the user made up on the spot just for their own entertainment. And, of course, no matter what the true source of the problem, from slow reports to hardware failure to network glitches (and occasionally something SQL related), it’s always initially considered the database’s, and therefore the DBA’s, fault.

Some of these “hammers to the head” failures hurt, but somehow giving up is never an option; DBAs keep going back for more. They keep going and going, fixing problem after problem until the torrent of 2AM alerts on their phone becomes only a trickle of concerns that can be put off until the morning. Then, at 8AM, it is back to start the process again, repeating the cycle the next time an error occurs somewhere in the enterprise.

If the fate of the DBA has many parallels with that of the Wipeout contestant, there is one crucial difference: a good DBA does not accept failure lightly, and giving up is not possible. If a good DBA did the Wipeout course for the first time, the hammers to the head would connect just like for everyone else. The next time, however, those hammers would have been disabled permanently, even if that meant a slower course time for the DBA, so that all following contestants could finish in record time.

“Proactive monitoring” helps to head problems off at the pass. Code is fortified with layer upon layer of error handling. DBAs fight vigorously for the time, manpower and resources to do adequate testing, often in the face of cries of “insanity” that meet our requests, especially requests to spend money on a test server with roughly the same power and capabilities as the production server.

Failure, especially public failure, is something that by nature we shun. However, the try-fail-learn cycle is one that all good DBAs (and programmers) must accept and embrace without fear. Fail once, and that’s life. Fail again in the same way, and that isn’t being a good DBA; a DBA who doesn’t learn from mistakes is probably in the wrong profession. And of course, we do like the promise of a nice cash prize every two weeks.