For several years, I was responsible for the smooth running of a large number of enterprise database servers. We ran a network monitoring tool that was primitive by today’s standards but which performed the useful function of polling every system, including all the Servers in my charge. It ran a configurable script for each service that you needed to monitor that was merely required to return one of a number of integer values. These integer values represented the pain level of the service, from 10 (“hurtin’ real bad”) to 1 (“Things is great”). Not only could you program the visual appearance of each server on the network diagram according to the value of the integer, but you could even opt to run a sound file. Very soon, we had a large TFT Screen, high on the wall of the server room, with every server represented by an icon, and a speaker next to it that would give out a series of grunts, groans, snores, shrieks and funeral marches, depending on the problem. One glance at the display, and you could dive in with iSQL/QA/SSMS and check what was going on with your favourite diagnostic tools. If you saw a server icon burst into flames on the screen or droop like a jelly, you dropped your mug of coffee to do it.
It was real fun, but I remember it more for the huge difference it made to have that real-time visibility into how your servers are performing. The management soon stopped making jokes about the real reason we wanted the TFT screen. (It rendered DVDs beautifully they said; particularly flesh-tints). If you are instantly alerted when things start to go wrong, then there was a good chance you could fix it before being alerted to the problem by the users of the system.
There is a world of difference between this sort of tool, one that gives whoever is ‘on watch’ in the server room the first warning of a potential problem on one of any number of servers, and the breed of tool that attempts to provide some sort of prosthetic DBA Brain. I like to get the early warning, to get the right information to help to diagnose a problem: No auto-fix, but just the information. I prefer to leave the task of ascertaining the exact cause of a problem to my own routines, custom code, intuition and forensic instincts. A simulated aircraft cockpit doesn’t do anything for me, especially before I know where I should be flying.
Time has moved on, and that TFT screen is now, with SQL Monitor, an iPad or any other mobile or static device that can support a browser. Rather than trying to reproduce the conceptual topology of the servers, it lists them in their groups so as to give a display that scales with the increasing number of databases you monitor. It gives the history of the major events and trends for the servers. It gives the icons and colours that you can spot out of the corner of your eye, but goes on to give you just enough information in drill-down to give you a much clearer idea of where to look with your DBA tools and routines. It doesn’t swamp you with information.
Whereas a few server and database-level problems are pretty easily fixed, others depend on judgement and experience to sort out. Although the idea of an application that automates the bulk of a DBA’s skills is attractive to many, I can’t see it happening soon. SQL Server’s complexity increases faster than the panaceas can be created. In the meantime, I believe that the best way of helping DBAs is to make the monitoring process as simple and effective as possible, and provide the right sort of detail and ‘evidence’ to allow them to decide on the fix. In the end, it is still down to the skill of the DBA.
 
         
	 
	 
	
Load comments