Redesigning Red Gate’s SQL Server Performance Monitoring and Alerting Tool

It isn't easy re-designing a successful tool, but it is fascinating work. Andrew Clarke talks about the challenges with Tom Randle and Adam Walker, the user experience designer and usability test coordinator, behind Red Gate's SQL Server monitoring and alerting tool SQL Response v2.

SQL Monitor: Sensing the heartbeat of the server

Note: This interview was first published in July 2010, when SQL Monitor was in the EAP phase. The full product was released in November 2010 and is now available to download from the Red Gate website.

AC: What were the particular challenges you faced when designing SQL Monitor?

Tom: Designing an application like SQL Monitor is much harder than designing something with a discrete workflow, a job, and an end result, such as a backup tool; the requirements for SQL Monitor are necessarily more woolly and open-ended. It’s a monitoring tool where the user wants to be able to see metrics that will help to inform their decisions, and it also needs to provide the information needed to take action and solve certain problems.

The list of different database metrics and information that we show is very open-ended. In designing the tool, we’ve had to heavily research what we actually show, and find out which of the many possible dashboards, graphs and widgets people care about. It turns out that people want very different things from what you’d come to expect by just reading the opinions of the experts and seeing what other designs are out there.

In general, the potential scope for features is pretty much infinite, and it has been very hard to rein that in and figure out what is really important.

AC: Is that because there isn’t a shared understanding amongst the potential users of what the tool should be doing?

Tom: Yes, to an extent. Some people want a profiling or performance type of tool, and others have a requirement that is more on the alerting side of things. In between, there are those who want a real-time dashboard on their monitors which they can use to keep an eye on their servers.

We’re supporting the proactive approach, to help you make sure that your servers are up and running, running well, and to assist you in spotting potential problems before they happen.


Adam: That’s the heart of the problem; there’s no consensus on what exactly you need to see, and every DBA has unique requirements. Even if there’s an existing example of something, people have very different ideas of what’s useful or not. The other problem is the very loose terminology: the word ‘dashboard’ is bandied around a lot. People say they want a dashboard, but when you explore the details of what they mean, different people actually want very different things.

Tom: For v2 we also realized that we had to put a stronger emphasis on SQL Server monitoring functionality in addition to alerting to allow DBAs to be pre-emptive. People believe that they need to see the good as well as the bad, and they like the reassurance that comes with it.

AC: As a DBA, you actually want somehow to sense the heartbeat of your SQL Server installation to see how it’s dealing with the workload that’s being demanded of it, so it’s rather nice to have the raw metrics so you can make your own judgments about where the machine is being stressed. It’s hard to explain; it’s like the human body, in that potential points of pain are as important to know about as when you’re actually hurt. DBAs need to know their servers’ pressure points.

Checking the server from the phone.

Tom: We realized that the best way of meeting these new requirements was to go down the route of a web application.

AC: That must have been a culture shock.

Tom: It was to a certain extent, but I’m really pleased about it, because I think that the separation between aesthetics and functionality is a much better approach for the web, and it’s much easier for us to tweak and play with the design. With WinForms, you can take basic functionality for granted, whereas with the web you can’t; you’ve got to start from scratch. We’re keen for this to be quite cool; I think it’s quite a funky product in terms of cool graphs, and people certainly like graphs.

AC: I love graphs, as a DBA, because I feel I can sense what is going on in the server.

Tom: There’s still some work to be done for the UI before we launch v2, but I’m really keen for this to be a really slick application.

Adam: Definitely so. The reception we’ve had to the fact that it will work on an iPhone, iPad and other mobile devices has been fantastic.


Tom: Yeah. It lends itself to being a web application very well, because you’ve usually got multiple users wanting to access data from multiple computers, and you wouldn’t want to have to install the tool on a remote desktop every time you want to look at the data. Now, however, you should be able to look at your alerts or check the health of your server from home, or in a restaurant, much more easily.

Adam: I like what one of the users said: “Oh great, DBA on the go!” It seems like that’s really important.

AC: It is. A DBA has all the essential ingredients there in an iPhone or iPad. You can actually connect to your office PC via a VPN link, work at the command line, and actually do a huge amount of work in controlling a remote SQL Server. If you can be sure of getting an alert telling you that your application’s stopped because you’ve run out of space for your transaction log, then you can get in there straight away to sort the problem out remotely.

Adam: Yeah, from your bed.

AC: Yeah, but that might be a bit of a romance-killer. The only component you now need is the function that sets your iPhone off with an alert about that initial problem, and that’s what SQL Monitor v2 will give you.

Custom reports from Reporting Services.

Tom: A lot of our time is spent on the installer, because we’ve got quite a complicated architecture now.

AC: Will the installer also install the web app?

Tom: Yes. So, if you go down the quickest install path and install both the web server and the base monitor on the same computer and use an XSP web server rather than IIS, then we can just set that all up and install it straight away. Alternatively, you can opt to use IIS instead, and you can have a separate machine for the web server if you want. You can also have a separate machine for the SQL Server instance that actually stores your data. For v2 we are now using a SQL Server database instead of SQLite, which we used in version 1 of SQL Monitor.

AC: So presumably now you’ve converted to SQL Server, people can actually put their own reports into that?

Tom: Yep, we’re working with Reporting Services, so now anybody with SSMS can have their reports within SSMS straight from SQL Monitor.

A Community of users.

Tom: The other major thing we’ve done is set up the Future of Monitoring project website and the SQL Monitor Early Access Program to collect feedback. We wanted to try and get as many people as possible to suggest what they wanted, so we could tailor the tool to the needs of actual end-users.

AC: Has that worked?

Tom: We’ve definitely benefitted. We’ve grown a core set of users who are really keen to participate in any way they can and are vocal and helpful. We’ve had good replies to a lot of questions and specific issues, and we also had a lot of useful ideas from our Design a Dashboard Competition.

AC: I think a lot of dashboards go for the airplane analogy: lots of dials, artificial horizons and so on.

Adam: We did think we’d just get that traditional dashboard, but we were really surprised by some really quite creative and original ideas, such as using Galileo’s thermometer.

Tom: They also helped us decide the level at which the dashboard worked, answering the question of “Do I want to see everything or do I want to see one server at a time?”

Adam: If we speak to one DBA and he gives his opinion on something, then we have to evaluate that. If, on the other hand, he goes public on the website, then that opinion gets peer-reviewed by other DBAs. It was very useful to have that multidirectional conversation.
 
We’ve received fantastic support from the community, and I feel that now we’ve reached the point where it’s time to give something back; something they can actually install rather than another blog post. So we now have a SQL Monitor v2 page on the Red Gate website, where people can join the Early Access Program and get their hands on the latest EAP build immediately. We’re also now directing the community to the SQL Monitor EAP forum so people can leave feedback on the EAP builds.

Tom: Moving on from the Design a Dashboard Competition, we asked for opinions about the user-interface fairly early on in the development process when the UI was pretty basic, which turned out to be an advantage. I think it was Steve Krug who said that if you leave showing the designs to the time where you feel comfortable with the design, then you’ve left it too late. But you always do feel a bit exposed putting your babies out there before they’re fully developed, and I think it’s a fine balance.

Adam: I was actually quite worried when we introduced the first, early design, because it didn’t look like Tom’s design for the release version yet. It was really important that we were going for this ‘wow’ factor of a really cool looking tool, so we were really surprised at the reaction we got to those basic designs. People were saying “it’s slick”; they actually thought that it was a really cool tool, yet it still wasn’t near how it’s going to look when it’s finished.

Tom: The other, very hard problem we’re solving in terms of infographics is how you actually visualize the data. A lot of computer applications make mistakes; they’ve got different axes, units and data on the Y axis, and overlaying different information on the same graph like that has enormous potential for misleading people.

Most graphs are done when the designer is sure of the limits of the data beforehand. These examples are fairly static things, whereas we’ve got to do it completely dynamically in a tool like SQL Monitor. Some users can have two databases and others can have a hundred; making the tool cope with that, as well as different monitor sizes and all sorts of other variables, is really quite challenging.

AC: Drawing graphs within a web application isn’t easy in the first place.

Tom: True. We’ve also made the decision to design the interface so that, when you add a server, there’s a host machine, and then we infer what instances are on it so that you can then choose whether to monitor them or not. In our hierarchy, we break machines out separately, and we can show you the non-SQL processes as well.

The Installer

Tom: We’ve come up with a very clever licensing system for version 2.0 of SQL Monitor, where you can enter a five-server license, and then the individual licenses are auto-allocated to your list of machines. Recently we’ve put a lot of effort into the installer, because any situation where you have to install something on more than one machine is fraught with difficulty.

Now, if you install the web server and get that up and running, it then tells you to download the base monitor onto whichever computer you want to install it on, and then you run it from there. The design of the installer was a real saga. I think I had a mock-up of the installer that had about a hundred different screens on it, and now we’ve reduced it dramatically.

Adam: Yeah, the initial plans just looked overwhelming.


When Tom sketched out how many steps were required, and all the different credentials that were needed at each step, it looked like an impossible task. Now, with the smaller version of the installer, we’ve watched a couple of people install it within a matter of minutes.

AC: Well, the DBA’s time is so limited now because they’re managing more and more server installations. It’s in the hundreds now, so having something that is going to be slick to install is going to be a huge advantage for them, and they’ll really appreciate that.

Alerting

Tom: Alerts can be quite tricky because you need an alert for the initial event, for example that CPU usage has shot up, but then you also want to know if the usage has gone back down again. With a traditional alert, you’d just know that it was high at a certain point in time, but with SQL Monitor version 2, we’re introducing the concept of alerts that have a duration.


So, if your CPU is high it will tell you, and then, to tell you CPU usage has gone back down again, that alert will kind of flag itself as no longer being as important. So if you had two alerts, and one of these CPUs was still being thrashed while the other had calmed down again, you could quickly assess which was more important.

AC: Wouldn’t you end up being bombarded with lots of e-mails?

Tom: We’re giving you the option to turn off certain (or even all) email alerts, so you don’t get bombarded. We’ve also put quite a lot of effort into our global overview, where you can see all your machines and SQL Server instances, and you can see whether they’ve got any alerts on them as well as their CPU and memory.

AC: It’s very important to have a view of your servers as a totality that gives you a quick overview of the health of your whole SQL Server environment.
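Editor's illustration: the idea of an alert with a duration, as Tom describes it above, can be sketched in a few lines of Python. This is only a rough model under stated assumptions, not how SQL Monitor implements it; the names `Alert`, `CpuAlerter`, `observe` and `by_importance`, and the 90% threshold, are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Alert:
    """Hypothetical alert that stays open for as long as the condition holds."""
    machine: str
    raised_at: datetime
    ended_at: Optional[datetime] = None  # set when CPU drops back down

    @property
    def active(self) -> bool:
        return self.ended_at is None


class CpuAlerter:
    """Raises one alert per machine and closes it when usage recovers."""

    def __init__(self, threshold: float = 90.0) -> None:  # threshold is an assumption
        self.threshold = threshold
        self.open_alerts: Dict[str, Alert] = {}
        self.history: List[Alert] = []

    def observe(self, machine: str, cpu_percent: float, when: datetime) -> None:
        alert = self.open_alerts.get(machine)
        if cpu_percent >= self.threshold:
            if alert is None:  # first high reading: raise a new alert
                alert = Alert(machine, when)
                self.open_alerts[machine] = alert
                self.history.append(alert)
        elif alert is not None:  # usage has recovered: give the alert an end time
            alert.ended_at = when
            del self.open_alerts[machine]

    def by_importance(self) -> List[Alert]:
        # Alerts that are still active sort ahead of those that have ended.
        return sorted(self.history, key=lambda a: (not a.active, a.raised_at))


t0 = datetime(2010, 7, 1, 9, 0)
alerter = CpuAlerter()
alerter.observe("server-a", 97.0, t0)
alerter.observe("server-b", 95.0, t0 + timedelta(minutes=1))
alerter.observe("server-a", 40.0, t0 + timedelta(minutes=5))  # server-a calms down
ranked = alerter.by_importance()
```

Here `ranked` lists the alert for `server-b`, which is still being thrashed, ahead of the alert for `server-a`, which has ended. That is the quick assessment of relative importance described in the interview.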

Simple as well as powerful

Adam: We’ll often speak to somebody who, when we start explaining our software tool in order to ask their opinion, says “you’re forgetting that I’m a part-time DBA – I fell into this by default, so just make it simple for me.”

That’s been really useful in reminding us that what we’re trying to make is something that’s simple as well as powerful.

AC: Yes, like the original vision for SQL Monitor, which offered both simplicity for novices and accidental DBAs, and in-depth information and diagnostic data for more advanced and senior DBAs.

Tom: SQL Monitor v2 has the same focus, but it’s going up a notch as we’re aiming to be able to cope with a much larger number of servers.

Adam: We’ve held over 30 hours of user research sessions and user group discussions.

Tom: It’s incredibly hard to do a mock-up of a dashboard; in the early tests I just had to put some random numbers in the dashboard mock-up, and if those numbers didn’t add up, then all of a sudden the user is thinking “What the hell?” Instead of realizing that it was our mistake, the user sometimes thought that they hadn’t understood what we were showing them, and that really stung us. The fact that all the numbers have to add up means that actually showing paper prototypes or visual mock-ups is very hard. We ended up showing a few flat pages and asking users what they thought of those, rather than having a full-blown test. It became more of a discussion than what we’ve done in the past, which was quite a new thing. But we did them very early to get end-user feedback as early as possible during the development cycle, and we’ve now moved on to the more advanced stages of user testing.

We’ve been exploring the information architecture with users, as well as the actual information design, which goes beyond classical usability and simply answering questions such as “how do I make this workflow as streamlined as possible?” It’s been quite tough, because the amount of UI in SQL Monitor v2 is pretty significant. Just in terms of the sheer quantity of things you can configure and drill into, SQL Monitor v2 is pretty extensive.

AC: And the diversity of training and experience of the end users as well is pretty challenging.

Tom: Yeah, and just the fact that all their systems are so different. Again, if someone has only got one instance on a machine then what it’s showing them is probably different to what they’d see if they had multiple instances on a machine. There would be differences if you have virtual machines or real machines, and so on. Then, for example, there’s also the question of multiple CPUs. Do you want to see a summary for all of your CPUs, or do you want it per-core or per-processor? That kind of thing, it’s just…

AC: It’s tricky when you’ve got 200 cores, isn’t it?

Tom: …Yeah, exactly. It’s very, very tricky.

SQL Monitor is available now. Visit the Red Gate website to find out more and download a free trial.