At Redgate, we regularly chat to IT leaders and experts about the challenges of managing and monitoring their server estates. This series of ‘DBAs at work’ insights is based on a set of webinars hosted by Grant Fritchey, Microsoft Data Platform MVP, that explored intensifying data demands and how effective estate monitoring enables DBAs to manage continually evolving environments. This episode features Deborah Thompson, Database Administrator at WestJet, Canada’s second-largest (and friendliest) airline.
How different is it working for an airline as a DBA?
A lot, because there’s a lot more to do. Prior to the COVID-19 pandemic, WestJet carried more than 22 million guests a year on over 700 flights per day, with a fleet of more than 180 aircraft. We are building back to those levels, and behind the scenes our databases are critical to safety, operations and commercial procedures. From safety reporting, kiosks and check-ins to air-to-ground communications, our business really relies on the performance of our databases.
What are you seeing in the way your estate is changing?
There’s been an initiative to deploy to the cloud when possible, but most of our databases are hosted on-premises. The problems we face are integration, complexity, and the performance hits we encounter when we’re merging data from multiple sources.
For instance, we have an electronic flight bag, which is a tablet used by our flight crews. It contains materials like operating manuals, navigational charts, load data and guest data. That data all comes from different sources and systems: some of it is pulled from off-site locations, some from on-premises systems, and some from Oracle. So ‘lifting and shifting’ the data to the cloud isn’t an option for us in a lot of areas, and the initiative to move to cloud platforms needs to be heavily architected.
What role does estate monitoring play in your organization?
It allows us to be proactive with reporting and really cuts down on our daily checks. Instead of spending half a day working through a checklist, we’ve cut it down to less than 30 minutes by using SQL Monitor. But most importantly, SQL Monitor is the canary in the coal mine. Our department needs to be alerted when replication is slow between nodes in different data centers, but this isn’t always caught by our enterprise monitoring systems. Quite often it’s a DBA on my team who calls our operations center to ask whether there are any network or VMware issues, because we’re seeing all these alerts. Until we ask, and other people investigate, the issue isn’t detected. That’s one thing we love about it.
Has your monitoring process improved collaboration between teams?
It’s allowed us to contribute a lot to troubleshooting major issues. There are always going to be operational incidents, whether it’s a network outage or an issue in the data center. With SQL Monitor, we’re able to provide the specific times when our monitoring system was unable to connect or when it slowed down. We can share these details with other teams, who can check their own logs too, and together we can find out exactly when and why the situation happened. Without SQL Monitor, I imagine we’d be going through up to 200 servers, pulling SQL logs or trying to collect them in different ways.
Have you found any surprise benefits with SQL Monitor?
We had a different monitoring system in the past, and it took a lot of effort to maintain. We had one DBA working full time on the monitoring system, and it took over an hour just to add a system to it. It was very painful. When we switched to SQL Monitor, I found that it was literally a couple of minutes from the time I typed in my cluster name to the time that all the nodes in that cluster, and their databases, were being monitored. I was shocked. It freed up so much time, and we no longer need a dedicated member of our team working on monitoring.
What are the biggest threats you’re seeing as your estate expands?
I started at WestJet in 2011, and we grew explosively for the first five years. It was tough, because at the time our team wasn’t growing at the same rate; it was actually getting smaller. We needed to do a lot more with fewer people. I think ultimately, as an organization’s estate starts to grow, unless it’s willing to invest in tools and people, some balls are going to get dropped. And I don’t think any DBA wants that attention to be on them. So of course, what keeps me up at night is how we can do more with less and keep our systems as stable as we want them, without dropping any of those balls.
Do you think these threats are going to change over the next 12 months?
It’s hard to say for the next 12 months, because we’re in the middle of the pandemic right now, and in the airline industry that makes things a little difficult. When the pandemic hit, our network shrank by more than 90 percent and the majority of the workforce was immediately stood down. We essentially scaled right down to bare-bones operations, parking more than 75 percent of our fleet, but at the same time we still needed the same systems in place for the aircraft that were still flying. And as we move forward, of course, we’re getting more planes in the air and having to ramp up our business again.
Over the next year, I don’t see that changing: our team will still have to leverage what we’ve got to keep our systems stable and to move forward as we grow back to where we were before the pandemic.
How do you deal with the fact that you’re doing more with less?
Automation. The more we can automate, the better. One of the biggest pain points for our team is vulnerability management, and we patch all our systems monthly. While we do have a technical operations team that handles patching for enterprise-wide systems, the DBA team still has to coordinate shutdowns with our application teams, to make sure they’re stopping services and doing what they need to on their side. This gets really time-consuming, and as a DBA, it’s not what I want to be spending my time on. These are the kinds of tasks that can be automated or handed to someone else.
One of the things I’m trying, in order to do more with less, is streamlining tasks and handing them off to other teams. I develop scripts to automate a lot more (the SQL Monitor API is a lifesaver for us when it comes to writing PowerShell scripts). That means our operations center can run those scripts to start and stop services themselves, and we’re not woken up in the middle of the night by alerts saying servers are down, because handling those alerts is all integrated into the script.
But ultimately, it’s about leveraging the people you do have, shifting more work onto the teams that perhaps have a few more cycles to spare, and using the tools we have as well as we can.
What is one takeaway you would like to leave everyone that they can take home and put to work?
Honestly, I think the effort spent getting good monitoring in place will pay off in reduced DBA effort in day-to-day operations. Leveraging the tools gives DBAs an opportunity to use their skills where they’re needed most. It’ll also help DBAs do much more with less, and make for a happy business, happy customers and, most importantly, happy DBAs.
Look out for other episodes in the DBAs at work series, featuring Grant Fritchey talking to:
- Dennis Heitmann, Database Administrator at Atruvia
- Kevin Davis, Manager of Database Administration at Tower Loan
If you’re new to SQL Monitor and would like to see how it can help you monitor database performance issues, you can download a fully functional 14-day free trial, or try our live online demo environment.