Break it early to ship it safely

Sunny

We want developers to break things – just not for the customers. If all our tests are green, I get nervous that we’re not testing deep enough.

Naga Santhosh Reddy Vootukuri, Principal Software Eng. Manager, Microsoft Azure SQL

Naga Santhosh, Sunny to most, leads a team that ships changes to Azure SQL databases worldwide. Those deployments must be fast, frequent, and invisible to customers. That kind of reliability doesn’t come from playing it safe during development. It comes from encouraging the right kind of failures, long before they reach customers.

“I like devs to break things in LKG (Last Known Good) or Stage environments,” Sunny laughs. “When tests are always green, I worry we’re not hitting the edge cases. Break it early, fix it early – that’s safer for the customer and less stressful for the team.”

One of Sunny’s biggest challenges is making sure changes behave consistently across thousands of subtly different Azure SQL environments. It’s those configuration differences that create the worst surprises. His engineers rotate on-call every few weeks, knowing an alert can come at any time, even at 2 AM. “It can be stressful,” Sunny admits, “but our culture is built on support, not blame.”

On-call engineers are ‘joined on the bridge’ by managers and leaders who are willing to help troubleshoot. The priority is to mitigate the impact on the customer. Then comes the retrospective: understanding what happened and how to prevent it next time. “That’s how you build your customers’ trust, and your engineers’ confidence and awareness.”

So how does a team shipping at Microsoft scale keep deployment stress under control? For Sunny, it’s about using new technology to build better safety nets. Containers now let them spin up realistic replicas of each type of production environment, catching more problems before release. Every feature is wrapped in a feature switch, so rollout can be controlled, feedback gathered, and any risky change disabled instantly, without needing a full rollback deployment.
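To make the feature switch idea concrete, here is a minimal sketch in Python. It is not how Sunny’s team or Azure SQL implements it; the `FeatureSwitch` class, the `plan_query` function, and the environment IDs are all hypothetical names invented for illustration. It assumes a switch has two controls: a master on/off flag, and a rollout percentage that decides which environments get the new code path.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureSwitch:
    """Hypothetical feature switch: a kill switch plus a percentage rollout."""
    name: str
    enabled: bool = False      # master kill switch
    rollout_percent: int = 0   # share of environments that get the feature (0-100)

    def is_on_for(self, environment_id: str) -> bool:
        # The kill switch wins over everything else.
        if not self.enabled:
            return False
        # Hash name + environment so each environment lands in a stable bucket 0-99.
        digest = hashlib.sha256(f"{self.name}:{environment_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < self.rollout_percent


def plan_query(environment_id: str, switch: FeatureSwitch) -> str:
    """Route to the new code path only where the switch is on (illustrative)."""
    if switch.is_on_for(environment_id):
        return "new-planner"
    return "old-planner"


if __name__ == "__main__":
    switch = FeatureSwitch("new-planner", enabled=True, rollout_percent=10)
    envs = [f"env-{i}" for i in range(1000)]
    on = sum(switch.is_on_for(e) for e in envs)
    print(f"{on} of {len(envs)} environments on the new path")

    # Something looks risky: flip the switch off, no redeployment needed.
    switch.enabled = False
    print(plan_query("env-1", switch))
```

Because the bucket comes from a hash rather than a random draw, a given environment gets the same answer on every call, so widening `rollout_percent` only adds environments and never shuffles the ones already on the new path. Setting `enabled` to `False` sends every environment back to the old path at once, which is the "disable instantly" behaviour described above.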

And when Microsoft, Amazon, or Google publish postmortems of their own failures, Sunny reads them all.

You can learn so much from how others fail, and especially from how they recover and build better resilience into their systems.

Away from work, Sunny’s curiosity never really switches off. When he’s not playing cricket or spending time with his family, he’s writing, speaking, or diving into the latest technology. “That’s how I unwind – by learning, sharing, staying curious. Right now, I’m exploring Agentic AI, and it reminds me that even after 17 years, I’m still just getting started!”
