Not long ago, a crack team of SQL Server experts was flamed for a crime against database normalization they didn't commit. These men promptly escaped from the taunts of academic relational theorists. Today, they survive as website editors and software evangelists. If you have a problem, if no one else can help, and if you can find them, maybe you can hire the DBA Team.
Scene 1: Dave the Agile Data Guy
Dave clutched his temples in despair. Before he'd even sat down, he'd started getting calls from users trying to access financial trade reports. Connections dropping, scary looking error messages about I/O inconsistencies.
The dreaded words crystallized in his mind like ice tentacles: "data corruption". He shook his head as if he could break them loose. Maybe it wasn't so bad. Maybe he was being paranoid. Or maybe he was way out of his depth.
Central IT told him to isolate the database, start his recovery procedures, and that someone would be along to advise further in due course. Advise him, or escort him to the door? He immediately put the production database in
RESTRICTED_USER mode, took a tail log backup, and started
DBCC CHECKDB. Darn it, did it always take this long?
He'd read the most important thing in these situations was not to panic. He breathed deeply and tried to think about beaches. He needed someone who knew what they were doing.
Before he left, he pulled out the latest monthly full backup, thankfully taken just last week, plus the latest nightly differential and the string of log backups, including the tail log backup. He kicked off the restore of the full backup. This was going to take a while. He grabbed a coffee and headed out.
The real DBAs would know what to do. They had to.
Scene 2: Into the DBA Den
Dave approached the congregation of DBAs around the water fountain, at the heart of the silent, glass-walled office.
They listened attentively to his tale of woe.
"It smells like corruption all right. These things happen, Dave." Joe gave a sympathetic smile. "Especially if
DBCC is running long, which usually means it's found a problem and gone into 'deep dive' checks."
"But don't panic yet," reassured a second DBA. "You're running your test restore and I'm sure that'll be fine. You do regular test restores, right?"
Dave hoped the DBAs didn't notice the color drain from his face. Test restores? Wasn't
BACKUP…WITH CHECKSUM enough?
"And even if your backups are bad, you've still got your replicated reporting servers," offered a third. "As long as the logs aren't corrupted, replication should still be fine and data corruption won't transfer over."
Dave's eye started to twitch. Implementing replication was still in the team's backlog.
DBCC CHECKDB report last time it ran?" asked Joe.
DBCC CHECKDB report last time it ran?"
DBCC CHECKDB on your backups automatically with SQL Backup Pro
"Nothing," asserted Dave. "It's a scheduled fortnightly job and I get an email if things go bad. There were no alerts last week."
"Any glitches in the hardware?"
"The sys admins ran full I/O subsystem and memory diagnostics last week when some log backups started failing for no reason. Everything came back clean."
"Then there you go! I'm sure you'll be back up and running within hours," said Joe, winking.
Dave almost allowed himself hope. "If one of you guys could come and take a quick look with me, I sure would appreciate…"
"Would you look at the time, I should be in a meeting. Best of luck!" Joe waved cheerily.
Dave watched hopelessly as the DBAs scattered, muttering various apologies. Did he catch a suppressed chuckle?
Scene 3: A visual assault
Dave stared at the screen in his office, unable to take in the heart-stopping stream of red text assaulting his eyes. The full backup restore had failed! Had corruption seeped into the backups? Why hadn't his
WITH CHECKSUM picked it up?
His career was on the line. He was alone. Without thinking about it, he copy-pasted the error message into Google. At the top of the results was the same infuriating advert that had been pursuing him around the technical sites. The DBA Team, replete with explosions, fire, and garish titles. Wait. Maybe these people were real? He picked up the phone to Phil Factor, the man who'd helped him get a job here. The man who just might know how to find them.
Scene 4: Phil smells a rat, or possibly ferret
"…Allegations of trading irregularities rocked the world of futures dealing today. We go live to Monte Bank's head office, and their CIO, Gordon Knot."
"Let me guess, Dave. They're not VERIFIED!"
Verify your backups automatically with SQL Backup Pro
Phil Factor and the ferret curled peacefully in his lap watched the anchorman wobble his jowls dramatically.
"Gordon, talk us through these allegations. Why haven't you been able to produce the reports that would prove or disprove the claims?"
"Firstly, Ken, these rumors are scurrilous and completely unfounded. The delays are caused by one of those technical problems that can afflict any high-speed trading institution. Unfortunately, it appears to have led to data corruption in the reporting system."
Phil Factor sat bolt upright in his high-backed leather chair, launching the ferret across the room.
"I'd like to assure the public that an expert team of IT professionals are doing all they can to remedy the problem, and everyone responsible for these IT failures will be held fully accountable."
"How very convenient!" Phil banged his fist on the desk. "All the blame pinned on IT, again! How did Dave end up looking after a system like that anyway? I recommended him as a developer, but he's no DBA. And I very much doubt he's a team of experts."
Phil sighed heavily and sunk back down in his chair. The irked ferret left through the cat-flap in search of peace.
"No, my dear," Phil continued addressing his absent ferret. "I suspect they aren't keen to get that data back. Something tells me this corruption goes deeper than the database."
He picked up the special phone. It was time to get the DBA Team together.
Scene 5: A team game
Dave fidgeted in his desk chair. Still
DBCC CHECKDB marched, like Old Man River, relentlessly on. He glanced incessantly over his shoulder for his executioner from Central IT, but there'd been no one except a man in an outlandish Day-Glo shirt refilling the water coolers.
A prickling on his neck made Dave turn to find himself staring up at a muscular man crammed into a funereal suit, topped with a trilby.
"Alan Capon, Central IT." He smiled grimly, and didn't extend a hand.
"Ah yes, now, err… I've been…"
Alan cut him off. "Do you know what I love, Dave? Well-maintained databases. And baseball!" With a flourish he revealed a large bat. "Out on the field, you're part of a team. But if one man drops the ball… If he forgets, say, to perform his backups…. He lets the whole team down, and the whole team fails."
He swung the bat down heavily onto Dave's desk, leaving a deep impression in the flimsy chipboard.
Dave jumped out of his skin. "Now look, I do backup all my databases…"
"Do you, Dave? I hear we're in a real mess with this corruption problem, and why would that be if you have all your backups..?"
"Well, I've got the backups, but they're not…"
"No, let me guess, Dave. They're not VERIFIED!" He smashed the bat down again.
"I do verify them!" Dave yelped. "But for some reason it didn't report anything…"
"He's what happens to people who don't Back Up Databases"
Keep your data safe with the world's most trusted tool for SQL Server backup -
SQL Backup Pro
Alan was struck with a new idea. "I see, Dave. None of this is your fault. Dark forces are working against you. The people we really need are the hardware guys, am I right?"
"I spoke to them - they swear there's no problem with the IO subsystem."
"So this goes even deeper… Maybe we need the ones who designed this whole shoddy database system!"
Dave regarded him fearfully. "I don't know. I'm just one data guy trying my best. I don't know about dark forces, but… some things here just don't add up."
Alan regarded Dave for a long second. "Well, data guy, I think you're on the level." He ripped off his hat, wig, and prosthetic mask, and offered his hand with a smile. "Grant Fritchey, DBA Team. We're here to help."
Dave stared dumbly.
"Sorry about that," continued Grant. "We walked into a similar situation a few years back, and ended up being framed for a crime against database normalization we didn't commit. Had to make sure you weren't here to lure us in to a trap."
"Nice bat," Dave managed.
"I call him BUD. That's cause he's what happens to people who don't Back Up Databases…" Grant's eyes bulged dangerously again, before relaxing. Dave smiled weakly.
A faint neon glow suffused the cubicle. They turned to see the water cooler guy – none other than Steve Jones – treating them to the full Technicolor glory of his shirt.
"Low key, Phil said. Inconspicuous."
"The shirt, Steve. How many times?"
"People like my shirts. Dave, you like my shirt, right?"
Grant sighed. "Let's get to work."
Scene 6: Will the real Alan Capon please stand up
Gordon Knot gazed contentedly out of his top-floor office window at the magnificent view across the bay.
"It's perfect. We hire Dave, and tell the press it was on Phil Factor's recommendation. When Dave gets into trouble he calls Phil, who calls the A-Team, who just can't resist leaping in to save the day. Everything is unfolding according to plan."
He spun and fixed his gaze on the man sat the other side of his palatial mahogany desk. The real Alan Capon was dressed in a sharp charcoal suit, impassive, hands folded in his lap.
"I hear the DBA Team are already down there interfering, just as you predicted" he replied.
"Now it's just a mopping up exercise. The information in that data could be inconveniently… misconstrued. Make sure that nothing can be traced back to the bank, and that this data never again sees the light of day."
"It won't. And if they get in my way, neither will the DBA Team."
"Excellent. With the evidence safely beyond reach, we can begin our media campaign, framing Dave and the DBA Team as devious insider traders," Knot sighed contentedly. "Incidentally, I hear one of them does a rather good impression of you," he smirked.
"A pale imitation, I assure you," replied Alan without a trace of a smile. He stood to reveal the sturdy, dented bat nestling in his lap. "I'll pay a little visit to our tame sys admin, Dan – and then see about our friends."
Scene 7: Deep in Errors
Grant's gaze swept the room, taking in Dave and those few developers who hadn't made feeble excuses.
"Let's see if I've got all this straight," Grant boomed. "On this screen we have
DBCC running way long against the production database." He started counting out on his fingers: "And over here, Steve's attempting a test restore to standby from a 5 week old full backup, followed by the chain of log backups, right?"
Steve nodded. "Any cowboy used to wrestling panicking yearlings to the ground single-handedly ain't afraid of a few rearing log backups."
"Sure, Steve." Grant turned back to the room and continued counting off. "The last full backup is corrupt, but the
WITH CHECKSUM on the backups also reported nothing"
"Check," said Dave sullenly.
"We're getting serious application errors, but last week's regular
DBCC on the production database reported nothing."
"You won't have auditing in place till the next sprint, replication is still a backlog item, and AlwaysOn Clustering is a mystery to all of you. Am I right?"
Some of the developers looked like they might cry.
"OK, let's check this
DBCC thing out first." Grant opened an SSMS query window, turned on
traceflag 3604 and ran
DBCC DBINFO (N'DBNAME').
dbi_dbccLastKnownGood value says three weeks ago, not one." He fired off
sp_help_jobhistory and sighed. "The last
DBCC job didn't run, which might explain why you didn't get any email alerts."
"WHAT?! How? Did someone disable it?" cried Dave.
"A job that runs long or doesn't run at all can sting just as bad as one that fails. And none of you look like you can deal with the poison," sighed Steve. "You need a proper alerting system that checks for this sort of thing."
Dave sighed, crestfallen.
"OK, we know the last time the database was definitely good – three weeks ago," Steve continued. "So let's open the error logs, work back to that point, and see what we find."
They gathered around Grant's monitor to investigate the output for the most recent log. The problems weren't hard to spot.
"You need a proper alerting system that checks for this sort of thing."
Monitor your servers in real time with SQL Monitor
"824 errors – these are what we call soft I/O errors," Grant explained. "The OS serves a page back from disk and SQL Server detects a problem with it. It shows us the Database ID and page ID related to the I/O error. We've got 2570 errors here too – corrupt column values, and… HOLY COW!" exploded Grant. Everyone flinched.
"Look here! The
PAGE_VERIFY option changed to
NONE. That's why the latest database backups succeeded without a murmur, despite the corruption!"
Dave was aghast. "No! That's not right. That can't be right. I never changed it. It should be set to CHECKSUM!"
Steve let out a loud whistle and they all spun round.
"Well, if that don't sting like a kick in the butt from an angry mule." He sat back to reveal the error message unfurling on his screen. "Looks like we have ourselves a broken log chain, pardners."
"What's going on here?" wailed Dave. "This isn't possible…"
"It's as if someone switched the database to
SIMPLE recovery model and then back to
FULL, right after the last full backup. Look here, right in the error log."
"No way. This database always runs in
FULL recovery model!" Dave buried his face. "Although that would explain why the log backups failed last week."
"Something smells rotten, alright. Almost like someone didn't want you to find out about this corruption till it was too late to save the data. If only the logs could tell us who did this…"
"But how is all this related? How did the actual corruption happen?" mused Grant
"99% of the time, I'd say it was an I/O subsystem issue. Usually the magnetic-drive disks, but could be firmware, a system filter driver, network switch, RAID controller… anything," said Steve.
"The sys admin swears the whole I/O subsystem is clean as a whistle. They're lying, they must be!" cried Dave.
Grant blew through his cheeks. "Occasionally, I've seen in-memory corruption due to a faulty chip, but other than that, it could be someone messing with the data files – accidentally or deliberately. But if they were using a tool like
DBCC_WRITEPAGE, we'd see it in the logs too, and we don't."
The three men contemplated the enormity of the task ahead of them. After a moment, Steve broke the silence: "OK, we can restore from backup to the point the log chain broke, but not beyond. After that our log backups are useless, and our full and diff backups have corruption."
Grant looked glum. "I can't see any option but to run some sort of repair on the live database. We'll save as much as we can and transfer it over, but there's bound to be some data casualties."
"First, we need that
DBCC report. And if the problems are so bad
DBCC can't complete…" Steve trailed off. "Well, let's just pray it finishes."
Scene 8: Evil in the server room
Alan Capon sipped a cappuccino, more than satisfied with his "bit of mopping up".
"Everything is in place then, Dan?" Alan Capon smiled his first genuine smile of the day.
"Yessir! The latest controlled power surge this morning, through their SAN disk sector, was a biggie. The database more or less went pop!" Dan smirked. "So did Dave's eyes, I'll bet!"
"And they can't trace it back to us?"
"Nope. Random power surge, what could we do?"
"And the ‘necessary adjustments' to ensure the corruption spreads undetectably?"
ALTER DATABASE operations get logged to the default trace, but those files get overwritten within a day on this system. And changing the database recovery model?" Dan looked smug. "All the hallmarks of an errant administration by the Agilistas."
"Good. As soon as those DBA Team clowns realize the cause is lost and run the classic 'repair with data loss', the bank will be in the clear, and we'll have them exactly where we want them," Alan smiled. "And we'll be stinking rich, of course."
"Bwa-ha-ha-ha-ha!!!" Dan burst out in loud, demonic laughter.
"Calm yourself, Dan."
"Sorry. It just felt appropriate."
Scene 9: What the DBCC?
"Ping!" shouted Dave. "Perfect timing. I can't wait to see what this has in store for us."
The team scanned the
"Dave!" exclaimed Grant. "You ran
TABLERESULTS. Pretty smart for an agile data guy. Makes the output a heck of a lot easier to read."
Dave smiled for the first time that morning.
"You know," said Steve, as he surveyed the carnage. "Maybe we can fix some of this. All of these with
indexid= 2 are our non-clustered indexes, and the
DBCC might be able to repair those automatically, or we can just rebuild them."
"Sure," said Grant. "But these 2570 errors – I don't think
DBCC can repair those. We'd need to nose around with
DBCC PAGE, find the corrupt values and their associated key ids… and then we'd need to know what value to set it to."
"You know, the data in that particular table is fairly static," Dave offered, cautiously hopeful. "It could be possible."
"But what about these 824 errors?" said Grant. "I think we're looking at some sort of
EMERGENCY mode repair for them, and serious data loss."
Steve grabbed him by the shoulders. "Grant – are you thinking what I'm thinking? We could do with some moral support here. I hear Paul Randal is in town for an Immersion event, but I'm sure he wouldn't miss this party."
"Leave it to me," said Grant. He tapped out a tweet: "Dealing with some corruption issues here that even the great Paul Randal wouldn't be able to solve!"
Thirty seconds later Grant's phone rang.
"Hi Paul! Now, calm down… Hey, I never said you definitely couldn't fix it…"
A few moments later, Grant hung up and smiled. "He'll be here in 30 minutes. I love it when a plan comes together."
Steve clapped his hands. "Right, let's take a backup of the corrupt database and start repairing those NC indexes. If the bank wanted this data swept under the rug, they're going to be sorely disappointed."
Scene 10: Baseball karma has its limits
Alan stood in the server room, swinging his bat repeatedly in a precise arc. His baseball exercises always helped him focus, even when the server room was so tiny, so enclosed, so hot.
Suddenly Dan burst into the room, breathless. Alan focused on his swings.
"Some guy's just arrived – dressed weird – sounds like Sean Connery – says his name's Paul Randal. He's demanding to see the DBA Team."
"Do not let him into the building, you hear me?" Swing, swing. "He's the one man who could get that data back and ruin this whole thing."
Dan shuffled his feet. "He's… he's already in. Reception couldn't stop him."
Alan's bat clonked to the floor. He strode over to his desk. "On no account," said Alan, pulling a lovingly-maintained Tommy gun from his desk, "will he, or any data he might resurrect, leave this building alive."
Scene 11: Paul splash lands
"Paul! Good to see you," enthused Grant. "And still in your wetsuit!"
"I came straight from the beach," harrumphed Paul without slowing down. "Now, let's see this corruption that apparently I can't solve. It had better be good."
Everyone gathered behind Paul as he went straight to a monitor.
He examined the
DBCC output. "The page that got the 824 error, it's pointing to a different page that contains off-row data."
"Yep, in that particular table, the trading report data is stored in off-row LOBs," confirmed Dave.
"A lot of that data is now inaccessible because the owning data row is corrupt. If we run
DBCC repair on this page, those LOB values are going to get deleted and we'll lose them altogether."
Paul let this sink in, and gathered the team in close.
"I might be able to save this data, but the fix is hairy," he said in a low voice. "We'll need to create a dummy row in this table with an off-row LOB value, then find the linkage in the dummy row and use
DBCC_WRITEPAGE to overwrite it with linkage to the row we need to save from the corrupt row. How many pages are affected like this?"
"We had a look in
msdb.dbo.suspect_pages and there's only one," Steve said. "But the data… Well, it ain't no hill of beans."
Paul rolled up his sleeves. "OK, let's take another backup, put the database in
single_user mode and get started. You realize this is the end of this as a production database? We're just going to have to copy the data over to your restored database." Dave nodded solemnly.
Paul made a note of some of the relevant page and slot IDs and a text timestamp from the
DBCC output for one of the affected pages. The team stood mesmerized as Paul glided expertly around through
DBCC PAGE, grabbing the length of the LOB value, creating a dummy record in the table and timestamp offsets as he went, and converting them to hex before writing values out using
Finally he looked up and breathed deeply. "Let's see if we can get at that LOB data." He ran a select against the dummy row and all three gasped in amazement when the lost LOB data appeared on the screen.
"Get a backup of the database and shove a copy up into the cloud"
Keep a safe copy of your backups in the cloud with SQL Backup Pro
"The last thing we need to do is run repair and delete the dummy rows."
It was a wonder to behold.
"Grant, get a backup of the database with the recovered data, and shove a copy up into the cloud. I'll let Phil know what's been going on down here," said Steve.
Paul threw a small silver object towards Grant. "And throw a copy on here for good measure. 2 TB flash drive. Biggest in the business. With backup compression, it should fit."
Scene 12: Mexican standoff
"Time for a word with those SAN admins," cried Dave gleefully.
"Steady there, cowboy," said Steve, raising a hand. "You don't have any proper auditing yet, remember. We can't prove who's behind all this."
"Wait – we may not do full auditing, but I did set up a job that copies the files of the default trace off to a remote network share. It gives us a permanent record of all DDL changes on the server before the files get over-written."
"Like changes to page verification and the recovery model!" said Paul.
"Along with the domain name of the person who made the change. I read how to do it on Simple-talk. Good article, that."
"Maybe you do have a bit of DBA in you after all, Dave," Grant regarded him with a newfound respect. "Let's take a look then!"
Suddenly, Alan Capon filled the doorway, Dan hovering at his side.
"Excuse me, but this is corporate property," blustered Dan feebly. "I have the authority to requisition this important evidence so it can't be altered before the auditors see it." He insinuated himself in front of the terminal displaying the default trace data.
"The game's up!" shouted Dave. "We know it's you behind the database corruption, and that data proves it! Turning off page verification to cover your tracks, switching the recovery to
SIMPLE to break the log chain – of all the low-down, dirty tricks…"
"Pure supposition. You probably forgot to switch it back after an index rebuild, or some such. Typical clumsy DBA."
"I think you ought to move away from that terminal now, friend. You've done enough damage already," Steve cornered Dan menacingly. He glared back defiantly, but as Steve got closer, he shrank down in his chair, shielding his eyes.
"Hey," growled Alan. "Keep away from him with that shirt."
When Steve turned, he was looking straight down the barrel of Alan's Tommy gun.
"Stop right there, Alan," said Grant firmly, raising BUD over his shoulder. "I'm not going to let you do that."
Capon's gaze flicked between Grant and BUD. "You think your little stick scares me?"
Tension rippled through the claustrophobic room. Dave watched nervously from under a desk.
Suddenly Paul loomed over Alan. "See here," he said, pointing at his forehead and administered a swift Glasgow kiss. Alan crumpled. His gun went off. A bat swung wildly in the air.
When the pandemonium quietened, Dave emerged to find bodies sprawled, groaning on the floor, and the silver flash drive containing the backup loose on the floor.
He pocketed it and dashed from the building, gasping in the fresh air, but suddenly found himself blinded by spotlights. A line of officers stood outside, guns raised. Phil Factor stood to one side, quietly stroking a white ferret.
"Don't shoot," shouted Dave holding the flash drive above his head. "I'm a DBA!"
Scene 13: One week later
"Well, that was fun, pardners. We should do that again sometime!" Steve raised his glass of beer and the others followed suit.
"Fun? Steve, you got shot in the foot!" cried Dave.
"I dug out the bullet with the hoof pick on my Swiss Army penknife," Steve shrugged. "I've had worse."
"I'm getting too old for this," sighed Grant. "I'm giving up this DBA Team party and going to join a meditation group in the mountains. Get me some real peace and karma."
"What happened at the bank, Phil?" asked Dave.
"It was a tense meeting, but they eventually saw sense and told the press we were the team of experts they'd hired to fix things. For a handsome fee, of course."
"The entire IT staff of the Monte Bank had a whip round, and he ended up getting more than us. Nobody had laid out anyone from Central IT before. Paul is regarded as a god-like figure, even by people who've never heard of
"They framed Alan as a single rogue trader, and he fled underground. The bank claimed total ignorance and resumed trading the day after we recovered the data. Publicly, the bank are obsequiously grateful for our services. Privately, I suspect not."
"So is there a moral?"
"Don't blame the ferret for catching rabbits. It's his job."