SQL Server backup

Strategies for keeping data available - second in a series exploring major facets of SQL Server backup.

In part 1 of this series, I wrote about the reasons you should back up. From losing your job, to killing off the company, to obeying the law, there are many reasons to back up in general, and to back up SQL Server in particular.

Unfortunately, SQL Server backup is a lot harder than it used to be. Only a few organizations, such as hospitals or police stations, used to require that applications stay online 24 hours a day, seven days a week. Now, many organizations have that requirement.

Recently, I was helping a client with an operation that required a web site’s database to be offline for perhaps 15 minutes a day. This is a company that does minimal e-commerce, and virtually all of its customers are in the United States. I was shocked when I reviewed web site logs to see that, while the site was a lot calmer between, say, 3 a.m. and 4 a.m. Eastern time, it was by no means quiet. There were still tens of thousands of hits during those hours. Don’t these people sleep?

Because SQL Servers are often active 24 hours a day, a single backup per day during a lull in the action will not be sufficient. But how much backup is enough? I can hear you saying, “I have very important data, I can’t afford downtime!” But, before you dismiss the possibility of permitting any downtime or data loss, consider that it is fabulously difficult to be 100% certain that you will never lose any data or have any downtime under any circumstances.
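One way to shrink the window of potential loss without taking the database offline is to combine full backups with frequent transaction log backups. Here is a minimal sketch; the database name and file paths are hypothetical, and the database must be using the Full recovery model for log backups to work:

```sql
-- Nightly full backup, taken during the lull in activity.
-- WITH INIT overwrites any previous backup set in the file.
BACKUP DATABASE Sales
TO DISK = 'E:\Backups\Sales_full.bak'
WITH INIT

-- Scheduled every 15 minutes: back up the transaction log.
-- After a failure, at most roughly 15 minutes of committed
-- work is at risk.
BACKUP LOG Sales
TO DISK = 'E:\Backups\Sales_log.trn'
```

Both statements run online, so users can keep working while the backups are taken.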

My view is that the law of orders of magnitude applies when considering the costs of achieving different levels of data loss. (Consider the cost of a separate secure location at least three miles from your current office, round-the-clock production DBA-level staffing, a full security model of who can run queries, fully redundant architecture, all the software required to make this work 100% perfectly, dedicated network lines between your mirrored servers, configuration and testing, and full security and background checks for anyone allowed to access the server.) My guess at these orders of magnitude can be seen in the table below:

Accepted data loss                        | Disruption period                              | Cost of maintaining (US$)
------------------------------------------|------------------------------------------------|--------------------------
None under any circumstances              | None under any circumstances                   | Millions
Exceptional circumstances (1 hr. maximum) | Less than 1 hour in exceptional circumstances  | Hundreds of thousands
Several hours of most recent data         | Several hours                                  | Tens of thousands
Days of most recent data                  | Days                                           | Little or nothing

Table 1: Estimates of the orders of magnitude in costs for different levels of data loss acceptability

Clearly, most organizations have neither the need nor the resources to eliminate all data loss or disruption. They will likely conclude that it is acceptable to have several hours each day when data is not being backed up, risking the possibility of losing new data that comes in during that time.

In this article, I will talk about backing up your SQL Server in the context of the circumstances in which you will actually use a backup.

What could go wrong?

In thinking about how much data loss and disruption your database can afford (and how much you can afford to spend to avoid it), you need to think carefully about the possible ways that data can be lost.

Data deleted by error or malicious action – Most of the time when I have needed to restore data, I have been the cause of the problem, either directly or indirectly. Recently, I ran a query to repair a live database in which an error had created a number of duplicate rows. I ran the SELECT part of the query (the part that I would use in a NOT IN() clause in my DELETE statement) and got a number of rows that I considered reasonable. I then used that SELECT in the NOT IN() clause of the DELETE statement, and since I was certain the query had to be correct, I did not, as I almost always do, run it as part of a transaction and then COMMIT or ROLLBACK as appropriate. I clicked the little green arrow in Query Analyzer and, to my horror, saw that every row in the table had been deleted! I had used the wrong field in the SELECT list of my NOT IN() clause, and so created a query that deleted everything.

Fortunately, I had a backup, and with limited disruption I was able to recreate the data.

A majority of data loss happens due to errors such as this, rather than through intentional actions of people wanting to hurt you or your data. Hanlon's Razor states: "Never attribute to malice that which can be adequately explained by stupidity."

Preventing the sort of inadvertent damage that I caused can be handled in a number of ways. First, this DELETE query should have been run on a test database first, even if that required backing up and restoring the database to another machine. Next, I should have run the SQL statement within a transaction. Finally, in many cases, I (as a developer) should not be allowed to get anywhere near production data.
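As a sketch of the safer transaction pattern (the table and column names here are invented for illustration, not taken from the actual incident): wrap the DELETE in an explicit transaction, inspect the row count, and only commit once it matches expectations.

```sql
BEGIN TRANSACTION

-- Remove duplicate rows, keeping the lowest CustomerID in each group.
-- Using the wrong column in the subquery's SELECT list is exactly
-- the kind of mistake that deletes every row in the table.
DELETE FROM Customers
WHERE CustomerID NOT IN
      (SELECT MIN(CustomerID)
       FROM Customers
       GROUP BY LastName, Email)

-- @@ROWCOUNT reports how many rows the DELETE just removed.
SELECT @@ROWCOUNT AS RowsDeleted

-- If RowsDeleted matches the expected number of duplicates:
-- COMMIT TRANSACTION
-- Otherwise, nothing has been permanently lost yet:
ROLLBACK TRANSACTION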

Malicious action is much less frequent, and there are a number of things you can do to prevent it. Basic steps include physical security of the database server and network security, meaning that all but the necessary ports are closed down on your firewall. If a potential attacker cannot get at the machine directly, they will need to work a little harder.

One common problem is that backups are often from a time period where the problem already existed, and will not contain the undamaged or missing data that you want to restore. If data related to a monthly process is damaged through user or programmer error, for example, by the time the monthly process is run, it is possible there will not be backups of the table just before the problem occurred.

Hardware failure – This is what many people worry most about when they consider backing up SQL Server data. There are a number of ways that hardware problems can require restoring a backup.

First, there is the ever-popular disk failure. Hard disks spin at anywhere from 5,000 to 15,000 RPM. Bad things can happen quickly with this kind of mechanical movement. Less likely is a failure of the hard disk controller or motherboard. In any case, you will likely not be able to directly get at the data on the hard disk. This is where having backups on a different physical volume, if not a different device or on removable storage, is critical.
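For example, simply pointing the backup at a different physical volume than the data files, and then verifying that the resulting file is readable, guards against losing both the data and its backup to the same failed disk. The paths and names below are illustrative:

```sql
-- D: holds the data files; F: is a separate physical disk.
BACKUP DATABASE Sales
TO DISK = 'F:\Backups\Sales_full.bak'

-- Confirm the backup file is complete and readable
-- (this checks the backup without restoring anything).
RESTORE VERIFYONLY
FROM DISK = 'F:\Backups\Sales_full.bak'
```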

One common problem when this sort of failure occurs is the lack of hardware to quickly restore the backup. If your database server goes down, how long would it take to restore the database onto a new device, and possibly change the programs involved to point to the new server?
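Part of answering that question is rehearsing the restore itself. Replacement hardware rarely has the same drive layout as the failed machine, so the database files usually must be relocated with WITH MOVE. The logical file names below are assumptions; use RESTORE FILELISTONLY against your own backup to find the real ones.

```sql
-- Discover the logical file names stored in the backup.
RESTORE FILELISTONLY
FROM DISK = 'F:\Backups\Sales_full.bak'

-- Restore onto the new machine, relocating the files to
-- whatever drives the replacement hardware actually has.
RESTORE DATABASE Sales
FROM DISK = 'F:\Backups\Sales_full.bak'
WITH MOVE 'Sales_Data' TO 'D:\Data\Sales.mdf',
     MOVE 'Sales_Log'  TO 'E:\Logs\Sales_log.ldf'
```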

Catastrophic site failure – The scenario that is most damaging is often the least thought about. The book Microsoft SQL Server 2000 Unleashed refers to this sort of catastrophic failure as “the Scud test.” If a missile hit your building, could your database, even your business, survive? While this may have been an abstract concept just five years ago, as I write this I am less than 60 miles as the crow flies from where the World Trade Center (WTC) towers once stood. In addition to the undeniable human toll, many businesses that survived 9/11 discovered that their data was gone. Here is a sampling of some of the companies affected by the WTC disaster, along with information on how well their planning worked: http://www.business2.com/b2/web/articles/0,17863,514202,00.html. The impact ranged from virtually none to total data loss. Since 2001, many of the backup plan improvements triggered by 9/11 have been exercised in the face of events such as wildfires, hurricanes and floods.

More mundane, but equally destructive, can be catastrophic power or connectivity failures. One client of mine happily moved all of its database and web servers in-house from a co-location site, only to move them back after a 36-hour power failure that knocked the company off the Internet. While the servers could have been powered up briefly to copy off the databases and applications, the company did not have any convenient way to get that data back on the web.

In the case of a catastrophic site failure, how convenient would it be to get access to your off-site backups? How easy would it be to recreate your machines (all of them) when you have nothing to start out with?

Totally preventing significant disruption in light of this sort of catastrophic site failure is what can move your backup plans into the millions of dollars category.

So, what is backup really?

In my view, backup is more than just the actual backup of data to a disk or device. I am really talking about the strategies you can use to allow your data to remain available with the least possible disruption.

The best way to handle backups is to prevent the loss in the first place. You cannot prevent most catastrophic site failures, but you can mitigate more minor site failures by providing lots of uninterruptible power supplies and redundant Internet access. In the event of a fire, flood, hurricane or terrorist attack, you likely have to rely on backups.

Next: Building from the ground up to prevent data loss

Properly setting up hardware is the first step in keeping your data safe. Decisions you make as you are specifying hardware and setting up the databases can make you a database hero (and perhaps even the next Database Geek of the Week) or make you the scapegoat when it all goes wrong. Your average hardware guy does not know SQL Server, and your average DBA does not know hardware. Being the one to understand enough about both worlds can make a real difference in the outcome.