Designing Databases for Rapid Resilience

As the volume of data increases, DBAs need to plan more actively for rapid restores in the event of failure. For this, the intelligent use of filegroups is important, particularly when the Enterprise Edition of SQL Server offers the hope of online restores. How, though, should you arrange your data on the different filegroups? What happenens if the primary filegroup gets corrupted? Why backup and restore indexes?

Because the volume of data is increasing relentlessly, we are forced to change the way we store our data to guarantee that it stays available. In this article, I will focus mainly on why a good physical design of a database is vital for its availability, and, in turn, to the well-being of the entire organization.

First, let’s create an imaginary context for the subject matter of this article involving a company that resembles no particular existing company.

  • A company called XPEF works within the financial sector
  • XPEF has a client base of 1.5 million private users, and some 100 businesses, all of which rely on XPEF’s data
  • Aside from the busy OLTP system that serves the clients and the business, the XPEF relies heavily on a ReportMart database. Data is offloaded nightly to it from the transactional system; the data in the ReportMart is used for various purposes, and it is the backbone of the organization’s backoffice functions (customer services claims, reporting, fraud analysis, trending, client data analysis for offloading to other vendors etc.)
  • Finally, let’s suppose that the ReportMart database grows at the rate of at least 5 million rows per day, and the demand is to keep at least 3 years of data at all times, of which the past 365 days should be accessible for making data changes, and the rest of the data should be Read-only.

After a heated discussion, the following guidelines are established and given to the database team to implement:

  • The data from the past night’s load is not critical, because only 1% of the daily work is related to it: In the case of data loss, the batch job which does the daily import can bring the data from the main system in less than 10 minutes
  • The data from between 2 days ago to 1 month ago is vital to the daily operations: In case of data loss it must be restored within 1 hour, otherwise there will be heavy penalties from other vendors and from the state. About 80% of the workload revolves around this data.
  • The data from between 1 month ago to 1 year ago is important for trend and fraud analysis, but since it represents only about 15% of the workload, it needs to be brought online within 4 hours in case of the data loss.
  • The rest of the data is important but, since it consumes only about 4% of the daily workload, it can be restored within 8 hours in the event of data loss.

Having said this, here is how the database team sees the requirements and the challenges:

  • Regardless of which segment of the data fails, make sure that it is restored as fast as possible and without affecting the availability of the rest of the data
  • The priority is to have the data from 2 to 30 days back available first, then the data for the past year, then the ReadOnly data and lastly, the data from the past 24 hours.
  • The database team needs to identify, and address, a potential data loss before the business administration detects it, and they need to maintain data availability proactively.

The database team knows that, in order to meet all requirements, they need to take advantage of the piecemeal restore.

Only the Enterprise edition of SQL Server provides Online restore functionality, i.e. the rest of the database may remain Online and operational while one of the filegroups is restored from a backup. The recovery time is very dependent on the speed of both the backup media storage and the destination system storage. The most important aspect, however, is this: How much data is to be restored at once, and how can the data be ‘split’ in a smart way in order to restore without a tremendous time penalty?

The answer lies, to a great extent, in the smart use of filegroups!

From here on this article will cover the different options the XPEF DBAs have in regards to ‘splitting’ the data and backing up / restoring it.

We will cover several scenarios, i.e.:

  • Scenario 1: All user data is in the PRIMARY filegroup
  • Scenario 2: All user data is in a single filegroup, different from the PRIMARY
  • Scenario 3: the user data is split in different filegroups by importance and by recoverability priority

After covering these scenarios, we will cover some important points about backups in general, proactive monitoring and caveats of smart database modeling.

Scenario 1: All user data is in the PRIMARY filegroup

Right off the bat, we can say that keeping all user data in the PRIMARY filegroup is unnecessary, and even comes with potential business risks. The reasons are simple:

  • With today’s constantly growing volume of data, no one can afford to continue to keep inexorably expanding data in one filegroup.
  • The more data we have in a single filegroup, the longer time it will take before it is available to the end users.
  • The PRIMARY filegroup contains metadata and vital system data, critical to the proper operation of the database (think about it – the PRIMARY filegroup contains up-to-date references to all the objects in the database; whereas your backup may not contain the latest changes to the system objects).
  • And finally, the big question: what happens to the database if there is data corruption on page(s) in the PRIMARY filegroup and there is no other choice but to restore the PRIMARY filegroup? What happens to the rest of the filegroups and the objects within them?

We will come to the rather alarming answer to this question in due course.

The basic steps of a piecemeal restore:

The basic idea of a piecemeal restore is to partially perform maintenance on a database by recovering one or more of its filegroups from a full or differential backup.The sequence in which the filegroup restore is done depends on the database recovery model (FULL/BULK, SIMPLE or changed from FULL to SIMPLE) and on the SQL Server edition (Enterprise or other). For this article I will discuss the FULL recovery mode with Enterprise edition. (This setup gives the most colorful experience, since it uses the logs and the ONLINE availability.)Here is an outline of the restore sequence of a database in FULL recovery in Enterprise edition:

  1. Mark the damaged filegroup OFFLINE:
  2. Restore the filegroup only from a backup file (full, then differential)
  3. Backup the tail of the log of the damaged database by running:
  4. Then restore all log backups WITH NORECOVERY since the last FULL or DIFEERENTIAL backup
  5. Restore the tail of the log WITH RECOVERY, which brings the recovered filegroup ONLINE

So, let’s get back to the big question: ‘what happens to the database if there is data corruption on page(s) in the PRIMARY filegroup or if the disk drive fails and there is no other choice but to restore the PRIMARY filegroup? What happens to the rest of the filegroups and the objects within them? ‘As you may have guessed: a disaster happens. When the primary filegroup is damaged, the entire database becomes unreadable since the metadata is damaged.

When this happens, you may see similar messages in the SQL Server logs when trying to access any objects within the database:

Error: 823, Severity: 24, State: 2.The operating system returned error 1006(The volume for a file has been externally altered so that the opened file is no longer valid.) to SQL Server during a read at offset 0x00000000148000 in file ‘P:\XPEF1.mdf’. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

That DBCC CheckDB is going to take a long time. To make matters even worse, you’ll find that, the entire database has to go OFFLINE in order to restore the PRIMARY filegroup from a backup. There is no way to restore the PRIMARY filegroup and have all other data ONLINE.

Scenario 2: All user data is in a single filegroup, different from the PRIMARY

So, as we have already seen in Scenario 1, it is completely inappropriate to keep all data in PRIMARY filegroup, but is it enough to create another filegroup, let’s call it UserFileGroup1, and keep all data there instead?

Well, it is definitely a step forward, but not too big a step.

As mentioned earlier, the XPEF company of our example has clear definitions of the data ‘weight’ according to the age of the data itself.

So, if we just create the UserFileGroup1 and place all data there, we avoid the problem of data corruption in the PRIMARY filegroup, but we still have not solved the problem of how long it will take to restore the UserFileGroup1 in case of a disaster.

Hence, let’s look at a third scenario and try to solve the backup and recovery problem.

Scenario 3: the user data is split in different filegroups by importance and by recoverability priority

As mentioned earlier in the business requirements, the data has different value to the business, according to the age of the data.

The data from 2 days ago to 1 month old is the most critical to the business (80% of the workload depends on it!), hence it has to be restored first. This means that in order to be restored fast, the data has to be stored in its own filegroup. Let’s call it UserFileGroup1.

In second place, the data from 1 month to 1 year old has to be restored (only 15% of the workload depends on it), so it has to be in its own filegroup. UserFileGroup2.

In third place, the data older than 1 year has to be restored (only 4% of the workload depends on it). It has to be in its own filegroup. Let’s call it UserFileGroup_Historical.

And in the end, there is the data from the past 24 hours, which is of the least importance to the business. This data can be in the UserFileGroup1 filegroup.

By ‘splitting’ the data in such manner, we guarantee that the data will be restored in accordance to the SLAs and the company will save money and time.

Easier said than done:

Of course, I specifically did not promise in the beginning of this article that it was easy or cheap to segregate data by importance and have a resilient database system. Here are a few points which can be used as food for thought, especially since the value of data is different for each company / organization.

  • On creation of tables and indexes we can specify which filegroup they should reside on
  • The Standard edition of SQL Server supports filegroups, but only in Enterprise edition can a table be ‘spread’ over several filegroups; in Standard edition each table and index belong to only one filegroup
  • Each company values the data differently, and it is not necessarily by the age of the data, as it is in the example above; some other company may want to ‘split’ the data according to a region, or by a specific customer / vendor, etc. It all depends on the definition of the data’s importance to the specific company.
  • Non-clustered indexes save us a lot of IO when they exist, but they generate a lot of IO when they are restored from a backup. Consider keeping backups without the non-clustered indexes and after restore, just re-create them from scripts
  • In Standard edition there is no ONLINE restore (i.e. while restoring one filegroup the rest of the database cannot be ONLINE at the same time), however it is still faster to restore only a portion of the database which is most needed and most critical than to have to wait to restore the entire database
  • In keeping all data in a single filegroup (whether it is the PRIMARY or not) there is a single point of failure; in case of data corruption in the backup file (yes, it does happen!), the ‘healthy’ filegroups can be restored from backup and the problematic one can be worked on at a later time
  • Each filegroup can have one or several files which means that the different filegroups can be ‘assigned’ to different performance speeds; this means that you don’t have to buy the fastest disks for your entire data, but only for the most important part. In the example above, only the UserFileGroup1 in Scenario 3 will reside on a SSD, and the UserFileGroup_Historical may reside even on a much slower cheaper disks (hopefully not on an external USB harddrive )

Lessons learned: Important points about backups in general, proactive monitoring and caveats of smart database modeling

And if this article so far is not enough of a reason to start separating the User Data from the PRIMARY filegroup, here is another reason: imagine having to restore 3 years of historical data all at once while your phone rings every 5 minutes with the question: “When is ANY data going to be available ONLINE?”

The bottom line is this: when designing your databases try to create separate filegroups according to purpose, volume and SLA requirements.

Think hard before designing tables and indexes: how much data, how important, what restore time is required and so on.

Also, make sure to set up SQL Server Agent alerts for high severity errors, and make sure you act immediately when you get the alert. The Severity 24 error, for example, is the only way to find out before the users that there is a problem with your database – data corruption or potential hardware failure. Keep in mind that if the data is in the cache the users may not find out about the hardware failure for a while, but the error log will have indications of the problem.

How you log in to Simple Talk has changed

We now use Redgate ID (RGID). If you already have an RGID, we’ll try to match it to your account. If not, we’ll create one for you and connect it.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.


Simple Talk now uses Redgate ID

If you already have a Redgate ID (RGID), sign in using your existing RGID credentials. If not, you can create one on the next screen.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.