Online Exchange Backups

In the third of Jaap's popular series on Exchange backups, he explains online backups. These have major advantages over offline backups, since the backup application does all the logic, and the work, for you.

In my previous article, Exchange Server Log File Replay, I explained how to create an offline backup, how to check the database for consistency, how to purge the log files in a safe way and how to restore an offline backup. The major disadvantages are that the database has to be offline, it's a lot of work and you need a very good understanding of what you're doing.

A much better solution is to use online backups. Online backups do all the work: they check the database and purge the log files. And they do all of this while the database stays online, so there's no interruption for the end users.

NTBackup

Windows NT, Windows 2000 and Windows 2003 have a neat little backup utility called NTBackup. NTBackup is a lightweight version of a (very old) release of Veritas BackupExec. But it's cheap and it does precisely what we can expect from a backup solution.

When installing Exchange Server on a Windows 2003 server, the NTBackup application is extended with two ESE DLLs which make it possible to create online backups of the Exchange server. The process is the same for earlier versions of Windows and earlier versions of Exchange Server. Unfortunately, Windows Server 2008 ships with a new backup application (a feature that has to be installed separately) that can create snapshot backups of your Windows server. It does not contain a plug-in for Exchange Server, so when you run Exchange Server 2007 on a Windows Server 2008 machine you have to buy a 3rd party application to back up your Exchange databases.

In NTBackup, and in all other streaming backup applications, there are four types of backup:

  • Full backup – makes a backup of the entire mailbox database and purges the log files;
  • Incremental backup – only the changes made since the last full (or previous incremental) backup are backed up. Since all changes are written to the log files, only the log files created since that backup are backed up. When finished, they are purged;
  • Differential backup – only the changes made since the last full backup are backed up, but the log files are not purged;
  • Copy backup – this is the same as a full backup, but it does not interfere with your normal backup cycle, i.e. the header information is not updated and the log files are not purged.

Creating a full backup

NTBackup creates a backup at the ESE level. This means it accesses the database through the ESE engine and not at the file level. When opening NTBackup you have to select the Microsoft Information Store; do not select the file “mailbox database.edb” from the disk. Although this would put your mailbox database file in the backup set, it would not perform the necessary maintenance and it would not prepare the database for a restore.

585-image002.jpg

Figure 1. For backing up the Exchange databases select the “Microsoft Information Store”
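The same selection can be made from the command line. As a rough sketch (the server name, storage group name and backup file path are examples, and I'm assuming the standard NTBackup command-line syntax), a full online backup of a storage group looks like this:

    rem Full (normal) online backup of the First Storage Group on SERVER01
    ntbackup backup "\\SERVER01\Microsoft Information Store\First Storage Group" /J "Exchange Full Backup" /F "E:\Backups\exchange-full.bkf" /M normal

Replacing /M normal with /M incremental, /M differential or /M copy should select the other backup types described in the list above.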

When you start the backup the following things happen:

  1. The current position of the checkpoint file is recorded. This location is needed for purging purposes: all log files older than the current log file can be deleted when the backup is finished. The location is recorded in the “Current Full Backup” section of the database header;

     

  2. NTBackup begins the backup operation and starts reading the pages in the database, beginning with the first page and continuing until it reaches the last page at the end of the database. During this operation all pages are checked against their checksum before they are streamed to the backup media. If a page fails its checksum test the backup operation is aborted and an error is written to the Windows Event log.
    New pages that are created during the online backup are still flushed to the database, even when they are flushed to a portion of the database that has already been backed up. This is no problem, since all transactions are still written to the log files and thus create a recovery path. During a restore the Exchange server will correct this during a so-called ‘hard recovery’.

     

  3. When all database pages are written to the backup media the database is safe. All information that's written to the log files needs to be written to the backup media as well. To achieve this, a “log file rollover” is forced. This means that the current log file E00.log (or E01.log, E02.log etc., depending on the storage group) is closed, a new log file is created and the lGeneration number (see my first article for more on the lGeneration number) is increased. The log files from the point recorded in step 1 up to the last log file created in this step are now written to the backup media. Because of the log file rollover you will never see the E00.log file in the backup set.

     

  4. All log files prior to the point recorded in step 1 are now purged from the log file disk.

     

  5. The “Previous Full Backup” section of the database header is now updated with the information about the backup that has just been made.

     

  6. NTBackup is now finished with the backup of the database.

     

When checking the database header after creating a streaming backup you will see the backup information recorded in the “Previous Full Backup” section.
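As a quick sketch (the database path below is just an example), the header is dumped like this:

    eseutil /mh "G:\SG1\priv1.edb"

In the output, look for the “Previous Full Backup” section; after a successful full backup it contains the range of log generations that was backed up and a timestamp of the backup.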

Creating an incremental backup

An incremental backup can only be created if a previous full backup has been performed on the Exchange database. The process NTBackup follows when creating an incremental backup is as follows:

  1. The backup session is initialized and the ESE engine is contacted. The location of the checkpoint file is logged in the Current Incremental Backup section.

     

  2. A log file roll-over is performed, forcing a new log file to be created.

     

  3. All log files up to the new log file are written to tape.

     

  4. The log files that have just been backed up are purged from the disk.

     

  5. The “Previous Incremental Backup” section of the database header is updated.

     

  6. NTBackup is now finished.

     

If you check the header information of the database again you will see the information from the full backup as well as the information from the incremental backup.

Note. The process for a differential backup is identical to an incremental backup, except that the log files are not purged.

-1018 Errors

One of the tasks performed by a streaming backup is a checksum check on all pages being streamed to tape. As explained in my previous article, an incorrect page can exist in the database, but as long as the page isn't touched by the Exchange server you would never know. When the backup reads the page it notices that the checksum isn't correct. What happens next depends on the version of Exchange:

  • The release version of Exchange Server 2003 and earlier
    The original release of Exchange Server 2003 (and earlier versions) does not contain any error-correcting code for CRC errors. If an issue is detected the backup will fail and an error with Event ID 474 is logged in the event log. In the description you can read:
    “The read operation will fail with error -1018 (0xfffffc06)”.
    As per Microsoft Knowledge Base article 867626 you have to perform a database repair or restore the database from a previous backup.
  • Exchange Server 2003 SP1 and later
    Service Pack 1 for Exchange Server 2003 contains error-correcting code for checksum errors and can therefore handle database pages that have an incorrect checksum. A streaming backup will check the pages and will notice any inconsistencies. Owing to the error-correcting code in SP1 the backup will continue; the page is fixed and a notice is written to the event log with Event ID 399:
    “Information Store (2672) First Storage Group: The database page read from the file “G:\SG1\priv1.edb” at offset 6324224 (0x0000000000608000) for 4096 (0x00001000) bytes failed verification. Bit 24032 was corrupted and has been corrected.”
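If you do run into a -1018 error on a pre-SP1 system and no good backup is available, the repair route referenced in KB 867626 comes down to an offline repair, a defragmentation and a logical integrity check. A minimal sketch (paths and server name are examples; the database must be dismounted first):

    rem Repair the damaged database (data in unreadable pages is lost)
    eseutil /p "G:\SG1\priv1.edb"
    rem Defragment the repaired database to rebuild its internal structure
    eseutil /d "G:\SG1\priv1.edb"
    rem Fix logical (application-level) inconsistencies in the mailbox store
    isinteg -s SERVER01 -fix -test alltests

Restoring from a known good backup is always preferable to a repair, because the repair simply discards any page it cannot read.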

A lot of detailed information can be found in the Microsoft Exchange Server 2003 SDK documentation, which can be found on the Microsoft website. Jerry Cochran's book “Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers” is also a very valuable source of information, as it describes backup technologies from a programming perspective.

Online restore

When working with offline backups you have to take care of the database, the log files and the checkpoint file yourself, and perform all the necessary steps in the correct order. An online backup does all the dirty work for you.

When performing an online restore, make sure that you mark the database as one that can be overwritten by a restore. This is a property of the database and can be set using the Exchange Management Console:

585-image004.jpg

Figure 2. This database can be overwritten by a restore
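On Exchange Server 2007 the same property can also be set from the Exchange Management Shell. A sketch (the server, storage group and database names are examples):

    # Allow this database to be overwritten by a restore
    Set-MailboxDatabase -Identity "SERVER01\First Storage Group\Mailbox Database" -AllowFileRestore $true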

In NTBackup select the “Restore and Manage Media” tab and select the appropriate full backup set. After clicking the restore button you are presented with another, important window:

585-image006.jpg

Figure 3. Is this really the last restore set? If so, then select the checkbox

If you want to restore an incremental backup DO NOT SELECT the “Last Restore Set” checkbox. If you do, the database is hard recovered immediately after the restore finishes and you no longer have the option to restore an incremental backup.

Even if this is the last restore set, I prefer not to set the checkbox. This gives you the opportunity to check the database and the log files before a hard recovery is started. The log files that were written to the backup are restored to the “Temporary location” path, c:\temp.

When the restore is finished you will see all the log files in the directory c:\temp\First Storage Group (or whichever storage group you have restored) and a file called restore.env. This file contains the information needed for hard recovery and can be read using the ESEUTIL tool with the /CM option.
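A quick sketch, assuming the restore was written to the default temporary location mentioned above (point the /CM option at the folder that holds restore.env):

    eseutil /cm "c:\temp\First Storage Group"

The dump shows, among other things, which databases are part of the restore and which log files need to be replayed.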

The process of fixing the database with the log files that were restored from backup is called hard recovery. You can manually start the hard recovery using the ESEUTIL /CC command. This will replay all log files in the temporary directory into the database; the database itself has already been restored to its production location. When more log files exist in the normal production environment beyond the point of backup (and thus beyond the point of restore) they will be replayed into the production database as well. This brings your database into a consistent state up to the most recent point possible.
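The manual hard recovery then looks like this (again assuming the default temporary location; the path is the folder containing restore.env):

    rem Replay the restored log files into the restored database (hard recovery)
    eseutil /cc "c:\temp\First Storage Group"

When the command completes, the restored log files (and any newer production log files) have been replayed and the database is left in a consistent state, ready to be mounted.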

If you select the “Last Restore Set” checkbox shown in Figure 3, all of this happens automatically: the database is hard recovered and all additional log files are replayed as well. When finished, the database can also be mounted automatically.

Note. This process is the same for all backup applications from major vendors that support streaming backups with Exchange Server.

VSS or snapshot backups

With Exchange Server 2007 Microsoft is shifting its backup focus from the traditional online streaming backup to VSS or snapshot backups. Why do I still spend a serious amount of time on streaming backups? Because the underlying ideas are still very important, and they give a good understanding of the steps that have to be performed in a VSS or snapshot backup.

Note. NTBackup cannot create VSS or snapshot backups of an Exchange Server database. It does, however, contain functionality to create a file-level VSS backup. If you see a snapshot backup of your Exchange Server database in NTBackup it is very likely that the Exchange Server database has been selected at the file-system level and not at the Information Store level!

A snapshot is just a point-in-time and at this point-in-time an image is created. This image can be used to roll back to in case of a disaster. The Volume Shadow Copy Service in Windows Server 2003 and later provides an infrastructure to create these point-in-time images. These images are called Shadow Copies.

There are two kinds of Shadow Copies:

  • Clone (Full Copy or Split Mirror) – a complete mirror is maintained until an application or administrator breaks the mirror. From that point on the original and the clone are fully independent of each other, and the clone is effectively frozen in time;
  • Copy on Write (Differential Copy) – a shadow copy is created that is a differential rather than a full copy of the original data. Using the Copy on Write a shadow copy of the original data is made before the original data is overwritten. Effectively the backup copy consists of the data in the shadow copy combined with the data on the original location. Both need to be available to reconstruct the original data.

The Volume Shadow Copy Infrastructure consists of the following components:

  • Requestor – the software that invokes the VSS and creates, breaks or deletes the shadow copy. Typically the Requestor is the backup application;
  • Writer – a software component provided by the application vendor; in our case it is provided with Microsoft Exchange Server. A writer is responsible for providing a consistent point-in-time image by freezing or quiescing the Exchange Server at that point in time. Please note that an Exchange writer is provided for Exchange Server 2003 and higher;
  • Provider – the interface to the point-in-time image. This can either be on a storage array (hardware provider) or in the Operating System (software provider). Windows Server 2003 provides a software provider with VSS functionality out-of-the-box.

585-image008.jpg
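You can check which writers and providers are present on a server with the vssadmin tool that ships with Windows Server 2003 and later:

    rem List all registered VSS writers (the Exchange writer should appear here)
    vssadmin list writers
    rem List the installed VSS providers (software and/or hardware)
    vssadmin list providers

On a server running Exchange Server 2003 or later you should see the Exchange writer, listed as “Microsoft Exchange Writer”, in the writer list.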

The following steps occur when a VSS backup is performed:

  1. The requestor, i.e. the backup application, sends a command to the Volume Shadow Copy Service to take a shadow copy of the Storage Groups;

     

  2. The VSS service sends a command to the Exchange writer to prepare for a snapshot backup;

     

  3. The VSS service sends a command to the appropriate storage provider to create a shadow copy of the Exchange Storage Group. This storage provider can be a hardware storage provider or the default Windows storage provider;

     

  4. The Exchange writer temporarily stops, or quiesces, the Storage Group: it is put in read-only mode, all data is flushed to the database and a log file rollover is performed to make sure that all data will be in the backup set. This state is held for a couple of seconds while the snapshot is created (in the next step); all write I/Os are queued in the meantime;

     

  5. The shadow copy is now created;

     

  6. The VSS service releases the Exchange server to resume ordinary operations and all queued write I/Os are completed;

     

  7. The VSS service queries the Exchange writer to confirm that the write I/Os were successfully held during the shadow copy creation. If the writes were not successfully held the shadow copy is potentially inconsistent; in that case the shadow copy is deleted and the requestor is notified. The requestor can then retry the shadow copy process or fail the operation;

     

  8. If successful, the requestor verifies the integrity of the backup set (the clone copy). If the clone copy integrity is good the requestor informs the Exchange Server that the backup was successful and that the log files can be purged.

     

Note. It is the responsibility of the backup application to perform a consistency check of the shadow copy; the Exchange writer does not perform this check. This is also the reason why you have to manually copy ESE-related files such as ESE.DLL to the backup server.

Steps 1 through 6 usually take about 10 seconds; this is the time needed to create the actual snapshot, not the time needed to create the complete backup. A backup application still has to copy the backup to another disk or to tape, which can take hours to complete depending on the size of the databases.
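If you want to see these steps happen without a full backup product, the diskshadow.exe utility that ships with Windows Server 2008 can act as a very simple requestor. A minimal sketch (the volume letters are examples for a database volume and a log file volume):

    diskshadow
    DISKSHADOW> set context persistent
    DISKSHADOW> begin backup
    DISKSHADOW> add volume K:
    DISKSHADOW> add volume L:
    DISKSHADOW> create
    DISKSHADOW> end backup

The begin backup/end backup commands make the writers, including the Exchange writer, participate in the freeze and thaw sequence described in steps 2 through 6. Nothing verifies the resulting copy, however, so this should not be treated as a real backup of your Exchange data.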

When the backup application has finished, the header information of the Exchange database is updated as well. Using ESEUTIL /MH, as explained in my previous article, you can check the status of the database.


Microsoft does not currently offer a GUI-based VSS requestor for Exchange, and using command-line tools puts you in a difficult support situation when it comes to creating restores. Microsoft is working on a solution, as described on the Microsoft product group team blog.

For testing purposes Microsoft offers the Exchange Server 2007 Software Development Kit; the accompanying documentation can be found on the Microsoft website.

Microsoft also provides a VSS Software Development Kit. This SDK contains a very small test tool that is able to create VSS backups of an Exchange Storage Group. This command-line tool is called BETest and can be used for testing, development, troubleshooting or demonstrating VSS with Exchange 2007. More information regarding BETest can be found here. The Exchange product team released a blog post on troubleshooting VSS issues, which can also be found here.

Using the BETest tool you can make very basic VSS backups. The backup is written to disk (to a location you specify on the command line) and the log files are purged. Since the responsibility for performing a consistency check lies with the backup application, this check is not performed when using BETest.

585-image010.jpg

Figure 4. The backup on disk after using the BETest tool. The original database location was on disk K:\, the log files were on disk L:\

When the database contains a faulty page you will never notice this during the backup; even the backup of the database on disk will contain the faulty page. You have to manually perform a consistency check on the backup of the database using the ESEUTIL tool with the /K option.
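A quick sketch (the path to the backed-up database file is just an example):

    rem Verify the page checksums of the backed-up database file
    eseutil /k "E:\VSSBackup\Mailbox Database.edb"

If a page fails its checksum you will see an error in the ESEUTIL output, even though the backup itself completed without any warning.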

The Event log shows all VSS steps performed by the BETest tool, but since no checking was performed nothing is logged in the Event log about a possible corruption!

Since this is a correctable error you can still access the mailbox and open the message that contains the faulty page. The page is automatically corrected by Exchange Server, and when this happens a message is logged in the Event log with Event ID 399.

When using an Exchange Server 2007 Cluster Continuous Replication (CCR) solution a VSS backup solution is even more interesting. When the backup application is CCR-aware it is possible to create a VSS backup against the passive copy of the CCR cluster. End users will never even notice that a backup is being created, because all actions are performed against the passive copy; any performance impact will only be noticed on the passive node, where no users reside.

VSS Restore

A Volume Shadow Copy backup is created at the storage group level. Exchange Server 2003 is very rigid when it comes to restoring: it can only restore to the same location, on the same server and at the Storage Group level. Exchange Server 2007 is much more flexible: it can restore to the same location, but also to other servers or to a Recovery Storage Group, for example.

Restoring a backup is a straightforward process and basically the same as an online restore. The database is restored, the log files in the backup set are also restored and a hard recovery takes place to bring the database into a consistent state. If needed, additional log files can be replayed as well to bring the database up to date to the most recent point possible.

Replication and backups

One might ask whether replication technology is a good alternative to creating backups. The answer is simple and short: NO.

Since there is a copy of the database in an Exchange 2007 Cluster Continuous Replication solution, there is some more time available in a disaster recovery scenario: when a database crashes there's a copy of the database available to resume operations. But deleting messages or deleting mailboxes is a legitimate action from an Exchange point of view, and these actions are replicated to the copy of the database as well. The offsite storage that is possible with backups (to tape) is also a very important reason for creating backups.

3rd party application vendors

Microsoft does offer a VSS backup solution for Exchange Server via System Center Data Protection Manager (DPM) 2007, but it does not offer an out-of-the-box VSS backup solution for Exchange Server in the way it offers NTBackup for streaming backups. If you don't want to use DPM 2007 but you do want to use a VSS backup solution for Exchange Server, you have to rely on a 3rd party solution. Microsoft has several partners working with the Exchange team in Redmond to build really great products, each with its own feature set.

Conclusion

Creating online backups has major advantages over an offline backup solution, since the backup application does all the logic for you. It also checks the database for inconsistencies and, from Exchange Server 2003 SP1 onwards, single-bit checksum errors are automatically corrected before the data is sent to the backup set.

Recovery from an online backup is much easier than recovery from an offline backup. The database and the log files are automatically recovered from the backup set using the so-called hard recovery. Additional log files that were created after the last backup set was made are automatically replayed, bringing the database up to date to the most recent point.

A streaming backup is a standard part of Windows 2003, but Microsoft is shifting its focus from streaming backups to VSS (snapshot) backups due to the dramatic performance increase of VSS backups. Microsoft offers System Center Data Protection Manager, which can make these backups, but it works very differently from the standard VSS backup solutions that 3rd party vendors offer.

In these three articles (Exchange Database Technologies, Exchange Server Log File Replay, and this one) I have explained some basics about Microsoft Exchange database technologies, replaying log files, recovery techniques and backup and restore solutions.

It’s your turn now to start working with this information in a lab environment and start thinking about a disaster recovery solution for your Exchange Server environment. Document all the steps needed when a disaster strikes and perform regular fire drills on disaster recovery. Only this will help you recover more quickly when something bad happens in your Exchange Server environment.