You’re backing up your system. You’re monitoring the processes and the disk space. You’re even validating your backups, so you’re feeling like you’re in a pretty good spot.
And then, that day comes when you’ve got an emergency. You go for the backups, but…which full backup works with this set of log backups? What’s on this backup file anyway? Isn’t one of the systems configured to place multiple backups within a single file? Which one do you need? How can you even tell?
You might even be in a situation where someone handed you a backup from who-knows-where, and they want a restore run from it. How the heck do you get information about what’s in a backup?
Personally, I’d start with the backup header. There is an incredible amount of information stored in a backup header. Some of it is absolutely vital to your processes. Some of it is obscure and only occasionally useful. Some of it is completely trivial. Let’s explore what we can get.
Retrieving backup information
We’re going to look at three different commands: LABELONLY, HEADERONLY, FILELISTONLY. These three read the information stored about the backup in the backup file and display it all for you. The trick is to run a RESTORE operation from your backup to get the information. Let’s start with LABELONLY:
RESTORE LABELONLY FROM DISK = 'd:\bu\mm.bak' ;
The ouptut from my backup file is as follows:
|SoftwareName||Microsoft SQL Server|
This provides a great overview of what is on this backup file. Some of it you’ll never use, but most of it is useful. We’ll go through the information piece by piece, Starting at the top, with the first column. If you provide a name for your backup like this:
BACKUP DATABASE MovieManagement
TO DISK = 'd:\bu\mm_labeled.bak'
WITH MEDIANAME = 'BackupsToTheDDrive',
MEDIADESCRIPTION = 'Portable Drive Used for Backups',
Then you’d see that information in the first column. It gives you the chance to define a media set for your backups (i.e. to state explicitly which set of backups a particular backup file belongs to), and provide a description of the file through MEDIADESCRIPTION. If you have multiple backups in the same media, that is, tapes or disk, (not something I generally do), then you can label the backups themselves as one thing, through the NAME option, and describe the media separately through the MEDIANAME option.
The MediaSetID is a GUID that uniquely defines this set. What can you use it for? I’m not sure. But, it will also be recorded in msdb so you can link a backup to a particular server, for example:
FROM dbo.backupmediaset AS b
WHERE b.media_uuid = '720C1417-3889-4730-9D33-74EBF11E854A' ;
FamilyCount, FamilySequenceNumber, MediaFamilyId, and MediaSequenceNumber all come in handy if you perform your BACKUP operations to multiple locations at the same time. If you do, those locations make up a family. You can tell whether a particular file has a family because FamilyCount will be larger than 1. If it does, you’ve got the GUID of the family to hand, so you can track down more information as part of your restore process. Unsurprisingly, MediaSequenceNumber tells you whether the backup you’re looking at was the one that was made first, second, third, and so on.
MediaLabelPresent is a bool, and it tells you whether you’ve used a Tape backup label or not. If not, you can provide a MediaDescription like I did up above.
The SoftwareName and SoftwareVendorId come from the backup software that wrote the backup… except I can’t find evidence that this changes. I’ve run backups from a couple of different vendor tools and this always stays the same. I also searched on Boogle (or was it Ging?) and couldn’t find any evidence of it being anything other than the values you see above. I’d say, overall, this one is useless until I see evidence otherwise.
The last few are simple, but useful. MediaDate is when you created the file. Not confusing at all. MirrorCount is an indication of whether you used mirroring when running this backup file. Finally, IsCompressed says whether or not this backup file was compressed using SQL Server compression. This is great to know if you need to restore the backup to a system that doesn’t have compression.
In short, LABELONLY gives you a lot of useful information about the backup itself that’s going to come in handy when you’re managing backups. But it’s not that handy for getting backups done. For that, we’ll go on to where the beef is.
The amount of information you get when you read the entire header is fairly insane. You retrieve it the same way as you would retrieve the LABELONLY data:
RESTORE HEADERONLY FROM DISK = 'd:\bu\mm.bak' ;
This command outputs a lot of information, listed here:
|BackupDescription||Daily Full Backup|
A lot of information! I’ll walk you through it all and highlight some of the more important columns.
The BackupName and BackupDescription have to be provided when you run the BACKUP command. This is a great way to mark a database so that you know what it’s for, rather than having to rely on funky naming conventions for the file itself. I’d strongly suggest using these during backups as a clear and clean mechanism for identifying the purpose of a backup. Since you can describe the backup in perfectly plain language in the description, there’ll be no need for coded values in the backup file name.
After this, you get some information about the backup that may, or may not, be useful, depending on the type of backup you’re looking at and your current needs. BackupType is one of the 7 different backup types, but identified by number, not description. The full list is in the Books Online. ExpirationDate is another value set when you run the BACKUP command. If you’re going to allow your backup files to be removed after a certain date, it’s good to know how long this file will be around.
Compressed is a bool that indicates whether a database backup has been compressed by SQL Server or not. For what it’s worth, this backup header is from a backup that was compressed by SQL HyperBac, but the header doesn’t show it as being compressed. Just remember, this marker is for SQL Server compression, not third party compression. Position tells you a backup’s position within the backup set, for when you’re storing multiple backups within a single file. DeviceType tells you what sort of backup device this backup was initially stored on.
The next set of information is absolutely vital to tracking your backup information. UserName, ServerName, DatabaseName, and CreationDate, are all extremely important when you’re investigating a backup. They tell you who ran the backup, from which server, on which database, and when they did it – information you’re going to be asking for over and over again. Also in this set of details is the DatabaseVersion, but it’s a coded version and isn’t as clear as some other information in the header which I’ll address below.
Backup size… I guess this could be useful. Then again, you have access to the file. If it’s compressed, however, this value will be the size of the uncompressed backup, so it could be useful there.
For useful information, the next set can’t be beat. These are the log sequence numbers (LSNs) related to this backup. The FirstLSN shows the initial LSN for the log related to this backup. The LastLSN shows the final LSN for this backup. You’re going to be using these with full backups, differentials and log backups alike to understand how they relate to one another and in what order you can run them. It’s how you can identify gaps in the log chain when you’re restoring differentials. In short, this is absolutely vital information. You’ll also see the CheckpointLSN and the DatabaseBackupLSN. The Checkpoint is largely informational. The DatagbaseBackupLSN is the official LSN for the backup. It will correspond to the FirstLSN unless your database is extremely active, or you’re dealing with replication. In either event, use the DatabaseBackupLSN as the LSN of record for the database. Just don’t be shocked if it usually corresponds to the FirstLSN.
BackupStartDate and BackupFinishDate should be self-explanatory. They are useful measures, if nothing else showing how long it took for the backup to complete. But knowing when a backup is from is good information to have anyway.
SortOrder, UnicodeLocaleId, and UnicodeComparisonStyle are all supplied for backwards compatibility.
CodePage shows the character set in use by the server from which the backup was taken. This could be handy if you’re dealing with multiple languages and the like. CompatibilityLevel is informational about the database too. A database can be run on a server in Compatibility Mode, meaning it’s running as if it were on an older version of SQL Server. CompatibilityLevel lets you know if this is the case, through the backup header. SoftwareVendorId is the same as it was from the LABELONLY restore.
The next set of columns show you information about the version of the server used to take the backup. This is much more precise and clear than the coded DatabaseVersion up above. You have the SoftwareVersionMajor, SoftwareVersionMinor, and SoftwareVersionBuild so that you can know precisely what version of SQL Server was used and whether any service packs, cumulative updates, or hotfixes have been applied. There are sometimes backup versions that won’t work with other versions, so this information is absolutely vital for dealing with those issues.
The MachineName doesn’t show the server from which the backup was taken, but the name of the machine that performed the backup operation. This can be handy when you’re investigating where a backup came from, who ran it, etc.
Flags is a way of showing whether some of the settings available for a backup are on or off. It’s a bitmask. Blech. The good news is, you can just look at the series of bit fields lower down to get everything you need without having to decipher a bitmask.
BindingID is a uniqueidentifier assigned to the database within the system tables. It’s useful for referring back to the existing system that the backup was taken from, but doesn’t mean much for a restore operation, since this value will just get created on the system you’re restoring it to.
RecoveryForkID is more than a little complicated, but could prove very useful. Let’s say that you recover a database to a point in time prior to the latest possible log backup. Then, let’s say you run a transaction log backup on the new database. What you’ve just done is create a recovery fork. This means you could recover the database from two different sets of backups. The RecoveryForkID is an identifier that allows you to determine which recovery fork this backup will correspond with. In most situations, this doesn’t mean much. But, you might need to investigate which set of backups will recover you to the appropriate point in time and this identifier is what you use to do that. It will show a common set of data between log backups that are all on the same fork, but different data for logs on another fork. Honestly, this isn’t one I’ve ever used, but knowing it’s there gives me another tool in the toolbox.
Collation and FamilyGUID are markers for the database that will still be there when you restore. These are useful so that you know what you’re going to get when you run this restore.
The bit fields that follow are all very self-explanatory. They’re the values that can be retrieved from the bitmask field Flags: HasBulkLoggedData, IsSnapshot, IsReadOnly, IsSingleUser, HasBackupChecksums (very useful for verifying the backup), IsDamaged (again, good to know), BeginsLogChain, HasIncompleteMetaData (only set for tail log backups), IsForceOffline, and IsCopyOnly.
FirstRecoveryForkID tells you that this backup corresponds to the first recovery fork in the chain, assuming there is more than one recovery fork in the chain of backups. This should be used alongside RecoveryForkID. My backup above has the same value for both, which means that there is no recovery fork at all. The next value, ForkPointLSN is relevant when the RecoveryForkID and FirstRecoveryForkID don’t match, will be a non-null value corresponding to the LSN for the first recovery fork.
RecoveryModel tells you what kind of restore you’re going to be able to run from this backup. If you’re backing up a database that is in SIMPLE, you won’t be able to perform log backups, so don’t worry about it.
DifferentialBaseLSN and DifferentialBaseGUID will identify the base values that are needed to restore a differential backup. For non-differential backups these will always be null.
While there was the BackupType code at the beginning of the header, you can also use the BackupTypeDescription to understand what type of backup is represented by this backup file.
BackupSetGUID just identifies which backup set this backup belongs to. CompressedBackupSize gives you the size of the compressed file, if it’s compressed. You can compare this to BackupSize to get an idea of compression ratio.
With the incredible amount of information available from the header, it’s hard to say what’s most important. Being able to determine the LSN for full backups, differentials, and log backups is a hugely important mechanism for managing your recovery. All the general metadata about the backup and the source of the backup is going to come in handy too. There really is a lot of information, so you need to focus on the parts that are immediately important to you and ignore the rest. Unfortunately, that means knowing what most of them mean, and that’s why the summary above is worth knowing.
Now you can find out lots of information about a backup, but do you have enough to run a restore operation based on the information you’ve collected? Short answer: nope. You still don’t have any clue about how the database itself was distributed, and you need that if you’re running a full restore or any kind of file/filegroup restore. If you’re replacing an existing database with a backup taken from that database, you won’t need this. But if you’re restoring to servers with different disk layouts, you just might want to know how many files you have and where they’re stored. To get that information, you need a RESTORE FILELIST ONLY, which you run like this:
RESTORE FILELISTONLY FROM DISK = 'd:\bu\mm.bak'
The results come back like this:
|PhysicalName||C:\Program Files\Microsoft SQL Server\MSSQL10_50.GFR1\MSSQL\
|C:\Program Files\Microsoft SQL Server\MSSQL10_50.GFR1\MSSQL\
This, along with the information from the header, is everything you need to restore this database appropriately. Knowing the logical and physical names is what’s necessary to use the MOVE statement, or even to know that you need to use the MOVE statement when running the RESTORE operation. But there’s more information here too. Let’s run through it.
First, note that I’ve included two sets of data instead of the single set I used for the other commands in this article. That’s because this command will return multiple rows of data depending on the number of files that make up the database. The interesting things are that almost every single field for multiple files is included, and they have the same data from FULL, DIFFERENTIAL and LOG backups. If you’re looking at a file-only backup, you’ll still see references from the other files as additional rows returned by FILELISTONLY. The key is near the end of the values: IsPresent. If that’s a 1, then that file is in the backup referenced. If not, you’ve got information from the FILELISTONLY operation, but that file itself isn’t here. If you’re just looking at a FULL backup, for example, all the files will be included. This really only comes into play when you start doing FILE and FILEGROUP backups.
LogicalName and PhysicalName are the names that define the file and where it’s stored. These are the most fundamental pieces of information you’re pulling out of FILELISTONLY and, thankfully, they’re right at the top, so they’re easy to find. These are immediately followed by the Type, which shows ‘D’ for data and ‘L’ for log in my sample. You may also see ‘F’ for Full Text Catalog. After that is FileGroupName, which shows the distribution of the files into different file groups. For most people, this is all you need when using FILELISTONLY.
The rest of the columns are useful, but primarily in more specific situations or just as general information. Speaking of general information, Size, MaxSize, and FileID are all descriptions of the file within the database. Having the size values will help you make decisions on where you can restore these files to.
Within FILELISTONLY you get the log sequence number (LSN) information associated with the file in question. This is going to be vital when you’re trying to determine where a restore sits in its chain. CreateLSN is the LSN for when the file was created. For a simple database like the one above, you’ll note that both files were created at LSN = 0. That value will be different if you’re modifying your database structure. If you drop a file, it will remain in the description, but you’ll get a value at DropLSN.
The UniqueID is the identifier for the file.
ReadOnlyLSN and ReadWriteLSN show when these values were changed. This can affect how you’re going to do restores, if they were switched from one to another within the time frame you’re worried about. For example, if you’re restoring to a point in time and you’re wondering whether or not to include a particular restore because it overlaps the moment you’re interested in, take a look at the ReadWriteLSN. If it’s after your moment, you may not need the file, but you need to check the ReadOnlyLSN to be sure.
Differential backups will show additional, useful information in the DifferentialBaseLSN, which tells you the LSN from which this differential takes its data, and the DifferentialBaseGUID, which shows the GUID for the FULL backup that is the base for this differential. Both these pieces of information help identify where you can restore this differential.
The IsReadOnly bit lets you know if the file is marked that way. IsPresent tells you whether the backup media you’re referencing contains this file or not. Remember, all the file types will show the structure, but not all of the file types will contain the information you’re interested in. This bit is important.
Finally you get the TDEThumprint which shows an encrypted hash of the encryption key for this file.
The three pieces of information you’re going to use over and over are the LogicalName, PhysicalName, and IsPresent. Everything else is useful, but none of it is going to be constantly in front of you the way those fields will be.
In conclusion, when you are faced with that mysterious backup from some unknown source, or you’re stuck in the emergency response team and the obscure file names aren’t helping you recover the database, don’t panic. You can refer back to LABELONLY, HEADERONLY, and FILELISTONLY to put together the information you need. The best information is going to come out of HEADERONLY and FILELISTONLY and with these you should be able to figure out what’s needed to restore just about any database from the backup media you have available.