SQL Server 2005 Snapshots

Find out about Database Snapshots, new to SQL 2005 Enterprise Edition, which provide a read-only, "virtual" copy of a database, at a given point in time.

A DBA’s view on SQL 2005 Database Snapshots

Database Snapshots are a new Enterprise Edition-only tool, introduced in SQL 2005, that provide a read-only, “virtual” copy of a database, at a given point in time. This article will discuss how, as a DBA, you might use snapshots in a production environment and why you might not consider them suitable for some of the commonly-stated use cases. These include:

  • Protecting your system from user or administrator error
  • Offloading reporting
  • Maintaining historical data

However, I’ll also highlight two other areas where I believe snapshots can be a really effective tool, namely:

  • System upgrades
  • Materialising data on standby servers

Creating Snapshots

One of the most infuriating things I have found is that you cannot use SQL Server Management Studio (SSMS) to create snapshots. I’m a big fan of scripts, but I also like the convenience of using SSMS, and most junior DBAs tend to use SSMS over scripts, so I feel that this is quite a significant oversight. In any event, to create your first snapshot you will need to run something similar to the following T-SQL code (the physical file path shown here is just an example):
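
    -- Create a snapshot of the andrew_test database.
    -- NAME must match the logical data file name of the source database;
    -- the physical path below is illustrative only.
    CREATE DATABASE andrew_test_snapshot_250206_1800
    ON
    (
        NAME = andrew_test_datafile,
        FILENAME = 'D:\Snapshots\andrew_test_snapshot_250206_1800.snap'
    )
    AS SNAPSHOT OF andrew_test;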

You will notice that the code is very similar to a script you would use to create a database, but you should note that the logical device name, in this case andrew_test_datafile, has to match the logical file name defined on the source database.

In this example, I’ve created a snapshot of the database andrew_test called andrew_test_snapshot_250206_1800. Note the use of the <database name>_<date>_<Time> naming convention. It’s wise to name your snapshots clearly otherwise you may struggle unnecessarily when trying to identify when the snapshot was created. Note also that I gave the snapshot a meaningful file extension: .snap.

From an organisational and administrative point of view, I recommend that you store your snapshots in a separate directory from your database files. It’s not such a big deal when you’re working with a small number of files, but if you have hundreds of other files in the same directory you soon start wishing you had kept each type of file in its own directory. Additionally, this makes exclusions or filters even easier, as you can now apply them at a folder level. If you plan to use snapshots for reporting, I would seriously consider putting the .snap files on their own drive. This way, once they start populating, you will keep some of the IO away from the drive that is hosting the database files, and therefore reduce disk contention.

How Snapshots Work

It’s essential that a DBA understands at least something of what actually happens “behind-the-scenes” when a snapshot is created, and of how snapshots actually work. There is a great explanation of how snapshots work on MSDN2, but I will summarize some of the particularly relevant points here.

A database snapshot is a read-only static view of a database. When you create a database snapshot, as described in the previous section, the following process is invoked.

  1. An empty file, called a sparse file, is created for each source database data file.
  2. The database is checkpointed (all dirty pages in the buffer cache are written to disk).
  3. If uncommitted transactions are in progress in the database, the snapshot will reflect the state of the database before these transactions started; in other words, in the context of the snapshot, these transactions are rolled back, but the transactions in the source database itself are not affected.
  4. The snapshot is brought online and is ready for use.

The empty file that is created is an NTFS sparse file. An NTFS sparse file is a file that contains no user data and has not yet been allocated space for user data. In Windows Explorer, such files are shown with a size matching that of the source database. However, when you go into the file properties you will see that the size on disk is 0KB, or a multiple of 64KB (representing 1-8 data pages) once some data has been transferred:

[Screenshot: Windows Explorer properties for a .snap sparse file, showing a large Size but a much smaller Size on disk]

NOTE: This is where the meaningful file extension (.snap) comes into its own. If I’m navigating through Explorer and come across a bunch of 10GB .snap files I will hopefully remember that, while the size is showing 10GB in Explorer, .snap files are sparse files so I need to check what space is really being used. Use of a standard extension is also very important when you are, for example, setting up anti-virus scan exclusions, or backup exclusions, across a suite of servers.

Once the snapshot is online, a copy-on-write mechanism kicks in. Just before a page is updated for the first time in the source database since creation of the snapshot, it is copied to the snapshot data file. Any queries against the snapshot that touch those changed pages are served the original, pre-modification versions of the pages from the snapshot file. If the snapshot is queried for data on any page that has not been modified in the source database since the time the snapshot was created, the request is simply redirected to the source database files. In this way, the snapshot presents a consistent view of the data at the point in time that the snapshot was created.

The NTFS sparse files are at the heart of snapshots: only a small amount of disk space is initially required and creation is extremely fast, which means snapshots are generally created in a matter of seconds. However, we must keep an eye on disk space and ensure there is enough available for the sparse files to grow without the drives filling up. I discuss this in the next section.
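
Incidentally, if you want to check how much space a snapshot’s sparse files are really using from within SQL Server, rather than via Explorer, something along the following lines should do it. This is only a sketch, using the example snapshot created earlier and the size_on_disk_bytes column of the sys.dm_io_virtual_file_stats dynamic management function:

    -- Actual on-disk size of each sparse file belonging to the example
    -- snapshot; size_on_disk_bytes grows as changed pages are copied in.
    SELECT  file_id,
            size_on_disk_bytes,
            size_on_disk_bytes / 1024 AS size_on_disk_kb
    FROM    sys.dm_io_virtual_file_stats(
                DB_ID('andrew_test_snapshot_250206_1800'), NULL);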

Snapshot Pros and Cons

Following is a list of what I consider the main pluses and minuses associated with snapshots:

Pros:

  1. They provide a convenient, read-only point-in-time copy of your data.
  2. When you query a snapshot, you will not experience blocking due to update/insert operations that you might have to contend with when querying the source database (assuming snapshot isolation is not in use).
  3. Initially the snapshot data files are small and are very quick to create. They should only become large if your database is subject to frequent modifications.

Cons:

  1. You cannot back up a snapshot, so if you have to restore your source database your snapshots are lost.
  2. Addressing database index fragmentation negates the benefit of the files being small if the sparse files are kept for a period of time.
  3. Where data pages have not changed you will be accessing the source database file, which may cause contention at the file level since both the source database and the snapshot will be accessing the same MDF.
  4. Every update/insert transaction on the source server potentially bears the overhead of the page copy operation to maintain the snapshot(s).
  5. Because of the reliance on the source database, the snapshot has to exist on the same server as that source database.
  6. You cannot grant a new user access to the data in a snapshot. Permissions are inherited from the source database as it existed at the time of snapshot creation, and subsequent changes to security in the source database do not filter down to the snapshots.

In the following section, I’ll review these pros and cons in the specific context of a given usage scenario, and also mention other potential issues specific to each case. However, I’ll say upfront that the real snake-in-the-grass here is the effect on snapshot files of defragmenting indexes. One of the most attractive benefits of using snapshots is that, because they use NTFS sparse files, they will use very little space and, unless you have a highly active database, should only ever grow to a fraction of the size of your database.

However, a DBA has to maintain indexes and this involves defragmenting them from time to time. Defragmenting an index does not change the actual data, but it does move it to different pages in an attempt to make it contiguous. Since the data is on the move, any snapshot(s) have to take a copy of each affected page, so as soon as you defragment your indexes for the first time all your snapshots are going to grow to be pretty close in size to your source database. Since it’s highly likely that we will be defragmenting our indexes, this virtually wipes out the benefit of sparse files.
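
To make that concrete, a routine maintenance command along the lines of the following (the table name here is purely hypothetical) relocates most of the pages in that table’s indexes, and every page it moves must first be copied into each existing snapshot:

    -- Hypothetical example table: rebuilding its indexes relocates pages,
    -- so each relocated page is first copied into every existing snapshot.
    ALTER INDEX ALL ON dbo.Orders REBUILD;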

Uses for Snapshots: Winners and Losers

When reviewing database snapshots I took a pessimistic approach and looked for what they might break, what might break them and, when compared with other SQL Server technologies, where they are going to add value.

Losers

In this section I’ll review three of the commonly stated use cases for snapshots, namely to:

  • Offload Reporting
  • Protect against user/DBA error
  • Maintain historical data

In each case, I’ll summarize the pros and cons of a snapshot solution and indicate the main reasons why, generally, I would not consider them as a solution in these scenarios.

One of the frustrating things about snapshots, in my opinion, is their lack of availability in editions other than Enterprise. It seems to me that snapshots are most likely to be viable, and add value, in low-throughput systems, which tend to have continuous spare capacity. Using snapshots to offload reporting or to protect against user error becomes much more viable in such systems, because the additional overhead of snapshots, in terms of page copying and additional query load, is easily borne. However, these same systems are the ones most likely to be using Standard edition, and therefore won’t have access to snapshots in the first place!

Enterprise edition servers often run high-IO databases, with resources that may be further constrained in certain time windows, and so may find the additional overhead of hosting snapshots on the server unacceptable. So, ironically, the installations that have access to snapshots are those that would be most likely to look towards other solutions, such as log shipping, for reporting and for protection from user or administrator error.

Offloading Reporting

Below are what I consider the main pros and cons of using snapshots as a means of offloading reporting:

Pros:

  1. You have a read-only point-in-time copy of your data.
  2. You will not be contending with blocking due to update/insert operations (let’s assume snapshot isolation is not in use).
  3. Initially the snapshot data files are small.

Cons:

  1. You cannot grant a new user access to the data in a snapshot.
  2. You cannot back up the snapshots, so if you have to restore your source database your reporting snapshots are lost.
  3. Where data pages have not changed you will be accessing the source database file which may cause contention at the file level since both the source database and the snapshot will be accessing the same MDF.
  4. Full text indexes are not available on snapshots so if you require full text searching for your reporting then snapshots are not an option.

Additionally, you should bear in mind that snapshots have to be on the same server as the source database: you do not have a dedicated physical reporting server. Therefore, physical resources on the database server, such as CPU and memory, still have to be shared.

One other thing to note is that NTFS sparse files grow in 64KB increments and these increments are likely to occur at different times. As such, it’s highly likely that a populated sparse file will not be contiguous at the NTFS level, and the resulting fragmentation could put significant strain on your disks when reporting queries hit the snapshot.

Ultimately, the main thing that would stop me from using snapshots for report offloading is the likelihood of increased physical resource contention.

Maintaining Historical Data

Below are what I consider the pros and cons.

Pros:

  1. Potentially, you can maintain a significant amount of history with little disk space usage.

Cons:

  1. If the drive hosting the snapshot runs out of space, causing an update of the snapshot to fail, the snapshot is marked suspect and cannot be recovered.
  2. Defragmenting your indexes negates the benefits associated with sparse files (if the snapshots are kept for a period of time).
  3. Every update/insert transaction on the source server potentially bears the overhead of the page copy operation to maintain the snapshot(s).
  4. You cannot back up your snapshots.

Although maintaining historical data is listed as a typical use, I would never use snapshots for this, simply because I cannot back them up.

Protecting from User or Administrator Error

In theory, you could use a snapshot to restore a database to a point before a user or administrator error occurred, simply by issuing something like the following (using the example database and snapshot created earlier):
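
    -- Revert the example database to the snapshot taken earlier; all
    -- changes made since the snapshot was created are discarded.
    RESTORE DATABASE andrew_test
    FROM DATABASE_SNAPSHOT = 'andrew_test_snapshot_250206_1800';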

Below are what I consider the pros and cons.

Pros:

  1. The snapshot process is very quick.
  2. The data and code are available immediately for restoration.

Cons:

  1. The snapshot is at a point in time so is unlikely to be suitable for data recovery.
  2. If you restore a snapshot you can’t then restore any transaction logs so you potentially lose a large amount of data.
  3. When you restore a snapshot, all other snapshots of the database have to be deleted first (see the sketch after this list), which would be a problem if you are maintaining reporting snapshots.
  4. If your administrative error involved damage to a physical file (say a drive was accidentally wiped) the snapshot will not help you.
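
As a minimal sketch of that clean-up requirement, assuming a second, purely hypothetical snapshot also exists on the database, the sequence would be:

    -- Any other snapshots must be dropped before the database can be
    -- reverted (this second snapshot name is hypothetical).
    DROP DATABASE andrew_test_snapshot_260206_0600;
    -- ...and only then can you revert using the RESTORE ... FROM
    -- DATABASE_SNAPSHOT command shown earlier.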

To get any real value from snapshots in this area, you would have to take several snapshots throughout the day. I feel that the benefits of snapshots in this scenario will be far outweighed by the overhead of maintaining multiple snapshots, especially when considering other available options such as log shipping or even restoring from backup.

It goes without saying (but here it is anyway) that snapshots should not be considered as part of or a replacement for a proper backup. To use them in this way would be disastrous when you consider their limitations, such as their dependency on the source database.

Winners

In this section, I move on to the two scenarios where I do believe that snapshots can provide a real benefit for enterprise systems:

  • System upgrades
  • Materialising data on standby servers

Performing System Upgrades

The scenario that I consider a real winner for snapshots has to be when performing a system upgrade. Typically, a system upgrade involves a release of code that changes the underlying schema and/or data. It may also involve updating related application code. When performing such an upgrade, the system administrator will ensure that full backups are taken of everything that will change as part of the upgrade, including the database. This process can be time consuming, and requires additional space to store the backups whilst the upgrade is in progress.

By creating a snapshot, the database “backup” time is reduced to seconds, and the amount of space required might be very small, assuming only a small amount of data changes during the upgrade.

Once the system upgrade is complete, and after performing system checks, should you find yourself in the unfortunate position where you must roll back to the point prior to the upgrade, the recovery is greatly sped up and simplified. To restore your database to the point prior to the upgrade, you would issue the RESTORE T-SQL command shown previously. The restore time is likely to be less than a minute, depending on how much data changed, as opposed to potentially hours when restoring a large database.

I have to point out, though, that there will still be some system upgrades where I would want a full backup prior to starting work; upgrading the operating system is one such example. With most upgrades, however, a snapshot will suffice. I should also state that, when using the snapshot approach, I am assuming the last full backup and the transaction logs up until the upgrade are available should the worst happen.

Materialising Data on your Standby Servers

The other winner for me is using snapshots to “materialise” (make available for querying/reporting) data on your standby servers. Due to business requirements, or to avoid connections interfering with log shipping, you may have chosen to keep your log shipped database in NORECOVERY mode. This means you cannot read from the database. By creating a snapshot of your log shipped database, you can read the data and effectively use the standby server to offload reporting requirements.

More importantly, this technique also works with mirrored databases and, since a mirror destination can only ever be in NORECOVERY mode, it’s an excellent way to access the mirrored data for the purpose of reporting. However, there is always a trade-off, and the key point to keep in mind when using snapshots on a mirrored database is that, in synchronous mode, transactions have to commit on both servers before SQL Server can mark the transaction as complete. By introducing the snapshot we have introduced a third step before a transaction will commit. Why? Well, below are the steps you now have to take if you have a snapshot:

  1. Write transaction on server A source database.
  2. Write transaction on server B destination database.
  3. Copy page on server B destination database to server B snapshot.

Without the snapshot you only have to do steps one and two.

Of course, all this assumes that snapshots work smoothly and seamlessly with high availability solutions such as log shipping and database mirroring. So, what might you have to consider when using snapshots with these two HA technologies?

Snapshots and Log Shipping

  • There are no restrictions on creating a snapshot on a log shipped source (or destination) and log shipping is not disrupted when creating a snapshot.
  • You are not prevented from restoring the snapshot when log shipping is present.
  • When you restore the snapshot, you must remember that the process is similar to a database restore: it is not a logged operation and, as such, log shipping will then fail.

Snapshots and Database Mirroring

  • There are no restrictions on creating a snapshot on a mirrored source (or destination) and mirroring is not disrupted when creating a snapshot.
  • You are prevented from restoring a snapshot whilst mirroring is active. You must first stop mirroring and, once you have restored from the snapshot, restore the destination database and enable mirroring again.
  • Although you can snapshot the source and destination databases, when you issue a snapshot restore the recovery option is ignored. This means you cannot restore a snapshot to a recovering state, so your destination mirror database must be restored from a full backup, as mirroring requires the destination to be in a recovering state.

Conclusion

Despite how it may seem, I actually welcome snapshots with open arms into my DBA toolbox. I do believe it’s a pity that this is an Enterprise edition-only feature, but I see it as a tool I will make good use of when doing system upgrades on my Enterprise edition servers. Snapshots are in their infancy, but I feel that with a little more work they will become a widely-used tool.
