Performance of structured storage files.

We’ve been working on a new version of our DTS Compare tool for a couple of months now, and it’s starting to look quite good. I’ve been working on the snapshot functionality today, which allows a user to take a snapshot of the state of all the DTS packages on a given instance of SQL Server 2000.

My original plan was to use a structured storage file (sometimes also referred to as a compound file – see ms-help://MS.VSCC.2003/MS.MSDNQTR.2003FEB.1033/stg/stg/about_structured_storage.htm for more information) and just dump all the packages to this file, along with some other necessary information to complete the snapshot. This is nice because you can simply use the SaveToStorageFile method (ms-help://MS.VSCC.2003/MS.MSDNQTR.2003FEB.1033/dtsprog/dtspmethjz_9odg.htm) to save any number of DTS packages to a particular structured storage file. The other attraction of this is that it’s a ‘standard’ file format, with support built into the Windows shell, so you can display useful information to the user etc. Also, this means that you don’t need to bother to write or maintain any code to actually read/write the file yourself. I’m basically lazy, so this is excellent.

On my development machine I have some 1026 DTS packages on the default SQL Server 2000 instance, of which around 600 are automatically generated by a little utility we developed to allow us to test the performance of the new version of DTS Compare. The test packages are quite large, but not enormous – they weigh in at around 480K when saved to disk individually. I also have around 400 real DTS packages that are used by some of our customers, who have kindly supplied them to us for testing purposes.

So I started running my test to create a snapshot using a single structured storage file at around 9.45 this morning, thinking it would probably take 10 – 15 minutes to run… in fact it took a whopping 93 minutes to save all the packages to a single structured storage file. I was gobsmacked to say the least. I just couldn’t believe how long it had taken. The generated file was large, at 360 MB, but this is by no means outrageous, and it shouldn’t take anything like 93 minutes to write a file of this size. (For those of you planning to use the new version of DTS Compare, the snapshot file size will be much smaller once I add data compression, so don’t be too concerned.)

The key is perhaps to remember that a structured storage file is really a file system within a file, so I wondered if perhaps the DTS packages were taking longer to save to the file the more of them I added, since I also noticed that the first 10% of the packages were saved relatively quickly, and that the initial rate of writing packages to the file was roughly 1 per second, compared to the average rate of writing which was around 1 package every 5.5 seconds.

I decided to test this by saving each package to its own structured storage file, which you might reasonably expect to be slower because of the need to open and close all those files. In fact, what I found was that it was an order of magnitude faster, taking around 8 and a half minutes to save all 1026 packages to disk. Quite surprising to say the least.

So, the moral is, beware of structured storage if you’re planning to add lots of individual objects, even where these objects are quite large. Having said that, structured storage obviously makes a lot more sense if you’ll be making a lot of incremental changes to the file, but if you just want a lazy solution for storing lots of data look elsewhere. We’ll now be designing our own format for server snapshots to offer much better performance than a structured storage file.