Database Kill and Fill

When testing database code in development, you usually have a lot of tests that aim to prove that the various processes, such as ETL, reporting or the processing of an order, still work as they always did. If it is a changed process, you need to ensure that, from a known dataset, the process yields the result that the business expects. These tests happen as part of both development and deployment.

These datasets are generally created for each release and maintained to keep in step with the current development, being saved on the local filesystem as native BCP files. Sometimes, all or part of the live data is used, pseudonymised or masked where necessary. It is also handy to generate data, especially when you need large datasets to check that the database can scale appropriately.

Copying a database using a BCP dataset from a DOS script or from PowerShell is fairly quick and trivial, but what if the data is already in another copy of the database on the same instance? Imagine you have a database build that is a copy of an existing database on the same instance and you want to run a ‘destructive’ test on the data, and do it over and over again.

Deleting all the data in a database should, you’d have thought, be trivial. Truncation doesn’t work, because SQL Server refuses to truncate tables that are referenced by foreign keys, even if those constraints are disabled. DELETE is safer, but you will then need to reset any identity columns. Deleting a large table in one chunk isn’t scalable, so you have to do it in smaller chunks. I like to disable triggers while I do this as well. The only disadvantage comes if you have triggers that are supposed to stay disabled, because the simple code just re-enables them all at the end.

Killing data

Basically, if you are prepared to use sp_MSforeachtable, then this code is all reasonably simple. This will delete all the data from Adventureworks in about half a minute. It does Pubs in a second! Don’t use it without due care! This code is dangerous in untutored hands. You can try it out within a transaction and roll it back while you are testing.
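Something along these lines will do it. This is a minimal sketch rather than the definitive version: the chunk size and the reseed value are assumptions, to be changed to taste.

USE AdventureworksCopy -- the database you want to empty. Change to taste!
-- disable every constraint and trigger, so that deletion order doesn't matter
EXECUTE sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'
EXECUTE sp_MSforeachtable 'ALTER TABLE ? DISABLE TRIGGER ALL'
-- delete in modest chunks, so that the big tables don't overwhelm the log
EXECUTE sp_MSforeachtable '
  DECLARE @RowsDeleted INT = 1;
  WHILE @RowsDeleted > 0
    BEGIN
      DELETE TOP (100000) FROM ?;
      SELECT @RowsDeleted = @@ROWCOUNT;
    END'
-- reset any identity columns (this assumes a conventional seed and an increment of 1)
EXECUTE sp_MSforeachtable '
  IF OBJECTPROPERTY(OBJECT_ID(''?''), ''TableHasIdentity'') = 1
    DBCC CHECKIDENT (''?'', RESEED, 0);'
-- re-enable everything. Beware: this re-enables even the triggers
-- that were supposed to stay disabled!
EXECUTE sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
EXECUTE sp_MSforeachtable 'ALTER TABLE ? ENABLE TRIGGER ALL'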


Filling data

Now to get that data back in. You have a copy of the database on the same server, so you use that. All you need to do, surely, is to …
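… something along these lines for every table, presumably (TableA here is just a placeholder, and the two databases are the demonstration pair mentioned later on) …

INSERT INTO AdventureworksCopy.dbo.TableA
  SELECT * FROM Adventureworks2016.dbo.TableA;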

… for all the tables in the database? No. For a start, you cannot insert into a timestamp column, and you need to SET IDENTITY_INSERT TableA ON in order to insert into TableA if it has an identity column. Of course, if you try this with a table that doesn’t have an identity column, you get an error. Inserting XML into an XML column results in an error unless you do an explicit conversion (because of the possibility of a different XML schema collection). You have to ignore calculated columns as well, because you can’t insert into them any more than you can into a timestamp column. I have no idea whether it is possible or desirable to insert into a hidden column, so I disallowed that too. Change it to taste.

Out of the box, this code doesn’t work on older versions of SQL Server, because I use the lovely string_agg function (introduced in SQL Server 2017) to generate lists of columns. You can change this to the classic XML trick to get the same result if you are stuck on an old version. Anything older than SQL Server 2016 didn’t have the is_hidden attribute in the sys.columns system view, either.
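For anyone stuck on an older version, the classic XML trick for building a comma-delimited column list looks something like this (a sketch for a single table, whose name here is just an illustration):

SELECT STUFF(
  (SELECT ', ' + QUOTENAME(c.name)
     FROM sys.columns c
     WHERE c.object_id = OBJECT_ID('Person.Person') -- any table you like
     ORDER BY c.column_id
     FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),
  1, 2, '') AS ColumnList;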

Here, I have it set, for the sake of demonstration, to delete all the data from a database called AdventureworksCopy, and fill it again from Adventureworks2016. You’ll want to change that!
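Put together, the fill ends up along these lines. Again, this is a minimal sketch rather than the definitive version: it assumes SQL Server 2017 or later for string_agg, and it disables foreign key constraints for the duration, because the order in which the tables are filled is arbitrary.

USE AdventureworksCopy -- the freshly-emptied target. Change to taste!
DECLARE @Source sysname = N'Adventureworks2016'; -- the populated copy on the same instance
DECLARE @Newline NCHAR(1) = NCHAR(10);
DECLARE @TheScript NVARCHAR(MAX);

SELECT @TheScript = STRING_AGG(CAST(
      CASE WHEN HasIdentity = 1
           THEN N'SET IDENTITY_INSERT ' + TheTable + N' ON; ' ELSE N'' END
    + N'INSERT INTO ' + TheTable + N' (' + InsertList + N') SELECT '
    + SelectList + N' FROM ' + @Source + N'.' + TheTable + N';'
    + CASE WHEN HasIdentity = 1
           THEN N' SET IDENTITY_INSERT ' + TheTable + N' OFF;' ELSE N'' END
      AS NVARCHAR(MAX)), @Newline)
  FROM
    (SELECT QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) AS TheTable,
            OBJECTPROPERTY(t.object_id, 'TableHasIdentity') AS HasIdentity,
            STRING_AGG(QUOTENAME(c.name), N', ')
              WITHIN GROUP (ORDER BY c.column_id) AS InsertList,
            STRING_AGG(CASE WHEN TYPE_NAME(c.user_type_id) = N'xml'
                            THEN N'CONVERT(xml, ' + QUOTENAME(c.name) + N')' -- the explicit conversion for XML
                            ELSE QUOTENAME(c.name) END, N', ')
              WITHIN GROUP (ORDER BY c.column_id) AS SelectList
       FROM sys.tables t
         INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
         INNER JOIN sys.columns c ON c.object_id = t.object_id
       WHERE c.is_computed = 0 -- no inserting into a calculated column...
         AND c.is_hidden = 0 -- ...nor, I assume, into a hidden one...
         AND TYPE_NAME(c.user_type_id) <> N'timestamp' -- ...nor into a timestamp
       GROUP BY s.name, t.name, t.object_id) AS TableList;

EXECUTE sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
EXECUTE (@TheScript); -- PRINT @TheScript first if you want to inspect what it does
EXECUTE sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';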

Conclusions

Am I recommending Kill and Fill? It depends on what sort of testing you are doing. A BCP process is always nice, quick and clean, though it still needs the kill part of the process, and it is best done from a DOS shell script or a PowerShell script. This Kill and Fill process is easy to set up, though I don’t think it is any quicker than a bulk process. I just thought I should make it available in case anyone else needed a different way of filling a development database with data.