Database Kill and Fill

When testing database code in development, you usually have a lot of tests that aim to prove that the various processes, such as ETL, reporting or the processing of an order, still work as they always did. If it is a changed process, you need to ensure that, from a known dataset, the process yields the result that the business expects. These tests happen as part of both development and deployment.

These datasets are generally created for each release and maintained to keep in step with the current development, being saved on the local filesystem as native BCP files. Sometimes, all or part of the live data is used, pseudonymised or masked where necessary. It is also handy to generate data, especially when you need large datasets to check that the database can scale appropriately.

Copying a database using a BCP dataset from a DOS script or from PowerShell is fairly quick and trivial, but what if the data is already in another copy of the database on the same instance? Imagine you have a database build that is a copy of an existing database on the same instance and you want to run a ‘destructive’ test on the data, and do it over and over again.

Deleting all the data in a database should, you’d have thought, be trivial. Truncation doesn’t work, because SQL Server refuses to truncate tables that are referenced by foreign keys, even if those constraints are disabled. DELETE is safer, but you will then need to reset any identity columns. Deleting a large table in one chunk isn’t scalable, so you have to do it in smaller chunks. I like to disable triggers while I do this as well. The only disadvantage comes if you have triggers that are supposed to stay disabled, because the simple code just re-enables them all at the end.

Killing data

Basically, if you are prepared to use sp_MSforeachtable, then this code is all reasonably simple. This will delete all the data from Adventureworks in about half a minute. It does Pubs in a second! Don’t use it without due care! This code is dangerous in untutored hands. You can try it out within a transaction and roll it back while you are testing.
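Something along these lines will do it. This is a minimal sketch rather than the definitive version: the chunk size and the reseed value are assumptions, to be changed to taste.

USE AdventureworksCopy -- the database you want to empty. Change to taste!
-- disable every constraint and trigger, so that deletion order doesn't matter
EXECUTE sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'
EXECUTE sp_MSforeachtable 'ALTER TABLE ? DISABLE TRIGGER ALL'
-- delete in modest chunks, so that the big tables don't overwhelm the log
EXECUTE sp_MSforeachtable '
  DECLARE @RowsDeleted INT = 1;
  WHILE @RowsDeleted > 0
    BEGIN
      DELETE TOP (100000) FROM ?;
      SELECT @RowsDeleted = @@ROWCOUNT;
    END'
-- reset any identity columns (this assumes a conventional seed and an increment of 1)
EXECUTE sp_MSforeachtable '
  IF OBJECTPROPERTY(OBJECT_ID(''?''), ''TableHasIdentity'') = 1
    DBCC CHECKIDENT (''?'', RESEED, 0);'
-- re-enable everything. Beware: this re-enables even the triggers
-- that were supposed to stay disabled!
EXECUTE sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
EXECUTE sp_MSforeachtable 'ALTER TABLE ? ENABLE TRIGGER ALL'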


Filling data

Now to get that data back in. You have a copy of the database on the same server, so you use that. All you need to do, surely, is to …
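… something along these lines for every table, presumably (TableA here is just a placeholder, and the two databases are the demonstration pair mentioned later on) …

INSERT INTO AdventureworksCopy.dbo.TableA
  SELECT * FROM Adventureworks2016.dbo.TableA;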

… for all the tables in the database? No. For a start, you cannot insert into a timestamp column, and you need to SET IDENTITY_INSERT TableA ON in order to insert into TableA if it has an identity column. Of course, if you try this with a table that doesn’t have an identity column, you get an error. Inserting XML into an XML column results in an error unless you do an explicit conversion (because of the possibility of a different XML schema collection). You have to ignore calculated columns as well, because you can’t insert into them any more than you can into a timestamp column. I have no idea whether it is possible or desirable to insert into a hidden column, so I disallowed that too. Change it to taste.

Out of the box, this code doesn’t work on older versions of SQL Server, because I use the lovely string_agg function (introduced in SQL Server 2017) to generate lists of columns. You can change this to the classic XML trick to get the same result if you are stuck on an old version. Anything older than SQL Server 2016 didn’t have the is_hidden attribute in the sys.columns system view, either.
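For anyone stuck on an older version, the classic XML trick for building a comma-delimited column list looks something like this (a sketch for a single table, whose name here is just an illustration):

SELECT STUFF(
  (SELECT ', ' + QUOTENAME(c.name)
     FROM sys.columns c
     WHERE c.object_id = OBJECT_ID('Person.Person') -- any table you like
     ORDER BY c.column_id
     FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),
  1, 2, '') AS ColumnList;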

Here, I have it set, for the sake of demonstration, to delete all the data from a database called AdventureworksCopy, and fill it again from Adventureworks2016. You’ll want to change that!
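Put together, the fill ends up along these lines. Again, this is a minimal sketch rather than the definitive version: it assumes SQL Server 2017 or later for string_agg, and it disables foreign key constraints for the duration, because the order in which the tables are filled is arbitrary.

USE AdventureworksCopy -- the freshly-emptied target. Change to taste!
DECLARE @Source sysname = N'Adventureworks2016'; -- the populated copy on the same instance
DECLARE @Newline NCHAR(1) = NCHAR(10);
DECLARE @TheScript NVARCHAR(MAX);

SELECT @TheScript = STRING_AGG(CAST(
      CASE WHEN HasIdentity = 1
           THEN N'SET IDENTITY_INSERT ' + TheTable + N' ON; ' ELSE N'' END
    + N'INSERT INTO ' + TheTable + N' (' + InsertList + N') SELECT '
    + SelectList + N' FROM ' + @Source + N'.' + TheTable + N';'
    + CASE WHEN HasIdentity = 1
           THEN N' SET IDENTITY_INSERT ' + TheTable + N' OFF;' ELSE N'' END
      AS NVARCHAR(MAX)), @Newline)
  FROM
    (SELECT QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) AS TheTable,
            OBJECTPROPERTY(t.object_id, 'TableHasIdentity') AS HasIdentity,
            STRING_AGG(QUOTENAME(c.name), N', ')
              WITHIN GROUP (ORDER BY c.column_id) AS InsertList,
            STRING_AGG(CASE WHEN TYPE_NAME(c.user_type_id) = N'xml'
                            THEN N'CONVERT(xml, ' + QUOTENAME(c.name) + N')' -- the explicit conversion for XML
                            ELSE QUOTENAME(c.name) END, N', ')
              WITHIN GROUP (ORDER BY c.column_id) AS SelectList
       FROM sys.tables t
         INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
         INNER JOIN sys.columns c ON c.object_id = t.object_id
       WHERE c.is_computed = 0 -- no inserting into a calculated column...
         AND c.is_hidden = 0 -- ...nor, I assume, into a hidden one...
         AND TYPE_NAME(c.user_type_id) <> N'timestamp' -- ...nor into a timestamp
       GROUP BY s.name, t.name, t.object_id) AS TableList;

EXECUTE sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
EXECUTE (@TheScript); -- PRINT @TheScript first if you want to inspect what it does
EXECUTE sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';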

Conclusions

Am I recommending Kill and Fill? It depends on what sort of testing you are doing. A BCP process is always nice, quick and clean, though it still needs the kill part of the process, and it is best done from a DOS shell script or a PowerShell script. This Kill and Fill process is easy to set up, though I don’t think it is any quicker than a bulk process. I just thought I should make it available in case anyone else needed a different way of filling a development database with data.