
This article is about how to stock an empty database with data from files.
Why?
There are a number of reasons for wanting to do this, and I’ll explain why I do it. When I’m developing databases, source control contains all the scripts I need to build them, but I like to build them regularly from those components, using an automated process. I do this to make sure that nothing has broken the build, that what’s in source control represents the true state of the database, and to do integration testing. Admittedly, I develop on a full system with SSMS rather than go anywhere near the ‘disconnected model’ of SQL Server Data Tools (SSDT), so this regular build isn’t a vital process, because the use of a shared development server avoids many integration problems; but I might want to build from a number of versions, and this needs the ‘test’ data. Although I say that source control contains everything I need, I generally store the test data separately because it is big, it is binary, and it changes less frequently than the database versions. When it does change, I archive off the data and replace it (I’ll describe this in another article). This is no big deal compared with the main task of a refactoring, which is having to change the integration tests. I like to have at least one current set of the data for all the tables saved as BCP native-format files on disk, in a directory for each server, with a subdirectory for each database. As well as building the integration databases, this data has a range of other uses along the long road of the database deployment process.
For a wide-scale test, you would build your database on a virtual server that you can then spin up to run each of your tests. This means that, once you have all your integration databases built from the current version in source control, you have no further need for messy data imports: you just run your tests against that virtual test environment to do a full integration test.
For much testing, a virtual environment is possibly overkill. Normally, you would want to run your integration tests overnight as a daily process. Each test can build the database as part of its set-up process; AdventureWorks, for example, takes less than a minute to build entirely from scripts to a known data state. All third-party interfaces can be ‘mocked’, and the databases can be torn down very quickly.
To create, copy, modify and otherwise manipulate a database, there are several activities that need to be automated. Here, we’ll deal with just one of them: getting data in. I’ll be talking about the other ‘bits’ in other articles.
I’ll be providing sample scripts. They are intended to illustrate and they work, but they aren’t industrial strength. For effective automation, we’ll tend to use PowerShell with SQL, but I’ll provide alternatives where possible if you are restricted to SQL. If you want a good pile of data to use, then go to the companion article ‘Database Deployment: The Bits – Copying Data Out’ to find out how. I use a build script for AdventureWorks and a directory with all the data in order to try it out.
Deleting the existing data
If you are running a series of tests and are not using virtualization, you will want to get the database to a known state of data consistent with the tests. You will build the database just once and then import the data afresh for each test. This means that, unless it was done as part of the teardown of the previous test, you will want to delete the existing data first. Deleting all the data from a database should be easy, but referential integrity constraints will stand in your way and prevent you from simply iterating through a list of tables, deleting the data from each one. To delete all the existing data in a database, you need to temporarily disable the referential integrity constraints, and any triggers that exist to prevent you doing precisely this, and then execute the following code. Before you even attempt this, make sure your data is all successfully read out and you have a backup; otherwise this will end in tears. Make sure you know what you’re doing, and that you’re in the right database and on the right server, before you run it.
USE MyDatabase
EXECUTE sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'
EXECUTE sp_msforeachtable 'ALTER TABLE ? DISABLE TRIGGER all'
--now create a batch that deletes the data from every table
DECLARE @Command NVARCHAR(MAX)
SELECT @Command=
  (select 'Delete from ' + QUOTENAME(Table_Catalog) + '.'
          + QUOTENAME(Table_Schema) + '.' + QUOTENAME(Table_Name) + ' '
     FROM information_schema.Tables
    WHERE table_Type='BASE TABLE'
      FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
EXECUTE (@Command)
EXECUTE sp_msforeachtable 'ALTER TABLE ? ENABLE TRIGGER all'
EXECUTE sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
So what are we doing here? We are first disabling all the triggers and constraints in the database, then deleting the data from every table, and finally re-enabling both triggers and constraints. You can’t use TRUNCATE this way to speed things up, because you’d need to actually drop the constraints first: see TRUNCATE TABLE (Transact-SQL). There are other ways of doing this which are more complicated but which don’t involve disabling constraints, a rather blunt knife in the database surgeon’s set of instruments; I will show later in this article how you can usually delete data without touching the constraints.
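For illustration, this is the sort of error you can expect if you try to truncate a table that is referenced by a foreign key (hypothetical names): even disabling the constraint doesn’t help, because TRUNCATE objects to the mere existence of the reference.
--Hypothetical example: TRUNCATE refuses to run on a referenced table,
--even when the foreign key constraint has been disabled with NOCHECK.
TRUNCATE TABLE MySchema.MyOtherTable
/*
Msg 4712, Level 16, State 1, Line 1
Cannot truncate table 'MySchema.MyOtherTable' because it is being
referenced by a FOREIGN KEY constraint.
*/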
Importing the data
You have a wide choice of techniques for getting the data in but, if you have made the wise choice of storing the data in native format, then BCP, or one of the other techniques that use the bulk copy library, is best. BCP handles a number of formats, but native format is best, and is the fastest.
It is possible to do a bulk import in SQL Server by a variety of methods that all use the same underlying bulk copy library: BCP, BULK INSERT and INSERT….SELECT * FROM OPENROWSET(BULK…). The .NET System.Data.SqlClient namespace also provides a SqlBulkCopy class to do it. It is reasonably easy to write a PowerShell script that uses this library, or you can merely execute the BCP utility to load the data into the remote server.
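Of these, OPENROWSET(BULK…) is the only one that doesn’t get a full example elsewhere in this article, so here is a minimal sketch (the file, format-file and table names are hypothetical). Unlike BULK INSERT, OPENROWSET(BULK…) needs a format file unless you are importing a single document as a BLOB or CLOB.
--A minimal sketch only: paths and table names are hypothetical.
--A native-format file can be described by a format file generated with, for example:
--  bcp MyDatabase.MySchema.MyTable format nul -n -T -f MySchema_MyTable.fmt
INSERT INTO MyDatabase.MySchema.MyTable WITH (TABLOCK)
SELECT *
FROM OPENROWSET(
       BULK 'MyDirectory\MySchema_MyTable.bcp',
       FORMATFILE = 'MyDirectory\MySchema_MyTable.fmt') AS DataSource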
Tweaking the performance of the import
These approaches are all fast. The BCP command-line utility is a good alternative, but it runs out-of-process. This is a bit less secure, but it may give you better performance if you are short of CPU grunt.
When you’re using TSQL, the BULK INSERT statement is the obvious way to use the bulk copy functionality, and there are a number of knobs you can twiddle to improve bulk import performance.
For performance reasons, the best place to have your data is on a local drive of the integration database server, though it is perfectly possible to have the BCP process on a different server from the database, or from the data file, if you don’t mind waiting. You can use BCP across a network to a remote SQL Server, with the files being either local, specified with a pathname, or remote, specified with a UNC name. However, don’t do it across a network unless you have to: it will usually be faster if you first copy the files so that they reside on the same server as both the BCP application and the destination server (the integration server in our case). Ideally, the source data should be on a different physical drive from the destination database.
Both the BCP command and the BULK INSERT statement disable triggers by default. The INSERT ... SELECT * FROM OPENROWSET(BULK...) statement will, by default, execute triggers, and this can cause problems.
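If you would rather make that behaviour explicit than rely on the defaults, both routes have a switch for it. A hedged sketch, with hypothetical names:
--BULK INSERT suppresses triggers unless you ask for them with FIRE_TRIGGERS...
BULK INSERT MySchema.MyTable
FROM 'MyDirectory\MySchema_MyTable.bcp'
WITH (DATAFILETYPE = 'native', FIRE_TRIGGERS)

--...whereas INSERT...SELECT FROM OPENROWSET(BULK...) fires them unless you
--add the IGNORE_TRIGGERS table hint.
INSERT INTO MySchema.MyTable WITH (IGNORE_TRIGGERS)
SELECT *
FROM OPENROWSET(
       BULK 'MyDirectory\MySchema_MyTable.bcp',
       FORMATFILE = 'MyDirectory\MySchema_MyTable.fmt') AS DataSource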
As well as disabling constraints or triggers, and killing indexes, you can get an edge in the import process if you use native format, and you can also speed the process by ensuring that you get minimal logging. If your database is using the simple recovery model, then you’re there already; otherwise you’ll need to switch the database to the bulk-logged recovery model just for the import session, remembering to switch it back to the full recovery model after you’ve finished. You can import into a table from more than one client process in parallel, you can use batches, disable triggers, or order the data in the file in the same order as that imposed by the table’s clustered index. You can also control the locking behaviour (see the Further Reading links at the end of this article).
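Pulling a few of those knobs together, a minimal sketch might look something like this (the database, table, clustered-key column and file path are hypothetical, and it is wise to take a log backup after switching the recovery model back):
--A hedged sketch only: database, table, key column and file path are hypothetical.
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED

BULK INSERT MyDatabase.MySchema.MyTable
FROM 'MyDirectory\MySchema_MyTable.bcp'
WITH (DATAFILETYPE = 'native',
      KEEPIDENTITY,
      TABLOCK,                 --a table lock helps the load qualify for minimal logging
      BATCHSIZE = 50000,       --commit in batches rather than one enormous transaction
      ORDER (MyTableID ASC))   --only if the file really is sorted by the clustered key

ALTER DATABASE MyDatabase SET RECOVERY FULL --and take a log backup afterwards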
The problem of constraints
One thing that you may discover when you try to bulk insert data into a table that has foreign key constraints is that you can get errors if the relevant rows in the other table aren’t there.
BULK INSERT [MyDatabase].[MySchema].[MyTable]
from 'MyDirectory\MySchema_MyTable.bcp'
WITH (KEEPIDENTITY, DATAFILETYPE = 'native', CHECK_CONSTRAINTS)
/*
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint
"FK_MyTable_MyColumn_MyColumnID". The conflict occurred in database
"MyDatabase", table "MySchema.MyOtherTable", column 'MyColumnID'.
The statement has been terminated.
*/
You’ll have noticed that I specified CHECK_CONSTRAINTS: that means both referential and check constraints. That is a great way of ensuring that bad data never reaches the database.
Importing by disabling constraints
By default, both the BCP command and the BULK INSERT statement temporarily disable constraints, whereas INSERT ... SELECT * FROM OPENROWSET(BULK...) doesn’t. If you opt to let the bulk copy process disable check constraints, as happens if you run BCP or BULK INSERT with their default settings, then it will re-enable the constraints so that they work for subsequent insertions or updates, but it won’t check the existing rows that have just been imported. You’ll be left to check the data and set the constraints to being trusted afterwards (see Controlling Constraint Checking by Bulk Import Operations). If you neglect to re-enable the constraints with a retrospective check, then you will suffer a subsequent performance loss in queries, since these constraints can’t be used in query plans until they are set to ‘trusted’. In short, if you populate a table by temporarily disabling a constraint, then that constraint is ‘untrusted’ until some process checks it; this applies to both referential and check constraints.
After you’ve done an import, you can check to make sure everything is trusted by running this query.
SELECT sys.objects.name AS [Table Name],
       sys.check_constraints.name AS [Constraint],
       is_not_trusted,
       is_disabled
FROM sys.objects
  INNER JOIN sys.check_constraints
    ON sys.check_constraints.parent_object_ID = sys.objects.object_ID
WHERE sys.check_constraints.is_not_trusted = 1
   OR sys.check_constraints.is_disabled = 1
ORDER BY sys.objects.name, sys.check_constraints.name
…or on old versions of SQL Server…
SELECT name,
       OBJECTPROPERTY(id,'CnstIsNotTrusted') AS is_not_trusted
FROM sysobjects
WHERE COALESCE(OBJECTPROPERTY(id,'CnstIsNotTrusted'),0) <> 0
Please see sys.check_constraints (Transact-SQL) for more details.
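That query only covers check constraints. The sys.foreign_keys catalog view exposes the same is_not_trusted and is_disabled columns, so a similar check for the referential constraints looks something like this:
--Check for foreign keys that are disabled or untrusted after a bulk load
SELECT OBJECT_SCHEMA_NAME(parent_object_id) + '.'
         + OBJECT_NAME(parent_object_id) AS [Table Name],
       name AS [Constraint],
       is_not_trusted,
       is_disabled
FROM sys.foreign_keys
WHERE is_not_trusted = 1 OR is_disabled = 1
ORDER BY [Table Name], [Constraint]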
Before setting the constraints to being trusted, you might want to make sure that you’re not going to hit an error. You can check that there are no constraint violations, before you set them to being trusted, by executing DBCC CHECKCONSTRAINTS. You can check all your tables in one go by executing this:
EXECUTE sp_msforeachtable 'DBCC CHECKCONSTRAINTS("?")' |
You can ensure that a single constraint is checked and, if successful, trusted by doing this…
ALTER TABLE [MyDatabaseCopy].[MySchema].[MyTable]
  WITH CHECK CHECK CONSTRAINT FK_MyTable_Contact_ContactID
To do all the constraints for the table, you do this:
ALTER TABLE [MyDatabaseCopy].[MySchema].[MyTable]
  WITH CHECK CHECK CONSTRAINT ALL
To switch them all back on, for every table in the database run:
-- enable all constraints
EXECUTE sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
So let’s put this together into an import routine:
USE MyDatabase
--the path to your root directory where you store the files
DECLARE @PathToBCPFileDirectory VARCHAR(100)
SELECT @PathToBCPFileDirectory= 'MyRootDirectory\'
--the filetype you use (there is no standard so I don't bother)
DECLARE @Filetype VARCHAR(10)
SELECT @FileType= ''
--Your database (leave as a null to get this automatically set to the database name)
DECLARE @DatabaseSourceDirectory VARCHAR(100)
SELECT @DatabaseSourceDirectory= 'MyDatabase' --set this to null if you use the current database name
DECLARE @Directory VARCHAR(255)
SELECT @Directory=@PathToBCPFileDirectory --construct a valid path
  +REPLACE(REPLACE(@@ServerName,'\','-'),'/','-')
  +'\'+REPLACE(REPLACE(COALESCE(@DatabaseSourceDirectory,DB_NAME()),'\','-'),'/','-')+'\'
DECLARE @ImportCommand NVARCHAR(MAX)
SELECT @ImportCommand=
  (select 'BULK INSERT ' + QUOTENAME(db_name()) + '.'
          +QUOTENAME(TABLE_SCHEMA)+'.' + QUOTENAME(TABLE_NAME)
          + ' from ''' +@Directory
          +REPLACE(REPLACE(TABLE_SCHEMA+'_'+ TABLE_NAME,'.','-'),'\',':') + @FileType
          + ''' WITH (KEEPIDENTITY, DATAFILETYPE = ''native'') '
     FROM information_schema.Tables
    WHERE table_Type='BASE TABLE'
      FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
EXECUTE (@ImportCommand)
EXECUTE sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
This runs in 40 seconds for AdventureWorks on most of my database servers.
If you want to run this in PowerShell, using BCP, something like this will do the trick (your Windows account must have the required permissions on all the tables to use this).
# you'd need to fill these in with the details
# this routine assumes that the BCP files already exist
$PathToBin='' #put this in only if you hit trouble locating BCP.
# often 'C:\Program Files\Microsoft SQL Server\100\Tools\Binn\'
$directory='MyRootDirectory' # the directory where you want to store them
$SourceServer = ("MyServerInstance") # used to find the subdirectory where the files are
$SourceDatabase='MyDatabase' #where we take the data and build script from
$DestinationServer = ("MyServerInstance") #the destination instance
$DestinationDatabase='MyDatabase' #the destination database
#Load SMO assemblies
$MS='Microsoft.SQLServer'
#now load in the SMO DLLs
@('.SMO','.SmoExtended') | foreach-object {
    if ([System.Reflection.Assembly]::LoadWithPartialName("$MS$_") -eq $null)
        {"missing SMO component $MS$_"}
}
set-psdebug -strict
$ErrorActionPreference = "stop"
# now log into the server and get the server object
$My="$MS.Management.Smo"
$s = new-object ("$My.Server") $DestinationServer
if ($s.Version -eq $null ){Throw "Can't find the instance $DestinationServer"}
$SQL=@'
SET NOCOUNT ON;
select '"' +TABLE_CATALOG + '"."'+Table_Schema+'"."' + Table_Name+'"' QualifiedTableName,
       Table_Schema+'_' + Table_Name FileTableName
FROM information_schema.Tables
WHERE table_Type='BASE TABLE'
'@
#now get the destination database object
$Destination = New-Object ("$My.Database") ($s, "$DestinationDatabase")
if ($Destination.name -ne $DestinationDatabase) #on failure
    {Throw "Can't find the database '$DestinationDatabase' in $DestinationServer"};
#now get the message handler to get SQL messages
$handler = [System.Data.SqlClient.SqlInfoMessageEventHandler] {param($sender, $event) Write-Host $event.Message};
#and assign it to the message pipeline. We just choose to show it in the output
$s.ConnectionContext.add_InfoMessage($handler);
#execute the SQL to get the tables into a dataset
$result=$Destination.ExecuteWithResults("$SQL") #execute the SQL
#now, for each table in the pipeline
$result.Tables[0] | foreach {
    $filename = "$($_.FileTableName)" -replace '[\\\/\:\.]','-' #work out the name of the file
    #and now we get the whole path to the file we want to import
    $TableSource = "$directory\$($SourceServer -replace '[\\\/\:\.]','-')\$($SourceDatabase -replace '[\\\/\:\.]','-' )"
    #and we execute the BCP command
    $WhatHappened=&"$($pathToBin)BCP.exe" "$($_.QualifiedTableName)" in "$TableSource\$filename" -q -n -T -E "-S$($DestinationServer)"
    if ($WhatHappened -like '*Error *') {throw ($WhatHappened)}
}
#all is now in place. We just have to set the constraints to trusted
$Destination.ExecuteNonQuery(@"
EXECUTE sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
"@ ) #flag constraints as trusted
$s.ConnectionContext.remove_InfoMessage($handler);
This seems good and simple, doesn’t it? This is why BCP and BULK INSERT default to doing it this way. So why not just always opt to ignore constraints? The answer is that, in general, it isn’t a good idea to modify the metadata unless it is necessary. When constraints are disabled, a schema modify lock might be taken in order to update the metadata. This can interfere with other commands, such as an online index build, or affect transactions. In an active OLTP system you will need to avoid doing this, but for our operation it is safe.
There are alternatives, of course.
Kill ‘n Fill
Some articles advise ‘kill and fill’, a technique whereby all indexes and constraints are deleted before the BCP operation and then reinstated afterwards. This makes the whole operation a lot easier and faster but, even if we aren’t going to suffer a schema modify lock, I dislike the idea because it alters the metadata, which isn’t good practice for a test that relies on the metadata being in a known state. Nor do you really avoid errors: if the imported data breaks the rules, the failure simply surfaces when you try to reinstate the indexes and constraints. The sketch below shows the bare bones of the approach.
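A rough sketch only, with hypothetical constraint, index and file names; a real script would need to script out and re-create every index and constraint that it removes:
--'Kill and fill' in miniature (hypothetical names throughout)
ALTER TABLE MySchema.MyTable DROP CONSTRAINT FK_MyTable_MyOtherTable_MyColumnID
ALTER INDEX IX_MyTable_MyColumnID ON MySchema.MyTable DISABLE

BULK INSERT MySchema.MyTable
FROM 'MyDirectory\MySchema_MyTable.bcp'
WITH (KEEPIDENTITY, DATAFILETYPE = 'native')

ALTER INDEX IX_MyTable_MyColumnID ON MySchema.MyTable REBUILD
ALTER TABLE MySchema.MyTable WITH CHECK
  ADD CONSTRAINT FK_MyTable_MyOtherTable_MyColumnID
  FOREIGN KEY (MyColumnID) REFERENCES MySchema.MyOtherTable (MyColumnID)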
What could possibly go wrong that would cause a constraint violation? There is always the risk that data is accidentally inserted twice into a table, or that incorrect data gets in. If you originally exported the data from a database that had active connections modifying it, then you can also get referential integrity errors.
Filling tables in the right order
Copying tables in the right order is the most trouble-free approach since you import the tables in such an order that any conflicts with a foreign key constraint are avoided without disabling them. You just do a topological sort on the tables. This code will need the temporary stored procedure that I provide in the subsequent listing.
DECLARE @Filetype VARCHAR(10)
DECLARE @DatabaseSourceDirectory VARCHAR(100)
DECLARE @Directory VARCHAR(255)
DECLARE @PathToBCPFileDirectory VARCHAR(100)
DECLARE @Command NVARCHAR(MAX)
SELECT @FileType= '' --the filetype you use e.g. .DAT (there is no standard so I don't bother)
SELECT @PathToBCPFileDirectory= 'MyRootDirectory\'
SELECT @DatabaseSourceDirectory= 'MyDatabase' --set this to null if you use the current database name
DECLARE @tables TABLE
  (TheObject_ID INT NOT NULL,
   TheName SYSNAME NOT NULL,
   TheSchema SYSNAME NOT NULL,
   HasIdentityColumn INT NOT NULL,
   TheOrder INT NOT NULL)
INSERT INTO @tables
  EXECUTE #AllTablesInDependencyOrder
SELECT @Directory=@PathToBCPFileDirectory --construct a valid path
  +REPLACE(REPLACE(@@ServerName,'\','-'),'/','-')
  +'\'+REPLACE(REPLACE(COALESCE(@DatabaseSourceDirectory,DB_NAME()),'\','-'),'/','-')+'\'
SELECT @Command=
  (select 'BULK INSERT ' + QUOTENAME(db_name()) + '.'
          +QUOTENAME(TheSchema)+'.' + QUOTENAME(TheName)
          + ' from ''' +@Directory
          +REPLACE(REPLACE(TheSchema+'_'+ TheName,'.','-'),'\',':') + @FileType
          + ''' WITH ('+ CASE WHEN HasIdentityColumn<>0 THEN 'KEEPIDENTITY, ' ELSE '' END
          +'DATAFILETYPE = ''native'', CHECK_CONSTRAINTS) '
     FROM @tables
    ORDER BY theOrder
      FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
if @@error=0
  EXECUTE sp_ExecuteSQL @Command
So how do we do a topological sort? Well, all we need to do is to get a list of tables in order of their dependencies. High at the top of the list are tables that have no foreign key relationships, followed by those that are only referenced by other tables but do not themselves refer to any tables. These are then followed by tables that only refer to tables higher in the list. Easy really if one does it with little bits of paper before trying any coding.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID (N'TempDB..#AllTablesInDependencyOrder') IS NOT NULL
   DROP PROCEDURE #AllTablesInDependencyOrder
GO
CREATE PROCEDURE #AllTablesInDependencyOrder
/**
summary: @'
 This routine returns a table containing all the tables in the current
 database, their schema, object_ID, whether they have an identity column
 in them, and their dependency level. You would use this for deleting
 the data from tables or BCPing in the data.
'@
Author: Phil Factor
Revision: 1.1 dealt properly with heaps
Created: 25th November 2011
example: ~ @'
 Declare @tables Table( TheObject_ID INT NOT null,
    TheName SYSNAME NOT null,TheSchema SYSNAME NOT null,
    HasIdentityColumn INT NOT null,TheOrder INT NOT null)
 insert into @tables Execute #AllTablesInDependencyOrder
 Select * from @Tables
'@
returns: @'
 TheObject_ID INT,--the tables' object ID
 TheName SYSNAME, --the name of the table
 TheSchema SYSNAME, --the schema where it lives
 HasIdentityColumn INT, --1 if it has identity column
 TheOrder INT) --Order by this column
'@
**/
AS
SET NOCOUNT ON;
DECLARE @Rowcount INT, @ii INT
CREATE TABLE #tables
  (TheObject_ID INT,--the tables' object ID
   TheName SYSNAME, --the name of the table
   TheSchema SYSNAME, --the schema where it lives
   HasIdentityColumn INT, --whether it has an identity column
   TheOrder INT DEFAULT 0) --we update this later to impose an order
--let's do a Topological sort
--firstly we read in all the tables from the database.
INSERT INTO #tables (Theobject_ID, TheName, TheSchema, HasIdentityColumn)
  SELECT TheTable.OBJECT_ID, TheTable.NAME, TheSchema.NAME,
         CASE WHEN identityColumns.Object_id IS NULL THEN 0 ELSE 1 END
  FROM sys.tables TheTable
    INNER JOIN sys.schemas TheSchema
      ON TheSchema.SCHEMA_ID = TheTable.schema_ID
    LEFT JOIN (SELECT DISTINCT Object_id FROM sys.columns
               WHERE is_identity = 1) identityColumns
      ON Thetable.object_id = identityColumns.object_id
/* We'll use a SQL 'set-based' form of the topological sort
First, find a list of "start nodes" which have no incoming edges
and insert them into a set S; at least one such node must exist
in an acyclic graph*/
--flag all the immediately safe tables to insert data in
UPDATE #tables SET TheOrder = 1
FROM #tables parent --do not reference any other table and aren't referenced
  LEFT OUTER JOIN sys.foreign_Keys referenced
    ON referenced.referenced_Object_ID = parent.Theobject_ID
  LEFT OUTER JOIN sys.foreign_Keys referencing
    ON referencing.parent_Object_ID = parent.Theobject_ID
WHERE referenced.parent_object_ID IS NULL
  AND referencing.parent_object_ID IS NULL
--do not reference tables but might be referenced by others
UPDATE #tables SET TheOrder = 2
FROM #tables parent
  LEFT OUTER JOIN sys.foreign_Keys referencing
    ON referencing.parent_Object_ID = parent.Theobject_ID
   AND referencing.referenced_Object_ID <> parent.Theobject_ID
WHERE referencing.parent_object_ID IS NULL
  AND TheOrder = 0 --i.e. it hasn't been ordered yet
SELECT @Rowcount=100,@ii=3
--and then do tables successively as they become 'safe'
WHILE @Rowcount > 0
  BEGIN
  UPDATE #tables SET TheOrder = @ii
  WHERE #tables.TheObject_ID IN (
    SELECT parent.TheObject_ID
    FROM #tables parent
      INNER JOIN sys.foreign_Keys
        ON sys.foreign_Keys.parent_Object_ID = parent.Theobject_ID
      INNER JOIN #tables referenced
        ON sys.foreign_Keys.referenced_Object_ID = referenced.Theobject_ID
       AND sys.foreign_Keys.referenced_Object_ID <> parent.Theobject_ID
    WHERE parent.TheOrder = 0 --i.e. it hasn't been ordered yet
    GROUP BY parent.TheObject_ID
    HAVING SUM(CASE WHEN referenced.TheOrder = 0 THEN -20000
               ELSE referenced.TheOrder END) > 0) --where all its referenced tables have been ordered
  SET @Rowcount = @@Rowcount
  SET @ii = @ii + 1
  IF @ii > 100 BREAK
  END
SELECT TheObject_ID,TheName,TheSchema,HasIdentityColumn,TheOrder
FROM #tables ORDER BY TheOrder
IF @ii > 100 --not a directed acyclic graph (DAG).
  RAISERROR ('Cannot load in tables with mutual references in foreign keys',16,1)
IF EXISTS ( SELECT * FROM #tables WHERE TheOrder = 0 )
  RAISERROR ('could not do the topological sort',16,1)
GO
Here, we use the topological sort from within PowerShell to import the database’s data without having to upset any constraints. All these variants run at around the same speed.
# you'd need to fill these in with the details
# this routine assumes that the BCP files already exist
$PathToBin='' #put this in only if you hit trouble locating BCP.
# often 'C:\Program Files\Microsoft SQL Server\100\Tools\Binn\'
$directory='MyRootDirectory' # the directory where you want to store them
$SourceServer = ("MyServerInstance") # used to find the subdirectory where the files are
$SourceDatabase='MyDatabase' #where we take the data and build script from
$DestinationServer = ("MyServerInstance") #the destination instance
$DestinationDatabase='MyDatabase' #the destination database
#Load SMO assemblies
$MS='Microsoft.SQLServer'
#now load in the SMO DLLs
@('.SMO','.SmoExtended') | foreach-object {
    if ([System.Reflection.Assembly]::LoadWithPartialName("$MS$_") -eq $null)
        {"missing SMO component $MS$_"}
}
set-psdebug -strict
$ErrorActionPreference = "stop"
# now log into the server and get the server object
$My="$MS.Management.Smo"
$s = new-object ("$My.Server") $DestinationServer
if ($s.Version -eq $null ){Throw "Can't find the instance $DestinationServer"}
$SQL=@'
SET NOCOUNT ON;
DECLARE @Rowcount INT, @ii INT
CREATE TABLE #tables
  (TheObject_ID INT,--the tables' object ID
   TheName SYSNAME, --the name of the table
   TheSchema SYSNAME, --the schema where it lives
   HasIdentityColumn INT, --whether it has an identity column
   TheOrder INT DEFAULT 0) --we update this later to impose an order
--let's do a Topological sort
--firstly we read in all the tables from the database.
INSERT INTO #tables (Theobject_ID, TheName, TheSchema, HasIdentityColumn)
  SELECT TheTable.OBJECT_ID, TheTable.NAME, TheSchema.NAME,
         CASE WHEN identityColumns.Object_id IS NULL THEN 0 ELSE 1 END
  FROM sys.tables TheTable
    INNER JOIN sys.schemas TheSchema
      ON TheSchema.SCHEMA_ID = TheTable.schema_ID
    LEFT JOIN (SELECT DISTINCT Object_id FROM sys.columns
               WHERE is_identity = 1) identityColumns
      ON Thetable.object_id = identityColumns.object_id
/* We'll use a SQL 'set-based' form of the topological sort
First, find a list of "start nodes" which have no incoming edges
and insert them into a set S; at least one such node must exist
in an acyclic graph*/
--flag all the immediately safe tables to insert data in
UPDATE #tables SET TheOrder = 1
FROM #tables parent --do not reference any other table and aren't referenced by anything
  LEFT OUTER JOIN sys.foreign_Keys referenced
    ON referenced.referenced_Object_ID = parent.Theobject_ID
  LEFT OUTER JOIN sys.foreign_Keys referencing
    ON referencing.parent_Object_ID = parent.Theobject_ID
WHERE referenced.parent_object_ID IS NULL
  AND referencing.parent_Object_ID IS NULL
UPDATE #tables SET TheOrder = 2
FROM #tables parent --do not reference any other table but might be referenced
  LEFT OUTER JOIN sys.foreign_Keys referencing
    ON referencing.parent_Object_ID = parent.Theobject_ID
   AND referencing.referenced_Object_ID <> parent.Theobject_ID
WHERE referencing.parent_Object_ID IS NULL
  AND TheOrder = 0 --i.e. it hasn't been ordered yet
SELECT @Rowcount=100,@ii=3
--and then do tables successively as they become 'safe'
WHILE @Rowcount > 0
  BEGIN
  UPDATE #tables SET TheOrder = @ii
  WHERE #tables.TheObject_ID IN (
    SELECT parent.TheObject_ID
    FROM #tables parent
      INNER JOIN sys.foreign_Keys
        ON sys.foreign_Keys.parent_Object_ID = parent.Theobject_ID
      INNER JOIN #tables referenced
        ON sys.foreign_Keys.referenced_Object_ID = referenced.Theobject_ID
       AND sys.foreign_Keys.referenced_Object_ID <> parent.Theobject_ID
    WHERE parent.TheOrder = 0 --i.e. it hasn't been ordered yet
    GROUP BY parent.TheObject_ID
    HAVING SUM(CASE WHEN referenced.TheOrder = 0 THEN -20000
               ELSE referenced.TheOrder END) > 0) --where all its referenced tables have been ordered
  SET @Rowcount = @@Rowcount
  SET @ii = @ii + 1
  IF @ii > 100 BREAK
  END
SELECT TheObject_ID,TheName,TheSchema,HasIdentityColumn,TheOrder
FROM #tables ORDER BY TheOrder
IF @ii > 100 --not a directed acyclic graph (DAG).
  RAISERROR ('Cannot load in tables with mutual references in foreign keys',16,1)
IF EXISTS ( SELECT * FROM #tables WHERE TheOrder = 0 )
  RAISERROR ('could not do the topological sort',16,1)
'@
$Destination = New-Object ("$My.Database") ($s, "$DestinationDatabase")
if ($Destination.name -ne $DestinationDatabase)
    {Throw "Can't find the database '$DestinationDatabase' in $DestinationServer"};
$handler = [System.Data.SqlClient.SqlInfoMessageEventHandler] {param($sender, $event) Write-Host $event.Message};
$s.ConnectionContext.add_InfoMessage($handler);
$result=$Destination.ExecuteWithResults("$SQL") #execute the SQL
$s.ConnectionContext.remove_InfoMessage($handler);
$result.Tables[0] | foreach {
    $filename = "$($_.TheSchema)_$($_.TheName)" -replace '[\\\/\:\.]','-'
    $TableSource = "$directory\$($SourceServer -replace '[\\\/\:\.]','-')\$($SourceDatabase -replace '[\\\/\:\.]','-' )"
    $WhatHappened = &"$($pathToBin)BCP.exe" "`"$DestinationDatabase`".`"$($_.TheSchema)`".`"$($_.TheName)`"" in "$TableSource\$filename" -q -h "CHECK_CONSTRAINTS" -n -T -E "-S$($DestinationServer)"
    if ($WhatHappened -like '*Error*') {throw ($whatHappened)}
}
Now that we have a topological sort, we can also use it to perform surgery on a database. I started this article by showing you how to clear out all the data in a database in order to re-fill it with a different set of data; instead, we can just build a batch of deletes in the right order and execute it. (Don’t try this unless you have a good backup and you’ve double-checked that you are in the right database!)
Declare @tables Table( TheObject_ID INT NOT null,
   TheName SYSNAME NOT null,TheSchema SYSNAME NOT null,
   HasIdentityColumn INT NOT null,TheOrder INT NOT null)
insert into @tables
  EXECUTE #AllTablesInDependencyOrder
DECLARE @Command NVARCHAR(MAX)
SELECT @Command=
  (select 'Delete from ' + QUOTENAME(db_name()) + '.'
          +QUOTENAME(TheSchema)+'.' + QUOTENAME(TheName) + ' '
     FROM @tables
    ORDER BY theOrder desc
      FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
EXECUTE sp_msforeachtable 'ALTER TABLE ? DISABLE TRIGGER all'
EXECUTE sp_ExecuteSQL @Command
EXECUTE sp_msforeachtable 'ALTER TABLE ? ENABLE TRIGGER all'
What this does is start with all those tables that aren’t referenced by any other table (other than themselves), and then chip away at the rest, picking those that are only referenced by tables whose data has already been deleted. On my test server, all of the MyDatabase data is deleted within 25 seconds. On some databases this won’t work, because of horrible circular or mutual references; in those cases, you can fall back on the blunter instrument of disabling constraints.
Wrapup
In my experience, the four different methods of reading data into a database take roughly the same amount of time. The bulk load process is so efficient, when you give it a chance, that the overall time is governed more by the speed of accessing the data files and by the network speed. I prefer a topological sort, even if the code looks a bit intimidating, since it is less intrusive and no checks are ever disabled. However, I can appreciate that the simpler system makes a lot of sense if the user is aware of the need to mop up afterwards.
When I started writing this article, I felt it would be easy since we all know how to import data into a database. As I was intending on also writing about getting data out, scripting out the metadata and copying databases, I decided to do the entire process of copying a database, with extracts from my own scripts, using a range of databases and testing before-and-after with data and schema comparison tools. I also tried to double-check everything I wrote, even when I was sure of my facts. The exercise convinced me that I had started out knowing far less than I should have done about the process. I hope that the lessons I learned will be of benefit to others!
Further Reading
- Getting Data out of SQL Server from PowerShell by Chad Miller
- Loading Data With PowerShell by Chad Miller
- bcp Utility from MSDN
- BULK INSERT from MSDN
- OPENROWSET from MSDN
- Optimizing Bulk Import Performance
- Managing Bulk Copy Batch Sizes
- Managing Batches for Bulk Import
- Controlling Trigger Execution When Bulk Importing Data
- Importing Bulk Data by Using BULK INSERT or OPENROWSET(BULK…)
- Controlling Constraint Checking by Bulk Import Operations
- Prerequisites for Minimal Logging in Bulk Import
- Importing Data in Parallel with Table Level Locking
- Scenarios for Bulk Importing and Exporting Data
- Basic Guidelines for Bulk Importing Data
- Controlling the Sort Order When Bulk Importing Data
- Controlling Locking Behavior for Bulk Import
- Use Native Format to Import or Export Data (SQL Server) from MSDN
- TRUNCATE TABLE (Transact-SQL). from MSDN
