Getting Data In and Out of SQL Server Flyway Builds

This article provides SQL "bulk copy" routines that rapidly extract data from, and load it into, a SQL Server database, and demonstrates a way to bulk insert the test data automatically in a Flyway build, using a SQL callback.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.


This article is part of a series on managing test data in a Flyway development.

Also relevant is the series of related articles on Database testing with Flyway.


I’ll start by briefly reviewing a strategy for storing datasets that maintains separation of data and metadata, and then get into the practicalities of bulk import and export of data during Flyway development. Row-by-row inserts are fine for smaller data volumes, but are not generally fast enough for a build-and-fill operation. If you want to do ‘builds-while-you-wait’, then you’ll need bulk inserts, unless you want to wait hours. Loading the data from separate files also makes the build script files much shorter.

I’ll assume you have a SQL Server database at the right version, stocked with the data you need, and I’ll demonstrate SQL routines to extract and load the data, using BCP, and then provide a SQL callback script that will automatically bulk insert the test data as soon as Flyway successfully builds the database.

The separation of data and metadata

When you are building a database of any size, you’ll probably have a policy of separating data from metadata. It is a lot easier, and it gives you more flexibility to build the metadata to the correct level and then insert the data. There are several reasons for this. For a start, you shouldn’t store data, other than the barest essential of enumeration data, in database source control: it is fine for data that requires change control and a new version of the database, but that is a rare event.

Data should almost never be handled in the same way as a structural change. Data is under a very different type of control and has entirely separate repercussions for auditing. It should not be in source control, because it may contain personal or sensitive information that must be curated in a very different way to the application source. It will probably need redaction or masking before it can be used in development. If you have data that is part of the database, then the table that contains it is best handled as a view, with the literal data rendered in a multi-row VALUES clause, so that it is read-only and cannot be changed accidentally.

Storing datasets

The management of datasets will require a bit of thought. In this example, I store them on the SQL Server hosting the source database. The directory is structured by project name, usually the name of the database. Storing them on the server means that you don’t have to give the SQL Server login any network access. You can, if you need to, copy the datasets to your project folders stored somewhere on the local area network, as a separate task. However, I’d advise instead that you leave them on the server hosting the SQL Server and share them with a restricted list of users.

Maintaining datasets per database version

To demonstrate this in Flyway, we need to be aware that a user might need any of the current versions of the database. These are likely to have different ways of arranging the data, particularly in the early stages of development. We’ll therefore save the table data in a different directory for each change to the tables, such as a column change. We’ll copy the Flyway convention and save each dataset in a directory whose name is the starting version number that is appropriate. The same dataset may well be appropriate for several versions, so when we select a dataset, we choose the most recent version before or equal to the version of the database we need to build. By doing this we use the minimum necessary disk space for our data sets with the smallest amount of duplication.

To do this, we need to compare standard version numbers. If your language supports the comparison of arrays, the comparison of standard version numbers is trivial. We need to do this in SQL for this exercise, so it isn’t quite so easy.
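To show why an array-style comparison makes this trivial, here is a minimal Python sketch. It is only an illustration, not the SQL used in this project. The function name `version_key` is hypothetical, and treating trailing zeros as insignificant (so that 1.2 equals 1.2.0) is an assumption.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.1.3' (or 'V1.1.3') into (1, 1, 3) so versions compare numerically.

    Trailing zeros are dropped, so '1.2' and '1.2.0' give the same key.
    That equivalence is an assumption made for this sketch.
    """
    parts = [int(part) for part in version.lstrip("Vv").split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


# Tuples compare element by element, which is what a version comparison needs.
print(version_key("1.1.10") > version_key("1.1.9"))  # True
# Comparing the raw strings gets it wrong, because '1' sorts before '9'.
print("1.1.10" > "1.1.9")  # False
```

A plain string comparison fails as soon as any part of the version reaches two digits, which is why the SQL version has to split the string and compare the parts as numbers.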

Multiple datasets at the same version

You may need several datasets. If you are doing testing, you’ll probably have a standard dataset that allows you to compare the result of a process with a version that has been verified as being correct. You might also want a data set with the largest likely predicted volume of data to check scalability. You might also want a small dataset for a quick test cycle of basic function. This can be handled by Flyway, but is out of the scope of this article, because so much depends on the way you store your development data.

Writing data out and reading data in

You will need two routines. They should not be part of the database but held separately as deployment scripts. The first one will extract the data from a database at the correct version, using BCP. You run this batch ad-hoc, as required. With SQL Server, you will probably want to choose native Unicode format to store your data.

The second routine loads the data into an empty database build, of the same version. It will be in a Flyway SQL callback, executed when the migration ends. It will need to know what version the migration got to and then access the corresponding dataset. Our data folder will need a dataset for every table change, so that each subdirectory holds a dataset that is appropriate for one or more revisions, until superseded by a higher version.
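The selection rule described above (take the most recent dataset whose version is before or equal to the version of the database) can be sketched as follows. This is a hedged Python illustration rather than the callback’s SQL. `pick_dataset` and `version_key` are hypothetical names, and the list of available versions stands in for the subdirectory names under the data folder.

```python
from typing import Iterable, Optional


def version_key(version: str) -> tuple[int, ...]:
    """Numeric sort key for a version string; trailing zeros are ignored (an assumption)."""
    parts = [int(part) for part in version.lstrip("Vv").split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def pick_dataset(available: Iterable[str], target: str) -> Optional[str]:
    """Return the highest dataset version that is <= the target version, or None."""
    wanted = version_key(target)
    candidates = [v for v in available if version_key(v) <= wanted]
    return max(candidates, key=version_key, default=None)


# Datasets exist for 1.1.1, 1.1.3 and 1.2.0; a database at 1.1.7 reuses the 1.1.3 data.
print(pick_dataset(["1.1.1", "1.1.3", "1.2.0"], "1.1.7"))
```

If no dataset is old enough for the requested version, the sketch returns `None`, leaving the caller to decide whether to build an empty database or to fail.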

Extract the data from the source database using BCP

The first SQL script is to extract data from the ‘donor’ or source database. To output the contents of every table of a source database, as a series of ‘wide’ native BCP files, is pretty trivial in a development environment. You might have used SQL Data Generator to create data, or you could have used SQL Data Masker. If you are using unmasked ‘production’ data, the task becomes less trivial and requires controls.

You will need a login that is powerful enough to have write-access to the file system of the server that hosts SQL Server. The login must also be able to temporarily allow SQL Server to execute DOS commands. This is relatively simple in code if you are a system admin (sa). If you haven’t got this sort of superpower, then you’ll have to get a DBA to set up a task to do this, perhaps on SQL Server Agent.

Here is the script. The @CurrentDatabase should be a copy of the database from which you wish to write out the data. I’m using AdventureWorks2016. The data folder location will be a subdirectory of ‘C:\BCPData’, called <dbname>\databaseContents\VersionNumber, where <dbname> is the value assigned to @SourceDatabase, which is the project folder name for this database. In this example, the @SourceDatabase is just taken from the current database, AdventureWorks2016, but in most cases you would want to use a more generic project name such as AdventureWorks.

The script will first try to grab the version number of that database from the flyway_schema_history table, if it’s a Flyway-controlled database. If there is no Flyway history table, it will then check for a version stored in an extended property. If neither exists, it assigns a default version number of your choosing, which you need to add to the code.

Note that if you are getting data from a Flyway database, you need to prevent it copying out the data from the Flyway schema history table, because you really don’t want it copied into the target database(s). This table may not be in dbo, in which case you need to replace dbo with the schema you chose for it. If you execute this within Flyway you can, of course, use the placeholder ${flyway:defaultSchema}.
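To make the moving parts concrete, here is a rough Python sketch that only builds the BCP command lines for a list of tables. It is not the script from this project, which runs from SQL. The function name, the `schema_table.bcp` file naming and the default history table name are all assumptions, and it assumes schema and table names contain no spaces or dots. The `-N` switch asks BCP for ‘wide’ native format, `-T` uses a trusted connection and `-S` names the server.

```python
from pathlib import PureWindowsPath


def bcp_out_commands(server, database, project, version, tables,
                     root=r"C:\BCPData",
                     history_table="dbo.flyway_schema_history"):
    """Build one 'bcp ... out' command per table, skipping the Flyway history table.

    The folder layout follows the article: <root>\<project>\databaseContents\<version>.
    The file naming (schema_table.bcp) is hypothetical.
    """
    folder = PureWindowsPath(root) / project / "databaseContents" / version
    commands = []
    for table in tables:
        if table.lower() == history_table.lower():
            continue  # never export Flyway's own bookkeeping table
        schema, name = table.split(".", 1)
        data_file = folder / f"{schema}_{name}.bcp"
        commands.append(
            f'bcp "{database}.{schema}.{name}" out "{data_file}" -N -T -S "{server}"'
        )
    return commands


for command in bcp_out_commands("MyServer", "AdventureWorks2016", "AdventureWorks2016",
                                "1.1.3", ["Person.Address", "dbo.flyway_schema_history"]):
    print(command)
```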

Load the data into the target database from the BCP data files

We have a parallel batch to read it into your currently empty copy of the database, at the same version. If this database is on another server, you’ll need to sync the data files over to it. You can also read the files from a network share, but remember to alter the value of the data folder location (@DataFolderLocation) accordingly.

The script will select the data files in the folder matching the version of the target database, or lower. Before loading the data, it disables all constraints, then after the BULK INSERT, reenables them, checking there are no violations.
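The order of operations matters here: every table’s constraints are disabled first, then all the files are loaded, and only then are the constraints re-enabled and checked. The following Python sketch generates T-SQL statements in that order. It illustrates the sequence rather than reproducing the callback script, and the function name and the choice of BULK INSERT options are assumptions.

```python
def build_load_script(files_by_table: dict[str, str]) -> list[str]:
    """Return T-SQL statements in three phases: disable, load, re-enable with checking."""
    tables = list(files_by_table)

    # Phase 1: stop foreign key and check constraints from blocking the load order.
    disable = [f"ALTER TABLE {table} NOCHECK CONSTRAINT ALL;" for table in tables]

    # Phase 2: load each 'wide' native file, keeping identity values and NULLs as exported.
    load = []
    for table, path in files_by_table.items():
        safe_path = path.replace("'", "''")  # escape single quotes for the SQL literal
        load.append(
            f"BULK INSERT {table} FROM '{safe_path}' "
            "WITH (DATAFILETYPE = 'widenative', KEEPIDENTITY, KEEPNULLS, TABLOCK);"
        )

    # Phase 3: re-enable the constraints and validate the loaded rows against them.
    enable = [f"ALTER TABLE {table} WITH CHECK CHECK CONSTRAINT ALL;" for table in tables]
    return disable + load + enable


script = build_load_script({
    "Person.Person": r"C:\BCPData\AdventureWorks2016\databaseContents\1.1.3\Person_Person.bcp",
    "Person.Address": r"C:\BCPData\AdventureWorks2016\databaseContents\1.1.3\Person_Address.bcp",
})
print("\n".join(script))
```

Using `WITH CHECK` when re-enabling means that any row violating a constraint raises an error at that point, rather than leaving the constraint untrusted.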

As this is executed by Flyway, I’ve added two placeholders:

  • ${flyway:defaultSchema} – a default placeholder that specifies the schema in which the flyway history table is placed
  • ${datasource} – a custom placeholder for the project name of the database, whose value is passed to Flyway on the command line, when we run the build.

If you use SSMS, you’ll need to remove these placeholders, but if you use SQLCMD, or SSMS in SQLCMD mode, you can change the placeholders slightly to provide values using the setvar command. For example: :setvar dataSource "AdventureWorks2016"
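If you want to run the callback script outside Flyway, one option is to resolve the placeholders yourself before executing it. The Python sketch below is a simple stand-in for that substitution and is not how Flyway implements it. The function name is hypothetical, and the values supplied are examples only.

```python
import re

# Matches ${name}, including names with a colon such as ${flyway:defaultSchema}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_placeholders(sql: str, values: dict[str, str]) -> str:
    """Replace each ${name} in the SQL text, failing loudly if a value is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, sql)


sql = "SELECT version FROM ${flyway:defaultSchema}.flyway_schema_history; -- project ${datasource}"
print(resolve_placeholders(sql, {
    "flyway:defaultSchema": "dbo",
    "datasource": "AdventureWorks2016",
}))
```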

The current version of all the files is held on GitHub here.

Test it out: Build AdventureWorks and fill it with data

We can now try it out on AdventureWorks. We’re going to build it and check that it is the same as current AdventureWorks. We’ll use AdventureWorks2016, though any version will do. Here is our data folder location, with an AdventureWorks2016 directory, which is on the SQL Server that hosts the database from which we’re writing out the data.

The data folders for each database to be built

Write out the source data

Now, we run the first routine for writing out the data to the data folder location, and we end up with a databaseContents directory with the data files for all the versions we need. We only need the one version, which I’d assigned V1.1.3.

Maintaining datasets for each database version

Within this folder are a whole lot of files:

The BCP files

Create the empty target database

This now provides the data for a new build of AdventureWorks. For this build, we create a blank database of AdventureWorks, which I called AdWorks. This must be done before the build with a separate process, such as SQLCMD. I’ve added the logic to delete it if it already exists. AdWorks is hosted on the same server as the source AdventureWorks2016 database.

Build-and-fill the database using Flyway and PowerShell

Now that we have an empty database, with the awkward bits of the fulltext database in position, we can build the database, using Flyway, and fill it with data. This is the contents of the Scripts directory of the GitHub project:

The Flyway migration scripts

The V1.1.3_AdventureWorks.sql migration script is an ordinary build script. I generated it using SQL Compare, but you could also do it in SSMS. I gave it the version 1.1.3 to match what we used earlier. Flyway must run it outside a transaction, and we must declare it as ‘mixed’, meaning that Flyway should allow mixing transactional and non-transactional statements within the same migration. Normally, Flyway attempts to run each migration within its own transaction, but if it detects that a specific statement cannot be run within a transaction, it won’t run that migration unless you allow transactional and non-transactional statements to be mixed within a migration run. The entire script will be run without a transaction. I’ve added a config file to specify these options. It has the same name as the build file but with a .conf filetype.

The SQL callback script, afterMigrate_ReadEveryTableViaBCP.sql, will run after the V1.1.3 migration completes and has the SQL code to import the data and check that it conforms to the constraints. I also use the afterMigrate script from my previous article, Customizing Database Deployments using Flyway Callbacks and Placeholders. It applies the new version to the database as an extended property and puts an entry in the server log that the migration took place. We need this to pick out the correct dataset for the version if we have more than one version.

With all this in place, we just need the PowerShell code to connect to the database and run the migration. If the migration succeeds, the callback will automatically run and bulk load the data. The PowerShell automation script is more elaborate because we use a technique that allows a user who has UserName/Password credentials to keep passwords in a secure place within the user folder. If a password is not already stored, you are asked for it once. All you need to do is to fill in the variables at the start of the script, appropriately. You don’t need to provide the $username if you use integrated security, but you need to specify in the connection string that you are using integrated security (integratedSecurity=true). The $ProjectName and $ProjectDescription are used for adding a SQL Server Extended property on the database, and for logging the migration in the SQL Server log. The $datasource refers to the project name within which the actual data files are stored.

Under the covers, we are then assembling the parameters that we pass to command-line Flyway, ‘splatting’ them as an array. So that you can see what is happening, the actual Flyway parameters are:

  • -url=jdbc:sqlserver://MyServer:1433;databaseName=AdWorks
  • -locations=filesystem:MyPathToTheScripts
  • -user=MyUserName
  • -password=MyFiendishPassword
  • -placeholders.projectDescription=A sample project to show how to build a database and fill it with data
  • -placeholders.projectName=AdventureWorks
  • -placeholders.datasource=AdventureWorks2016
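For readers who don’t use PowerShell, the same parameter list can be assembled in a few lines of Python. This is a hedged sketch and not the automation script from this project: it leaves out the secure password storage entirely, the function names are hypothetical, and it assumes a `flyway` executable is on the path.

```python
import subprocess


def flyway_args(server, database, scripts_path, user, password, placeholders, port=1433):
    """Build the command-line arguments listed above as a list of strings."""
    args = [
        f"-url=jdbc:sqlserver://{server}:{port};databaseName={database}",
        f"-locations=filesystem:{scripts_path}",
        f"-user={user}",
        f"-password={password}",
    ]
    # Each custom placeholder becomes -placeholders.<name>=<value>.
    args += [f"-placeholders.{name}={value}" for name, value in placeholders.items()]
    return args


def run_migrate(args):
    """Run 'flyway migrate' with the given arguments; raises if Flyway reports failure."""
    subprocess.run(["flyway", *args, "migrate"], check=True)


args = flyway_args(
    "MyServer", "AdWorks", "MyPathToTheScripts", "MyUserName", "MyFiendishPassword",
    {
        "projectDescription": "A sample project to show how to build a database and fill it with data",
        "projectName": "AdventureWorks",
        "datasource": "AdventureWorks2016",
    },
)
print(len(args), "arguments prepared")
```

Passing the arguments as a list, rather than as one long string, avoids quoting problems with values that contain spaces, such as the project description.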

Here is the code:

Check the build

So, if this works error-free, and I’m quietly confident that it will, you can then check that the data in the new build matches the original, using SQL Data Compare:

Using Data Compare to verify results

In this example, the source database was built using Flyway, but with slightly different details. We deliberately didn’t copy across the flyway_schema_history table, so these don’t match. Otherwise, the data is identical.

The same goes for SQL Compare, when the build script is compared with AdWorks, except that one or two constraints and a trigger had been disabled in the source but not the target, because we just enabled them all.

Adding migration scripts

AdWorks is now a Flyway-managed database, although you need to be aware that the data is being added by a method that Flyway isn’t controlling directly.

If we wanted to add migration scripts to what we’ve done, that will be fine. How you do it depends on whether you want the data reloaded at the end of every build. If you do, then empty the database before you start each run with the supplied routine, and make sure that you have a new dataset in your datastore each time you change any of the tables.

However, if you want subsequent migration scripts to manage any data movement, then simply remove or rename the AfterMigrate script that loaded the data, so that Flyway doesn’t attempt to BCP data into tables that are already filled. When you have added the required chain of migrations, you can save the new layout of the data using the first SQL script. Then put the AfterMigrate script back into the directory, create a new database and run the whole migration set, and it will load the new data at the end. If you have a lot of data, you’ll find this to be the fastest way of doing it.

Conclusions

I’ve now given you all the information you need to use Flyway to do a conventional build. You can, of course, do the same with a migration, but if you can provide the data once you reach the right version of the database, why bother with a migration that preserves data? Flyway will obligingly take you straight to the version you want if you can provide it with a build script, and then it will quietly fill it with the data for you via the script in this project. I’m not suggesting that there is any ‘better’ approach to producing the database at the correct version. Flyway will fit in with the way that a development team has chosen to work and feels comfortable with.

 
