16 April 2020

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

16 April 2020

SQL Server Database Deployment: What Could Go Wrong?

Phil Factor tackles the tricky questions you will need to answer when deciding how to automate your SQL Server database builds and deployments with Redgate's development, version control and deployment tools.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

How do you deploy a SQL Server database?

It depends. No really. There was a time that the only method with an RDBMS was to take the existing version offline after outputting its data with a bulk copy utility, and build the new version from a single script that you executed to create the database. Then you filled the database with the data that you exported and finally put the system back online. How you got to that point of using that single script was another matter. It might have started out being the generated output from an Entity-Relationship diagramming tool, or some other design tool. Even the most fanatical database designer wouldn’t hand-cut table build scripts if there was an easier and more reliable way of doing it.

The ALTER TABLE command came a bit later to SQL. This gave the opportunity for database developers to opt to change existing databases while preserving their data, though it was only in the minority of cases could the RDBMS do this without supplementary scripting. If this sort of ‘migration’ script was used, the matching build script was altered at the same time so that it would subsequently create the ‘altered’ object from scratch.

Nowadays, the way you develop and deploy databases depends on the type of application and setting. A retail bank or government department will not tolerate some of the practices of a ‘startup’ culture. However, there is the common insistence from any organisation that it must be possible to build the database from the source code, lodged in a repository that is visible to the team. Beyond this, a development team should have the freedom to choose a development and release system that is most appropriate.

I suggest that, whereas the release process for an existing database application should be dominated by the testing and use of one or more migration scripts that preserve existing data, the regular automated build is the quickest and least distracting way of developing a database, just so long as each potential data migration issue is logged and solved when it is met.

How could a team work together on a single script without ‘knocking heads’?

With a single build script document, it is easy to cause conflicts with updates that interfered with concurrent work by another developer. The solution was to use a SQLCMD script that pulled in as many separate files as you wanted in the correct order. You could then use source control as usual to track the evolution of each object. The -i command-line option allows you to specify a list of files. SQLCMD checks that they all exist before it executes them in the order you specify. Each component file could be placed in source control. As each file could in turn have a list of the :r<filename> SQLCMD commands in the file, you aren’t going to be forced to have large script files, even with the most complicated database, and you can guarantee them to be executed in the correct order that you specify. With a migration script, each release needs its own script and it can only be used to move from one version to the next ‘target’ version. This is in source control and is treated like any other file. They can easily be chained to move a target more than one version level.

Just a sec. Why do these separate script files all have to be in order?

Even tables depend on other SQL Server components being defined. When you have UDTs (user-defined data types), for example, these need to be in place before the tables. Functions can’t be created until the objects that they reference are created as well. You need to have all the constraints and references in place with the table you create, so it is easy to understand the design, which means that tables often need to be built in the right dependency order. These are just examples. Migration files must be sequenced in order, for a different reason: if you need to move an existing database more than one version step. Migration scripts by convention move from one version to the next. To upgrade two version steps, you migrate to the next version and then to the following one.

Can’t I restrict myself just to SQL Server objects that don’t cause dependencies? UDTs and functions are a bit esoteric after all.

You won’t be the first to try that, but maybe you’ll be the first to succeed. Good luck anyway.

How do I get all my source files in the right order?

If you’ve inherited a mess, then it depends on how messy it is. If you already have an up-to-date working database, you can, by using SSMS, produce a single-file build script that will tell you the correct order. If there is no current build of the database, it is perfectly possible to get SQL Compare to produce a single document build script (target) from the mess (source), if you put all the files in one directory. You just synchronize the messy directory with an empty database or use a handy switch, /empty2, in SQL Compare.

If you make both the source and target a scripts folder, it takes the mess in the source directory and does it all neatly in the target. While you’re about it, it can even act like your mother and sort out the original mess by placing the source of each table and routine separately and neatly in a directory, with sub-directories for every type of database object and one file per table or other type of database object: (make both source and target a source directory).

Whichever way you choose to get the single document build script, it will tell you the correct order of building the database. Then, you just had to create a SQLCMD file that specified the correct order of executing scripts, and you had a base from which to work as you modify and expand the database. Tools like SQL Change Automation will allow you to establish the deployment order of scripts too, but you have to change the names of the files. In any case, you can build directly from this directory, and you can insert any necessary DML scripting. A similar file can manage a set of migration scripts in sequence for releasing a new version to production.

Why not just continue with generating single-document build scripts from all the script Files?

Time, effort and repeatability. A skilled database development team will think nothing of rebuilding a database several times a day, if required. I built a database twenty times in an hour the other day. It means that they can make U-turns out of design mistakes very quickly, to be less inhibited in trying out different strategies or using features that cause more dependencies. Even if it isn’t vital to do it, they do it regularly, in order to be the first to know of a change in dependency throwing the order.

When you use SQL Compare to create a single-document script, it doesn’t simply execute your SQL files, it parses those files, uses them to build a model of the source database and then compare it with that of the target. It then generates its own SQL to make the target the same as the source. For a build, this wouldn’t matter much, even though any DML, such as adding extended properties, or inserting reference data or enumerations, is going to be lost. However, when you are creating a deployment script, your cunning migration steps will no longer exist.

What about all those problems that a migration script approach solves?

There aren’t any such problems with a database build. A migration script solves the problems of updating an existing database while preserving existing data, not the problem of building it. It is far easier to build a database from scratch and then put the data in it than to do it with the data in there, in the same way that it is easier to build a house before the occupants move in. Migrations is like building a house extension with the house still occupied; it’s just as likely, but it’s a different problem. Also, if you use ALTER TABLE statements to upgrade a database, you have the extra work of ensuring that the changes are also reflected in the build CREATE scripts in source control.

Why worry about migrations then?

Migrations are important for deployment. Once you’ve finished testing, and the team is happy that a release candidate is suitable for release and therefore ready for the deployment pipeline (Testing, UAT, Staging, and so on), they need to create and thoroughly test a migration script.

This migration script is only useful for changing a database from one precise version to another precise version. As you reach the point where you are set to do a release, you focus on doing a migration script. SQL Compare can generate a first-cut migration script automatically, but if you’ve done awkward things like renaming tables or re-engineering your entire data model, then it is inevitable that you will have to hand-cut some code to include so as to preserve your existing data. This will need testing, and some of this testing will need to be in Staging.

Can you release databases without migrations?

There was once a time that SQL databases could only be built, not altered. In those days we had to take the application offline, export the data, and then import the data into the new version before bringing it back online. If the data model had changed this required a special routine to save the data in such a way that it could be imported into the new table structure, or for the data to be altered in its exported form, such as CSV. You could, alternatively import the data into a temporary version of the existing database on the same server and migrate the data across, slicing and dicing the data as necessary to fit the new model. With the whole application offline, you would then wait, drumming your fingers and trying to ignore the phone, while the data slid in, and you’d be hoping there wouldn’t be a glitch. Have I said enough?

So now that we have a migration script that only takes a precise version to the next, how do we cope with other versions that might be out there?

Each release candidate has with it a migration script that will take a production database, or any database stocked with data, from the production version existing at the time to the release candidate’s version. In order to go from other versions, you apply the migration script for each version, in order.

Wow, so I can just apply each migration script one after another to build a database from scratch!

It would do so, if you execute them in order and have the complete set, but the whole point of a migration script is to preserve the existing application data. How would you have the data from the existing database in that first release when some of the tables that contain it maybe weren’t even there?

Nope. All you would do is to recapitulate every failed design mistake and correction, and you wouldn’t have the data. Another problem is that any build script should either be successful or otherwise leave no trace. That usually means doing it inside a transaction but even so things can get complicated and sometimes you must do parts of the operation outside a transaction. In this case, you must ‘mop up’. A series of migration scripts must, if they are applied in a string, fail tidily so that they leave the database as it was before you started. This isn’t easy.

Unless you also save each change into the object’s CREATE script, you no longer get the benefit of source control in knowing who did what, when and why. You can’t get this from a migration script. It is basic accountability and supervision. How do you do an audit trace of changes to a table?

Yes, but having a migration script in development seems so much simpler

If you were a single developer, working with a copy of the production data, then sure, it would. However, most database developers need to make mistakes, try things out, and optimise performance by tweaking code. If each step is a migration file, then inevitably, these files will stack up. It isn’t unknown to have hundreds of migration files in a release. At some point you must consolidate them into a single migration. Then you have another developer on the project. You inadvertently change the same migration step because you are both working on a vital table or set of procedures. You then have to resolve those conflicts by merging the two branches. Then you discover that the application has names and addresses of people, maybe other details unencrypted. The boss discovers that it is no longer legal to develop applications using this data and that the boss is personally liable.

Even if you solve all these problems and work out how to maintain the source in source control, you can still be caught out when you apply the migration script to the production server with the updated data. The classic examples are when a member of the Null or O’Brien family register to use your system in the meantime.

How do you deal with changes in table structure that would result in the data no longer importing?

When you make a ‘breaking change’ to one or a group of tables, you prepare, test and execute the migration code that prepares the data. This might be a change in where the data is stored, but it could easily be changes to deal with a CHECK constraint, a change in column nullability (NULL to NOT NULL), or extra unique constraint or foreign keys. You execute the code to fix the data and make sure it is in the right place and form for the new build. Then, you add the code to the file that will become the migration script for the next release.

You then save the data as the new build data for the next build. Things can go wrong with the next regular build, but nobody gets hurt. The mistakes are all valuable insights when preparing the migration script.

Why bother to build databases repeatedly in development?

Even if you are working on just one version or variant of a database, you are likely to have at least two copies of a database in development. You’ll have a copy that you can commit unspeakable crimes on without bothering anyone else, tear down, rebuild and experiment. You’ll have another, shared, one which you can use for regression testing changes that are checked in. You may have others with different sets of data or variants. Everything has to be kept up to date with the checked-in source code. You’ll have different standard datasets because, after all, the quickest way to test end-to-end processes is to start with a defined set of data and see if you end up with the right result. It isn’t just a matter of checking that the data that should have changed has changed but also of ensuring that the data that shouldn’t have changed has remained the same.

You also must keep in step with the application development. Usually the database is just a small component of an application. If the developers of the application at large are making or testing major changes, you will need to provide at least one copy of the database to keep in step with their progress, particularly if the application is close-coupled with the database to the point of accessing tables directly. They will want to test their code to a step-version of the database that they support and that they are ‘building to’. It may not be the same step version with which you are experimenting.

My database is so huge that I can’t build from scratch. I have to migrate it

The build process must be entirely automated and optimized because it is otherwise slow and tedious. Besides, humans are error-prone when tired, as my digital friends keep reminding me. Inserting data can be a sluggish process, but if you use native BCP with constraints disabled, a database must be a pretty reasonable size to outgrow an overnight build. You soon find that the development cycle falls naturally into the use of an overnight build. Where you have dev teams in radically different time-zones, this can be awkward, but I’ve found that there is always a ‘dead spot’ where a regular build can be scheduled.

Even if the development database is too big for an overnight build, then there is no need to panic. This isn’t a new problem, and there are solutions to it. Most databases hold data that is entirely unnecessary for the functioning of the database. I’ve never found a database that hasn’t got a rich trove of useless and unreferenced information, even image data, swilling about in it. We call it ‘Bedroom-cupboard syndrome’. The answer is to load the essential information first. Then you can make the database available. Then you load the rest as a background process in order of its usefulness. Nobody will notice. Humanity often retains data merely for its comfort value.

What exactly is the difference between a build script and a migration script?

A build script and a migration script both take a database from one version to another. The only difference is that a build script takes it from bedrock to the required version whilst ignoring any existing data because there isn’t any at this stage, whereas a migration script preserves existing data and performs any necessary changes to the data between the two versions. Generally, it remains good practice that build scripts are wrapped in transactions, but often aren’t because it is so easy to mop up after an error. A migration script requires rollback and idempotence (i.e. no damage in running it again inadvertently)

I’m struggling to work out the difference between a synchronization script and a migration script?

The term ‘synchronization’ is misleading. Data synchronization usually implies that data manipulation on all synchronized nodes are carried through to the other nodes. If you put a new MP3 on your Mobile it appears on your laptop, and vice versa. If you delete that Rick Astley single on the laptop, it gets deleted on the mobile. A database synchronization script makes the target database the same as the source whilst preserving its existing data, but not vice versa. A migration script takes a database and its data from one version to the next version increment. Not a great deal of difference. Traditionally, migration scripts tended to be ‘hand-cut’ whereas synchronization scripts tended to be generated by a third-party tool. They have the same requirements for rollback on error.

So, are migration scripts a good thing?

Good? They are, and always have been, indispensable in the deployment pipeline if you are upgrading an existing live system. The more you use them, the more likely you are to have a trouble-free release to production. However, database development itself, before a release candidate is selected is usually far better and more flexible with a build.

How would you sum up?

The database developers have the task of making life as easy as possible for others. This means thinking as early as possible about issues of deployment, security, maintenance, training, supervision and compliance. To help with this, the database must be in source control at the object level so that the broader team can check early on for potential issues, and changes can be tracked. Each time there is a release planned for deployment to the production database system, this must be accompanied with a migration script lodged in source control that will safely upgrade the production version to the new release. The databases must, therefore, be tagged with a correct version number to prevent accidents.

I recommend a regular build of a database, usually overnight but more regularly, if necessary. I recommend a clean routine build in development rather than creating and applying a new migration script for this routine build, both because it cuts out unnecessary work, and because it is an approach that encourages routine regression testing with standard data sets. However, developers need to carefully flag potential data migration issues in advance of a release, and so work on the migration script well before the script is needed. They will use many of the routines that are needed anyway to prepare the data for the regular build whenever a ‘breaking change’ is checked in.

You may also like