Database Migration Scripts: Getting from place A to place B

We’ll be looking at a typical database ‘migration’ script which uses an unusual technique to migrate existing ‘de-normalised’ data into a more correct form.

So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into ‘de-normalisation’ habits. What’s this? A new column with a list of tags or ‘types’ assigned to books. Because books aren’t really in just one category, someone has ‘cured’ the mismatch between the database and the business requirements. This is fine, but it is now proving difficult for their new website that allows searches by tags. Any request for history book really has to look in the entire list of associated tags rather than the ‘Type’ field that only keeps the primary tag.

1437-Refactor.1.PNG

We have other problems. The TypleList column has duplicates in there which will be affecting the reporting, and there is the danger of mis-spellings getting there.

The reporting system can’t be persuaded to do reports based on the tags and the Database developers are complaining about the unCoddly things going on in their database.

In your version of PUBS, this extra column doesn’t exist, so we’ve added it and put in 10,000 titles using SQL Data Generator.

So we’ve refactored our PUBS database without pain. We’ve even put in a check to prevent it being re-run once the new tables are created. Here is the diagram of the new tag relationship

1437-ERD.jpg

We’ve done both the DDL to create the tables and their associated components, and the DML to put the data in them. I could have also included the script to remove the de-normalised TypeList column, but I’d do a whole lot of tests first before doing that. Yes, I’ve left out the assertion tests too, which should check the edge cases and make sure the result is what you’d expect.

One thing I can’t quite figure out is how to deal with an ordered list using this simple XML-based technique. We can ensure that, if we have to produce a list of tags, we can get the primary ‘type’ to be first in the list, but what if the entire order is significant? Thank goodness it isn’t in this case. If it were, we might have to revisit a string-splitter function that returns the ordinal position of each component in the sequence.

You’ll see immediately that we can create a synchronisation script for deployment from a comparison tool such as SQL Compare, to change the schema (DDL). On the other hand, no tool could do the DML to stuff the data into the new table, since there is no way that any tool will be able to work out where the data should go. We used some pretty hairy code to deal with a slightly untypical problem. We would have to do this migration by hand, and it has to go into source control as a batch. If most of your database changes are to be deployed by an automated process, then there must be a way of

  • over-riding this part of the data synchronisation process to do this part of the process
  • taking the part of the script that fills the tables, Checking that the tables have not already been filled, and executing it as part of the transaction.

Of course, you might prefer the approach I’ve taken with the script of creating the tables in the same batch as the data conversion process, and then using the presence of the tables to prevent the script from being re-run.

The problem with scripting a refactoring change to a database is that it has to work both ways. If we install the new system and then have to rollback the changes, several books may have been added, or had their tags changed, in the meantime. Yes, you have to script any rollback! These have to be mercilessly tested, and put in source control just in case of the rollback of a deployment after it has been in place for any length of time.

I’ve shown you how to do this with the part of the script ..

…which would be turned into an UPDATE … FROM script.

You’ll notice that it isn’t quite a round trip because the tags are in a different order, though we’ve managed to make sure that the primary tag is the first one as originally.

So, we’ve improved the database for the poor book distributors using PUBS. It is not a major deal but you’ve got to be prepared to provide a migration script that will go both forwards and backwards. Ideally, database refactoring scripts should be able to go from any version to any other. Schema synchronization scripts can do this pretty easily, but no data synchronisation scripts can deal with serious refactoring jobs without the developers being able to specify how to deal with cases like this.