Versioning an existing database using SQL Source Control
How to bring an existing database under Git version control.
This article describes a few of the important planning considerations for database source control, and then provides a simple walkthrough of how a development team can implement database version control, for an existing database, using Git and SQL Source Control.
Maintaining version control for databases
Many development teams struggle with maintaining a reliable database change history. Our team was no different. It seemed we were forever asking questions such as “Who made this change?”, “When was the stored procedure last edited?”, “When was this view created?”, and so on.
Our initial approach to database versioning was, believe it or not, simply to store a list of the database changes for each version, in a Word document! I cannot count the number of times we forgot to update that document and missed an important component. As a result, pushing changes to production was often a time consuming and error-prone task.
None of this was a problem with our C# or JavaScript code, because it was all under version control, in Git, so the answer seemed simple: get the database in version control too! Finding the right solution was harder. It had to support Git, it had to integrate with SQL Server Management Studio, the tool almost every developer in the company uses to query SQL Server, and it had to be easy to setup.
After some evaluation, we chose Redgate’s SQL Source Control. It met all our criteria, and the first time I used it, I had my database under source control in under five minutes, without resorting to the manual for help. Nevertheless, getting database source control set up and running on your own laptop is one thing. Reaching the point where every development team is ready to use it is another. I had to learn some hard lessons to move from evaluation to company-wide adoption, at Farm Credit Services of America. Something new would come up each time a team started using the tool.
Planning for SQL database version control
Before embarking on database source control, a.k.a. database versioning, the team of course needs to consider the practicalities: which version control system would suit them best, which databases need to come under version control and what that entails, whether each developer works on their own local, ‘dedicated’ copy of the database, or all developers use one shared database, and so on.
In my experience, these are two of the most important initial steps to get right:
- Reconciling schema differences in existing versions of a database, prior to version control
- Making the right choice between the shared and dedicated database development models
I’ll only cover these two considerations in this article, but of course, the team also need to think about how they can co-ordinate their development efforts, once the database is under version control, how they will build and deploy new databases from version control, how to automate these processes, which branching strategy will best support their database deployments, upgrades, bug fixes and so on. You can find a good discussion of broader database versioning strategies and challenges in the eBook, SQL Server Source Control Basics.
Reconciling database schema differences
Before you even think of putting a database under version control, you need to reconcile all the schema differences between each environment.
Once a database is in source control, then the build scripts that define the state of each database object, for a given database version, become the ‘truth center’ for that database. If you’ve not reconciled the environments properly, and a mission critical table exists in one environment but its DDL script is not in source control, then the next time you deploy to the environment you risk that object being wiped out. Deleted. So, let me say this again, because this is the most important step:
Reconcile all schema differences between each environment before putting a database under source control.
When I first put a legacy application’s database under version control, I could not believe the differences I found between environments. Test tables existed in development. Column properties existed in development and test but not production. Views existed in production but not in development.
The hard part in this situation is determining which environment should be the environment of record. The production environment seems like the obvious choice, but we often had many database changes sitting in our final testing environment, fully tested, but waiting to be deployed. Generally, therefore, I chose Test as the environment of record, and made sure all other environments matched it, in terms of schema and static data, using SQL Compare and SQL Data Compare. Having done this, you’re then ready to get that instance of the database put under source control.
Choosing between the shared and dedicated models
You have two choices, dedicated or shared. Shared means everyone uses the same database on the same server, while dedicated means each developer has their own copy of the database running locally. Shared is a great place to start because lowers the barrier to entry due to the lower setup time, and the fact that a DBA can be responsible for all the configuration changes of which most developers are not even aware, such as security and file groups. That being said, I recommend the dedicated model (as does Redgate). It offers many benefits:
- Schema changes and code changes (C#, JavaScript) go hand in hand. This allows developers to get all the schema changes needed (table changes, stored procedures, views, functions, and so on) working locally before pushing them out.
- Supports branching strategies. A shared model only supports one branch, master. With shared you better hope all new changes made to your database are non-breaking changes or you will have issues. In one case, one pair of developers were making a change to the UI code, while another pair were renaming a database column. The rename took five seconds the code changes took a few hours. Without the new UI code in place, the column rename operation brought down anyone who was pointing their local code to that database. It should have all be done in a branch.
- Local data to mess up. If I mess up my database no one but me cares. I can easily recover by recopying data using SQL Data Compare or restoring from a backup. I don’t hurt anybody but myself or waste anyone else’s time.
Demo: source controlling AdventureWorks using SQL Source Control
Having taken care of the preliminaries, the actual process of version controlling an existing database, for a given team, involves the following steps
- Add the database to the master branch in source control
- Install SQL Server on each developer machine
- Setup the database on each developer machine (using database creation scripts)
- Bring changes down from source control and apply them to each development database
For this demo, I’ll assume you have SQL Source Control installed (I’m using version 5.2), and we’ll put the AdventureWorks2014
under source control. The version control system in this case is Git. I’ve put the project up on GitHub if you would like to see what everything looks like before getting started.
Step 1: Add database to source control
Let’s say you’ve established the AdventureWorks2014
database that you want to put into source control (such as the one in the test environment), and you’ve reconciled any schema differences in other environments.
Make one developer responsible for this initial step of getting the chosen database into the source control master branch. I saw one project where three developers had attempted to do this independently. The results were…interesting.
In step 4, later, I’ll show how all other developers can subsequently pull down the changes from master.
Link to version control system or to working folder?
If you right-click on the database you want to put under source control, AdventureWorks2014
in this case, and select Link database to source control…, SQL Source Control will open up and you’ll immediately need to choose between linking directly to the source control system, or linking instead to a working folder.
If you are using TFS or SVN then direct source control integration with SQL Source Control will work, just fine. However, GIT is rather unique because of the way it handles branches, and SQL Source Control doesn’t yet fully support some of the GIT functionality like complex merge conflicts. Critically, for us, there is no way to see easily all pending changes (C# and database) not checked into GIT. We store our application source code (C#, JavaScript) in the same repository, right alongside the database. This is key for us because it means we can bundle all related changes into one commit.
As such, I recommend to the developers on my team to use the “working folder” option (see Figure 1). By checking into the working folder the developers can then follow their typical workflow, using their GIT tool of choice to commit, update, push and pull those changes. It is treating the database just like C# or JavaScript code.
The next step is to specify the working folder location. The team should have agreed up-front on a folder structure for the source control repository (which of course the working folder will mirror). Our standard is as follows:
- GIT or Mercurial:
/[RootSourceControlFolder]/Database/[DatabaseName]/ - All Others:
/[RootSourceControlFolder]/[BranchName]/Database/[DatabaseName]/
Enter the working folder location, and below that select the Shared database option, since we are establishing the database in the master branch of source control, to which all developer’s will have access. Subsequently, in step 5, all developers will link to this shared location, pull changes down to their local machines, and synch their local, dedicated databases.
Commit changes to the working folder
Click on the Commit button on the top menu, in order to commit all the changes to the working folder. SQL Source Control will present a list of all database objects that exist in the AdventureWorks2014
database but not in the working folder, or that exist in both but are different. In this case, since the working folder is currently empty, you’ll simply be reviewing the CREATE
scripts that will be created for each object in the working folder.
Select all the objects by checking the very top checkbox and hit Save changes.
The Commit screen should now register no changes to be committed to source control, and your working folder should look like as shown in Figure 4. SQL Source Control, has generated object creation script for all the tables and code objects in the database, automatically categorized according to object type.
You can now use your version control tool of choice to commit the changes to the local repository, and then push them to the remote master branch, where the rest of the team can access them. In this example, I am using GIT as my version control system and GitKraken as my GUI.
Finally, if each developer will pull from here, in order to synchronize their own dedicated database, then you will want to unlink this database from source control.
From a developer’s perspective, putting the tables and all database code objects into source control, along with any necessary static data (not covered in this article) is the major requirement. However, just bear in mind that SQL Server has a lot more components, such as ETL scripts, SQL Jobs, and so on, which you should also consider versioning. The ultimate objective is to be able to recreate a fully operational database system, at any version, just from what’s in source control.
Step 2: Script local SQL Server setup
If you use the dedicated database model, then each developer will need to install SQL Server on his or her local machine. For uniformity, I highly recommend creating a .bat
file that will perform the install. Microsoft provides some excellent documentation on how to do this. I recommend putting this in a shared location or in source control. The creation folder (/[RootSourceControlFolder]/Database/[DatabaseName]/creation) works fine for this.
Step 3: Script database creation
To establish a local working copy of the AdeventureWorks2014
database, synchronized with the master source control repository, each developer first needs to create an empty local copy of the database. SQL Source Control will not create the database creation scripts for you. It’s a good idea to generate a basic database creation script in SQL Server Management Studio and then place it somewhere where all developers can access it, such as:
/[RootSourceControlFolder]/Database/[DatabaseName]/creation.
Each developer should open this script and run it.
Step 4: Synchronize local database with source control master
The final step is for each developer to synchronize their local database to source control master branch. Having pulled from master (remote) to their local repo, and then updated their local working folder, each developer can then link their local database to this working folder.
The steps are very similar to those outlined earlier. Each developer will choose to link to a working folder, specify its location, but will choose the dedicated development model.
You will be prompted to get the latest changes and clicking on Get latest will reveal a list of changes that will synchronize the local database with the working folder.
Unfortunately, you’ll be presented with a warning, shown in Figure 10. SQL Source Control (which uses the SQL Compare engine under the covers) wraps the synchronization script within a transaction, so that it can roll back all changes should something go wrong. AdventureWorks
uses a Full-Text Catalog and one of the quirks is that a full-index cannot be created in a transaction.
In this case, the JobCandidate
table has a full-text index to support full-text queries against the Resume
column. As the warning indicates, SQL Compare will deploy the full text index changes after the transaction completes, but if changes to objects that reference that index are included in the transaction, then the synchronization script may fail.
If you click Acknowledge, the synchronization will fail, because the uspSearchCandidateResumes
stored procedure uses the full text index on this table to search within the candidate resumes.
There are a couple of ways around this sort of problem. If you have a lot of objects that reference a full-text index, you might just consider running the whole synchronization script outside a transaction. To do this, click Cancel instead of Acknowledge.
Go to the Setup menu, click Comparison options, and then check the Don’t use transactions in Deployment scripts option, and then hit Save.
When you rerun Get latest, you’ll receive a high-level warning about the lack of transactions, and advice to back up the database before you start (it’s good advice!). Once the synchronization is complete, remember to re-enable transactions for deployment scripts.
An alternative, if there are only a few dependencies, is to exclude from the initial synchronization any objects that reference that full text index, in this case uspSearchCandidateResumes
.
On applying changes, you’ll receive the same warning, as shown in Figure 10, but this time the synchronization will succeed. When complete, the Get latest screen will report that uspSearchCandidateResumes
is the only object that exists in the working folder but not in the database. Simply run Apply changes a second time to complete the synchronization.
Smoothing the path to team adoption of SQL versioning
Schedule a meeting and have the developers bring their laptops. When it was time for my team to start using the tool, I took two different approaches. My first approach was to send an email with step-by-step instructions. That had a 40% success rate. I ended up at a lot of developer’s desks helping them setup. For the next database I scheduled a meeting. The success rate shot up to 90%. When you are scheduling the meeting for the developers on the team an ideal set of goals should be:
- Install SQL Server
- Install Red Gate SQL Source Control
- Create their local databases, using database creation scripts
- Bring changes down from source control and apply them to the database
- Do test commits.
It takes about an hour to do all of this and by the time the developers walk out the room they should be able to make changes and check them in.
Conclusion
Getting our databases in version control has solved a lot of problems for us. We no longer wonder when a change was made or who made it. We know exactly what the changes need to be deployed to production. And finally, it has allowed us to start making our database changes locally, which in turn has allowed us to adopt a modified version of the Gitflow branching strategy.
Redgate’s SQL Source Control makes this all very easy. The hardest part is getting all the prep-work done. I have put multiple databases under source control, all with varying complexity, some with a few tables, and others with 100s of tables. It now takes me about a couple of hours to a day, from start to finish, with most of that work being prep-work.