Using database replication with automated deployments
For the SQL Change Automation team, it’s important that we take time out from development, occasionally, to explore some of the issues our customers face when automating database deployment. Following on from previous posts about cross-database and cross-server dependencies and production database drift, this article shares some of our thoughts about how to deal with database replication.
Database replication is the copying and synchronizing of selected data and database objects from one database to another. When making automated deployments to databases that are being replicated, you can easily run into problems. We experienced this recently when we tried to make changes to a table that was being replicated. In this article, we share some of our findings.
In the SQL Change Automation team, we were investigating issues that occur when making automated deployments on databases using replication. To do this, we created a system for a fictional company called Lightning Works. We used SQL Change Automation, and Octopus Deploy, to deploy two databases, a website, and associated services. The two databases were our Foreign Exchange database (Forex) and the LightningWorks database used to hold the application data.
We had two environments:
- Quality Assurance (QA), which we used for testing for bugs.
- Production, which was our live system that should always be working.
We set up replication on the production environment, such that changes to the Transactions table in the LightningWorks database were replicated to the OrdersReplica database. The OrdersReplica and LightningWorks databases were on different servers. Using transactional replication, we made the LightningWorks server the publisher and the OrdersReplica database the subscriber.
Issue #1: Drift
After enabling replication, we redeployed the existing schema in source control to the LightningWorks database. We expected this to have no changes and succeed. However, we could see from the SQL Change Automation change report there would be unexpected changes, so we aborted the deployment. It turned out we were dropping information needed for replication, because the changes made for replication were not in source control. This is an example of production drift.
You can see the SQL Change Automation change report below. It shows users and roles related to replication being removed as part of the deployment. Because they were being removed, they were no longer members of the db_owner role. This can be seen by the modification to the db_owner role. The change report also shows modifications to the Transactions table because one of the columns was now marked “NOT FOR REPLICATION”.
If we had gone ahead with the deployment, replication would stop working because the users required for replication would have been removed.
To prevent the loss of the data needed for replication, we filtered out the users and added the roles into source control. We also made sure -IgnoreUserRolesAndPermissions was set in SQL Change Automation. This meant users would not be dropped when making the deployment. We would manage users and role membership in the production database manually. To stop the change to the Transactions table being dropped when deploying, we included the change in source control.
This time when we tried to deploy to the production environment, the change report showed no changes. When we went ahead with the deployment, replication continued to work.
Issue #2: Replicated objects
We then wanted to make some changes to the Transactions table. The first change we made was to add another column. We made the change on our development database and checked it into source control. However, when we tried to deploy it to the production environment we got the error:
WARNING: Error 'You can only specify the READPAST lock in the READ COMMITTED or
REPEATABLE READ isolation levels.' occurred when executing the following SQL:
ALTER TABLE [dbo].[Transactions] ADD [TransactionDate] [datetime2] NULL
This error occurs because when you make changes to replicated objects, you must use either the Read Committed or Repeatable Read transaction isolation level. To fix the issue, we altered SQL Change Automation to use the Repeatable Read transaction isolation level. After making the change, the deployment worked. The change was also successfully replicated in the OrdersReplica database.
We tried making several other changes to the Transactions table including:
- adding a constraint to the table
- deleting a column (other than the primary key)
- changing a column to be nullable
- all of the above in the same deployment
When we deployed these changes they all worked and were replicated correctly in OrdersReplica.
Issue #3: Primary keys
We then tried to modify the primary key of the Transactions table. We added a column to make it a composite key. This worked on development but when we tried to deploy the change to the production environment, SQL Change Automation gave the following error:
WARNING: Error 'Cannot alter the table 'Transactions' because it is being published for replication.
Could not drop constraint. See previous errors.' occurred when executing the following SQL:
ALTER TABLE [dbo].[Transactions] DROP CONSTRAINT [PK_Transactions]
The problem occurred here because we were trying to drop the primary key: replication relies on the primary keys and so doesn’t permit any modifications to them. To fix it, we disabled replication while we made the deployment. After deployment, we re-enabled replication and reinitialized the subscribing databases. This made sure any changes made while replication was disabled would be passed on to any subscribers.
We didn’t want to turn off replication for every deployment. So, we looked into using migration scripts to disable replication, make the change, and then re-enable replication. We only wanted this to happen for changes that required replication to be disabled. We managed it by having the full disable and re-enable scripts in each migration script. We also added a guard clause so replication was only disabled if the current database was replicated.
These migration scripts quickly got nasty. We could see they would be difficult to maintain because every script needed to include the full script for disabling and re-enabling replication. Another problem was they couldn’t re-initialize subscribers that weren’t managed by the LightningWorks server, because the scripts ran on the LightningWorks database.
Instead of using migration scripts to turn off and on replication, we decided to use deployment steps, which by default would not be run. If replication needed to be turned off for a deployment, these steps would be run. If our replication setup changed in any way, we would still need to update these steps, but only in one place.
At this point, we had replication working in our production environment. Even better, we could now make changes that required replication to be disabled by using our deployment steps that turn off and on replication. We were able to set these to run only for changes that needed them. This was useful because disabling and re-enabling replication can take a long time for large datasets.
We wanted to check for any possible deployment issues before attempting to deploy to production. So we set up replication on the QA database. This allowed us to find any replication issues when deploying to the QA environment rather than the production environment, preventing the production environment from being in a partially deployed state.
Issue #4: Source control
If you have bidirectional transactional replication set up, then deployments work in the same manner. But there is a catch: you must be careful with the replication stored procedures in the deployed database. The best way to do this is to check the replication procedures into source control.
To check the procedures into source control, we would recommend replicating your development database and using SQL Source Control on the subscriber rather than the publisher. This means you need to have the same publication in development and production. To do this, you should work on the publisher. It also means you can spot any replication breaking changes during development rather than deployment.
Another solution would be to use another technology to achieve the same results as replication. For example, we could have a SQL Server Integrations Service (SSIS) project that monitors changes to the Transactions table and updates the same table in OrdersReplica. With this, there would be no replication issues, but now you would have another application dependent on the LightningWorks database.
In conclusion, most schema changes to replicated database don’t cause problems when being deployed. When they do, the only solution is to disable replication during the deployment. This can be automated, but shouldn’t be done for every deployment as re-enabling replication is slow for large datasets. And, as always, make sure any problems that could occur in deployment to production are caught as early as possible.
Do you have any replication war stories? If so, please share them in the comments below.