E ntity Framework (EF) provides an interface that makes a relational database look to like list of linked .NET classes that represent the domain-specific objects. As a developer I have used a number of database access frameworks, such as ADO.NET and LINQ to SQL, and more recently EF. Of all the frameworks I have used, I find that EF is the most productive, with a robust and comprehensive set of features.
However, before EF can be used for real-world applications, it needs to meet all the rigours of performance and deployment. In another of my articles I have talked about tuning the performance of EF. In this article I want to talk about the issues of deployment into production, especially around updating the database schema, i.e. the structure of how the data is stored in the relational database.
This article comes out of my latest project, which is an e-commerce web site using ASP.NET MVC with EF and is running on Microsoft Azure. I did not feel confident that EF’s built-in data migration feature was robust enough for a deploying database changes to a live, production site. This led me to study the problem to find a better answer, and I thought that my research and proposed solution might be useful for others.
The challenges of database updates
There is a great deal of literature that describes the problems of deploying and updating databases, and the Simple-Talk web site has a whole section on Database Lifecycle Management. Therefore I am only going to give my brief perspective on the database update challenge.
Databases contain data, and that data is often very valuable to someone. In the e-commerce site that I am working on, the database holds customer orders which they have paid for; and they expect them to be delivered. If we want the business to succeed, then that data had better not be lost or corrupted.
The problem is that, as the e-commerce business grows, it is very likely that there will be changes to the type, format and structure of the data held in the database, known as the database schema. It is normally simple to make changes such as adding a column to hold a comment, as the new table will work with the existing software. The difficulties come with more complex additions such as a new table which links to existing data, i.e. in SQL terms it has a foreign key.
Updates also become especially challenging if you want the existing web site to stay ‘live’ while you change the database, so that the old software still works while you are updating its schema and data. In the diagram below we see a typical deployment via Azure’s staging slot.
The point about this arrangement is that the staging slot holds the new version of the software, which is accessing the same database as the live site. When the new version checks out, the DevOps person then swaps the staging-live site. The users should see no interruption to their use of the site.
The live update requires that the database schema must be updated while the existing version of the software is still running. This means that the database must support both the old and the new software until the swap is done.
My requirements for the database deployment process
In order to reliably deploy a database, the process must be
- Testable: I can test any database change before running it on the Production database.
- Automated: I can automate the whole process so that I can’t get it wrong.
- Trackable: Each database should have a log of what has been done to its schema.
- Atomic: The update process must either be completed successful or entirely rolled-back.
- Recoverable: Each update should automatically make a backup in case the worst happens.
- EF-compatible: The database changes have to work with Entity Framework.
- Comprehensive: There must be Full SQL access; I needed to change tables, views, stored procedures, constraints etc.
Why I decided EF’s standard migration feature was not sufficient
I am very familiar with EF’s database migration feature and I have used it in the early stages of developing an application. However, it’s this experience that made me decide it wasn’t sufficient for this current project. The main reason is that it is not very testable in that it is hard to run the update on a test database. A second reason is that I had come across problems when applying migrations to a Production system – the update runs as part of the application start-up and there is only limited feedback of errors.
Note: there is a way to run EF migrations from a command line, which does improve the process, but the problem of testability still remains.
All in all, I just didn’t feel comfortable with using EF’s own database migration feature, so I set out to find a better way. After a lot of searching I found a package called DbUp.
Database Migrations the DbUp way
DbUp is an open-source, .NET-based tool for running change-scripts on a database. DbUp is elegantly simple, and uses an “Apply These Scripts…” approach, i.e. you write a list of scripts, SQL or C# based, with a name that sets the order, e.g. Script001, Script002 etc. You can call DbUp via a Console App or directly from PowerShell. It uses a special table in the database to see which scripts have already been run and runs any scripts that are new.
Here is an example of using DbUp from my application:
var upgrader = DeployChanges.To
var result = upgrader.PerformUpgrade();
Looks simple, but there are lots of little things in DbUp that shows someone has really thought the process through. I have to admit I found the documentation a little obscure in places, but that is because I was most likely thinking in an EF way. Once I got me head around it I started to appreciate the quality of DbUp.
A couple of pointers for people that are used to EF Migrations:
- DbUp only does forward changes, i.e. there is no DOWN script like in EF migrations. The philosophy of DbUp is that, at each stage, you are transitioning the database to the new state. So going back is simply another transition. Considering that I have only used an EF DOWN script about once in four years then I don’t think this is a problem.
- There is no ‘Run Seed Every Time’ approach like EF. When you think about it you run one seed at the start why do you need to run it again? In DbUp if you want to change the data in the seed you just have a new script to update or add to the original seed.
Note: DbUp does have a way of running a script every time if you what to, see DbUp NullJournal.
So, DbUp coupled with some sort of automation and SQL backup/Snapshots ticks off 6 of my 7 requirements. This only leaves the problem of making sure the migrations are compatible with EF, which I describe next.
Matching Entity Framework changes to Database changes
EF is great to develop with as it does a number of clever things in the background to help developers. It is designed to make a relational database look like a set of linked classes. This is very clever, but to do this, EF’s view of the database relationships must match the reality of what is in the database. To understand why this is problem then I need to describe how EF knows about the database schema.
How EF finds out about the database schema
EF caches its knowledge of the metadata of the database. EF has what it calls the ‘database model’ that holds a view of what it thinks the database schema is. This is a set of metadata that it builds at application start-up from the EF DbContext class and the associated classes referred to by your DbContext. There are therefore two main ways of telling EF what the database schema looks like.
- EF uses your code to define the database schema. In this mode, the developer normally defines or designs the classes and configures a special class called DbContext, which EF uses to calculate the EF database model, i.e. its view of what the database schema should look like. It then uses what EF calls a ‘database initializer’ to create the database. There are then ways to update the database if you change the EF database model: see Code First Migrations.
- EF imports the schema of an existing database to create the data classes.The other way is to allow EF to build the classes and DbContext based on the schema of an existing database. It does this by a one-time scan of the database schema and using templates to build the classes and DbContext (see another of my articles which describes this in detail). If you change the database then you can either re-import the database schema again, or there is also a way of using Code First Migrations with an existing database to handle update.
The important thing to see is that, other than the one-time scan when the schema is imported, EF never looks at the schema of the database, even when it is doing migrations. It just assumes the database in the state that EF’s metadata model says it is. (For more on how this works under the hood then see Max Vasilyev’s interesting article about how EF migrations are calculated and stored).
EF will only find out there is a mismatch between its view of the database schema and the actual database schema when it accesses the part of the database that does not match. I have not tried all possible combinations, but the error I got when I added a properly called ‘EfOnlyProperty’ to an EF class without a corresponding column in the table was:
System.Data.Entity.Infrastructure.DbUpdateException : An error occurred while updating the entries. See the inner exception for details. ----> System.Data.Entity.Core.UpdateException : An error occurred while updating the entries. See the inner exception for details. ----> System.Data.SqlClient.SqlException : Invalid column name 'EfOnlyProperty'.
In addition, when EF creates a database, it can add some hidden tables in order to implement many-to-many relationships (see my article on how EF handles many-to-many relationships). This means that any changes to the database must include these extra tables for EF to work properly. If a class called ‘Parent’ has a many-to-many relationship with a class called ‘Child’ and the linking table is missing then you get this error.
System.Data.Entity.Infrastructure.DbUpdateException : An error occurred while saving entities that do not expose foreign key properties for their relationships. The EntityEntries property will return null because a single entity cannot be identified as the source of the exception. Handling of exceptions while saving can be made easier by exposing foreign key properties in your entity types. See the InnerException for details. ---->
System.Data.Entity.Core.UpdateException : An error occurred while updating the entries. See the inner exception for details. ---->
System.Data.SqlClient.SqlException : Invalid object name 'dbo.ParentChild'.
I needed a EF/SQL sanity checker
The idea of changing my database and having EF out of step was unthinkable. If I wanted to take over the handling of database migrations and continue to use EF then I needed to make sure that EF’s database model is in step with the actual database schema.
I could build the database using SQL and use EF’s import existing database method described earlier. However I know from experience that it can be a pain when database updates come along and you need to re-import again. Also EF’s database import uses set generic property names for relationships and exposes every relationship, which isn’t ideal for good programming practice.
So I wanted to use EF Code First approach yet define the database through T-SQL. That meant I needed something to check that my EF/software view of the database schema and the actual SQL database schema were in step. Ideally this would be something I could run as part on my Unit Tests so that any mismatch shows up immediately.
My solution to comparing EF/SQL databases
I therefore created some software that compared the EF database model metadata against the actual SQL schema as read from a database. This method runs as a Unit Test and reads the EF database model metadata and reads the actual SQL database schema. It then compares the following:
- Each table: does a table exist for each EF class?
- Each column: does a column match EF class properties in name, type/nullable, size, primary key(s) and key order?
- Each relationship:
- Do SQL foreign key constraints exist for each EF relationship?
- Have the extra many-to-many tables that EF used been configured properly?
- Are the Cascade Deletes the same between EF and SQL?
If any of the above is out of step then the Unit Test fails and outputs useful error messages. At the same time it also produces warnings for tables and/or columns in the SQL database that EF does not use. These should not cause a problem to EF, but might show something that EF has missed.
Here is an example of a call to the
CompareEfWithDb method to check that the EF’s model matches the schema of the database that the ‘YourDbContext’ is connection string points to:
using (var db = new YourDbContext())
var comparer = new CompareEfSql();
var status = comparer.CompareEfWithDb(db);
//status.IsValid is true if no errors.
//status.Errors contains any errors.
//status.Warnings contains any warnings
In my Unit Test, I fail the test if status returned is ‘not valid’ and I print out any error messages. I also tend to fail the test on warnings too, as it often points to something I have missed.
I developed a variant of the
CompareEfWithDb method which takes an extra parameter with a connection string to a separate database. This allows a check of EF’s database model against any database schema, e.g. my production database. By running a test like this before deployment it is possible I make sure that the production database is at the right level for the new software to work on. This extra test gives me more confidence that the deployment will succeed.
Here are two examples of the type of error output I get. The first is a simple example is of a missing column in the database. The error message from calling the
CompareEfWithDb method is:
Missing Column: the SQL table [dbo].[Child] does not contain a column called EfOnlyProperty. Needed by EF class Child.
The second example is a class called ‘Parent’ which has many-to-many relationship with a class called ‘Child’ and the linking table is missing. The error messages from calling the
CompareEfWithDb method are:
Missing Link Table: EF has a Many-to-Many relationship between Parent.ManyChildren and Child but we could not find a linking table with the right foreign keys.Missing Link Table: EF has a Many-to-Many relationship between Child.ManyParents and Parent but we could not find a linking table with the right foreign keys.
CompareSqlToSql for a fuller picture
Although I found
CompareEfWithDb to be very helpful, it doesn’t quite cover everything. As time goes on I am planning to move some of the more complex access to SQL Stored Procedures (SPs) to both gain better performance and facilitate the decoupling of the database from the software. EF can use SPs but they don’t appear in the EF database model, so
CompareEfWithDb couldn’t help.
Having created all the code for CompareEfWithDb, it was pretty simple to create a SQL-to-SQL compare, called CompareSqlToSql. This could check one database against another, and can include a simple comparison of other SQL features. It does a more detailed comparison than
CompareEfWithDb because at the SQL level there is better access to all aspects of the database. It can also check non-EF parts such as SPs and their parameters.
Note: You still need good integration and system level testing to catch anything that these tools miss.
Using SQL compare to understand what EF is doing
I should say that
CompareSqlToSql has proved to be much more useful than just checking SPs. It turns out that EF is somewhat forgiving if you make mistakes when you use EF’s Data Annotations or EF’s Fluent API to alter the way the database works. For instance if you configure a required to optional relationship but give it a non-nullable foreign key it will add its own hidden nullable foreign key. In fact I found about three errors like this in my 22-table application, all around relationships.
If you want to create a database that the EF framework can use, you need to fix these errors in your code, or at least replicate what EF has done for the database to work.
CompareEfWithDb won’t spot them for you, but comparing an EF-generated database with your script-generated database will find them. To do this you need to:
1. Create a new database to create a SQL database based on EF’s database model (see code below). We will call it MyEfDb.
using (var db = new YourDbContext(EfDatabaseConfigName))
2. Then create a database using your scripts and DbUp – see earlier section ‘Database Migrations the DbUp way’. We will call it MySqlDb.
3. Now run
CompareSqlToSql (or another compare scheme tool – see below) with MyEfDb as the source database and MySqlDb as the target database.
You can spot EF corrections to your minor configuration mistakes mainly by the column names, which normally contain the name with a _ in the middle. I have also found places where the Cascade delete option was different, again through me misconfiguring a relationship.
Using a SQL comparison tool to build your scripts
The other reason for doing a SQL schema comparison of an EF-generated database with your current script-generated database is to build the scripts you need to update your database after you have made changes to the EF classes. There are a number of SQL schema comparison tools out there – see this useful list on stackoverflow, although it is a bit old.
If you don’t have one of these, then the output of
CompareSqlToSql will show all the SQL differences, but not in a nice SQL script way. For instance if we repeat our many-to-many example with a class called ‘Parent’ which has many-to-many relationship with a class called ‘Child’ and the linking table is missing. The error messages from
Missing Table: The 'MyEfDb' SQL database has a table called [dbo].[ParentChild], which is missing in the MySqlDb' database.Missing Foreign key: The 'MyEfDb' SQL database has a foreign key Parent: ParentChild.Parent_ParentId, Referenced: Parent.ParentId, which is missing in the 'MySqlDb' database.Missing Foreign key: The 'MyEfDb' SQL database has a foreign key Parent: ParentChild.Child_ChildId, Referenced: Child.ChildId, which is missing in the 'MySqlDb' database.
Note: If anyone is interested in the
CompareSqlToSql Unit Tests I could make the package publicly available. However the first version was a massive 2-day hack and does not handle all possible EF combinations, such as TPT and TPH inheritance, complex types etc. It therefore needs a lot more work before it can be released for general use.
Telling Entity Framework that you will handle migrations
Finally, we must prevent EF handling database changes when you change your EF code. If you don’t turn this off then when you change EF database classes, then you’ll discover that EF will block you from running the application until it has updated the database, using whatever EF ‘Database initializers‘ is set up (EF’s default initializer is CreateDatabaseIfNotExists) .
To stop EF trying to handle migrations we have to do is provide a null database Initializer. There are two alternative ways of doing this:
- Call ‘Database.SetInitializer<YourDbContext>(null)‘ at startup
- Add the following to the <appSettings> part of your web/application config file:
<add key="DatabaseInitializerForType YourApp.YourDbContext, YourApp" value="Disabled" />
See this useful documentation for more in null database Initializers.
Putting it all together
So, having got DbUp, and written my
CompareSqlToSql Unit Tests, I have my key parts. So, let me go through each of my points in my earlier specification and see how they are fulfilled.
With DbUp callable from Unit Tests or manually it is now possible to create database at a certain state very quickly. This provides each developer with their own database and also allows temp database to be created quickly for checking against, or for developing and testing new scripts. Also the CompareEfWithDb/CompareSqlToSql methods allow checking that these scripts have added the desired changes.
DbUp can be run inside PowerShell, or a package designed for database deployment system like Octopus (see free community version for small projects), Redgate’s DLM Automation (paid for) etc. By automation the process you significantly reduce the danger of making a mistake when deploying a database fix under pressure.
DbUp adds a table to each database it runs on which has a row for each script that was successfully run, holding the name of the script and when it was run. These scripts should be saved as part of your applications code source control, or some of the more sophisticated deployment automation systems provide their own script source control system.
DbUp version 3 has a .WithTransaction() option, which applies one SQL transaction across multiple scripts. This means that at the end of calling DbUp the database is either successfully updated or, if there is an error, the database is in the same state it was before the DbUp tried to update it.
This is covered by the standard SQL Backup/Snapshot features. This backup should be part of your automation so that you don’t forget it.
6. EF compatible
By using the CompareEfWithDb and CompareSqlToSql Unit Tests, a developer can continue an EF Code-First approach and the Unit Tests will show any mismatch between EF’s database model and the SQL database schema. Additionally an extra Unit Test can check other database, such as production, to ensure that the developer is aware of what needs to happen before they try to deploy new software.
DbUp uses T-SQL scripts to update the database schema and data. Therefore you have full access to all the power of SQL commands, plus the ability to run C# code when that is a more suitable approach
Any down sides to this approach?
I always like to consider any down-sides of a system I plan to use. Here are the negatives I can see to the approach described above:
- Requires you to use/learn T-SQL to write your DbUp scripts
EF Migrations hides the SQL commands it uses behind a thin API. Most other database migration packages, including DbUp, primarily make use of T-SQL. If you don’t have a DBA (Database Administrator) then the developer will need to learn T-SQL.While I am not a DBA, I have written quite a bit of T-SQL in my time. In fact I rather prefer T-SQL to configuring a database through EF’s fluent interface, which seems to have some rather oblique method names like .WithRequiredPrincipal() or .WithRequiredDependent(), which can be a bit confusing.
- My Unit Tests may miss an important change to the database
It is not easy to keep EF’s database model and actual database schema in step. I am relying on my Unit Tests to spot any differences, but if it misses anything I could have a problem. I have obviously tested CompareEfWithDb, but I know there are some holes, e.g. where I make a mistake in setting up a relationship which EF then reinterprets in a way I did not expect. I therefore run the CompareSqlToSql method between a SQL database created by EF against a SQL database created by my DbUP scripts. This provides a definitive comparison between what EF thinks the database schema should be and what I have written in my update scripts. Finally I need good integration and system testing to make sure everything works together as expected.
- Database updates in a team environment can be challenging
When several people might be changing the database schema then a change made by one developer might clash with a change from another developer. This problem isn’t unique to the approach I defined above, but it is worth mentioning. However bigger projects normally have a DBA who will coordinate/police any changes and thus control this problem.
What I have not covered
This article is already very long (thank you for reading this far!) so I have left out the actual process of how one might go about changing the database. I would recommend Edward Elliott’s article Non-Breaking Online Database Deployments as a good starting point.
This article describes why I really like EF for quick software development. However when I looked at EF’s own data migration approach I felt it was not suitable for the e-commerce live production site updates. To help you understand why that is I described how EF handles changes in the database schema.
To meet my requirements for database deployment I needed to find various tools to fill the slots. My primary solution was to use a change-script deployment such as DbUp which allows you to run T-SQL scripts, or .NET code, to update the database. DbUp coupled with some automation package and SQL Backup/Restore ticks most of my specification points. I should point out that DbUp is not a unique approach and many other packages, free and paid for, are available to do this job.
However there was one tool I could not find – that was something to help me keep EF’s database model in step with the actual database schema. My solution was to build two Unit Test tools: CompareEfWithDb which checks that EF’s database model against the SQL schema and CompareSqlToSql to do a more comprehensive SQL schema to SQL scheme check. While I still need good integration and system testing, these Unit Test tools allow me to find and fix problems early in the development cycle, and hopefully keep me safe when it comes time to do that big update.
Note: If you are interested in using the CompareEfWithDb and CompareSqlToSql Unit Tests then please let me know. I won’t be able to get a release out of this package for some time as it needs a lot of extra work to support all EF’s features and I am busy working on another project.