Catching Bad Data in Entity Framework

Any website that rejects the users' input without giving enough information to correct what they're doing is doomed to be unpopular. Entity Framework offers three different ways of validating data before writing it to the database. As well as describing how to harness these validation methods Jon Smith shows how capture these errors to make the feedback to the user less cryptic.

Most of the time, bad data gets to a system because our users get things wrong with data entry. For them, we need to provide meaningful error messages to help them quickly correct their input otherwise they will get annoyed or, on ecommerce sites, they may give up and not buy anything. There are other ways that bad data will arrive: it could mean an attack by hackers, for example. Whatever the cause, we need to prevent bad data reaching the database.

Entity Framework’s (EF) data validation is part of the solution for catching bad data in an application. By default, EF validates all data before it is written to the database, using a wide range of data validation methods. However, because EF comes after the user-interface data validation, its error feedback is often relegated to catching any exceptions that  EF throws and showing a generic ‘there was a problem’ message. I want to show you how you can use EF’s data validation to improve your error checking and how to pass EF’s error messages back to the user. That way you have a cleaner database and, hopefully, less frustrated users.

The many levels of EF’s Data Validation

In EF there are four possible checks that can be applied to data before it is written to the database. Three of these are inside EF and are easy to set up while the last one involved capturing errors reported by the underlying database.

The four EF data validations are:

  • Data Annotations, e.g. [MinLength(100)] to check the length of a string.
  • IValidatableObject. This is a method where to can code your own checks.
  • ValidateEntity. A DbContext method that allows access to the database while checking.
  • DbUpdateException Capturing database-generated errors

By default, EF checks for all three of these data validations on any entry that has been added or modified. If found it will run the requested validations and only commit the database update if all checks pass. 

To help you understand how to use each of these techniques, let me go through each of them in turn with an example.

Data Annotations

Data annotations are attributes that add validation rules to a property or class (see System.ComponentModel.DataAnnotations). These data annotations define rules that a separate validation stage checks against the current content of the property. Many of the .NET frameworks include validation via data annotations, like Windows 8 apps and ASP.NET MVC. (see Dino Esposito’s article on data annotations in MVC https://www.simple-talk.com/dotnet/asp.net/asp.net-mvc-annotated-for-input/). Like these other .NET frameworks EF also validates data against any data annotations before it commits that data to the database.  

Below is a simple example of a database to hold blog posts where we have a Tag class.. Tags are often added to Posts to indicate which topics the post is about, e.g. ‘SQL Server’ or ‘EF6’. I have also included a property called Slug, which in this context is the part of a URL which identifies a page using human-readable keywords.

The data annotations are the items in square brackets above the properties it relates to. [MaxLength] is fairly obvious, in that it sets the maximum length of the string (there is also a MinLength too). [Required] means that the property cannot be blank and [RegularExpression] allows more complex tests on strings.

I should also point out that in EF Code First, where you define the database through code, some of these attributes also define the table structure, e.g. [Required] means the column will not be NULL and [MaxLength] sets the size of the string in the database.

IValidatableObject

IValidatableObject is an interface that can be applied to any class. You then add a ‘Validate' method which then allows more complex tests to be written than can be achieved the data annotations. Again, the same validation stages in MVC, EF etc. that check data annotations will also validate any classes that have the IValidatableObject interface. Let’s look at a simple example by adding a Validate method to our Tag class:

As you can see. you have access to the properties of the class instance you are checking. In this sample test I check that the slug relates to the Name of the tag, so a Tag with a Name of “News HeadLines” could have a Slug of “news”, but not of “ReadAllAboutIt”.

As you can imagine, you can write almost any test you like as long as it only needs to access members of the class. I have personally used it to check some complex data where the state of one property in a class depends on the state of another.

The really nice thing about IValidatableObject is because many other .NET frameworks use this, such as ASP.NET MVC then any tests you write here are also checked closer to the user interface, so feedback is quicker.

ValidateEntity in EF’s DbContext

ValidateEntity is more complex than the previous validation procedures, but it can check things that nothing else can. Therefore it is a good tool to have in your armoury. If you are using Code First EF then you will be defining your own DbContext to put your DbSets in. It is in your DbContext that you can override the ValidateEntity method and add your own tests. The powerful thing about ValidateEntity is that you can access the current DbContext, which means you can look at the database when doing your validation.

Below is a check to see if the Slug in our Tag class is unique, which is actually something you would want to do on a real Tag if you were using the slug to make a unique URL.

If you look at the test below you can see that I have to check that the entity is a Tag and that it is being added or updated. If it is then I can check that no other tags, other than itself, have that slug.

One of our readers has pointed out that this check is insufficient in a multi-user environment because the test and commit are not simultaneous. This is quite true and the database would need some form of UNIQUE constraint to ensure the database was correct. In the next section I show how database generated errors can also be captured, although the error messages may not be quite as clear as the one above.

DbUpdateException during SaveChanges

EF works on top of a database and that database will have its own data integrity rules, which are the ultimate arbitrator of whether the database is changed. When EF’s SaveChanges() or SaveChangesAsync() methods are called if the database encounters any errors then the whole commit is rolled back and the errors are passed back via an exception. By capturing this exception we can gain access to the database specific errors. However it is quite complex to unravel this error information so let me describe the parts first, using SQL Server as our target database provider.

Because EF is designed to work with multiple database providers then the structure of the System.Data.Entity.Infrastructure.DbUpdateException is layered.

  1. DbUpdateException: This says that saving changes to the database failed. It contains a list of all entries that were in this failed commit. Its inner exception is System.Data.UpdateException.
  2. UpdateException: This says that the modifications could not be persisted to the database. Its inner exception is dependent on type of database it is connected to. In this example SQL Server returns a System.Data.SqlClient.SqlException. A different exception type would be returned for other database providers such as MySQL.
  3. SqlException contains information on all the errors that occurred inside the SQL database. There are a number of ways of accessing this information via the SqlException's Number, Errors properties or the more general exception Data property.

There are various problems with decoding these errors. First and foremost these are internal database errors and therefore are a) not user friendly and b) may contain information on the internals of the system which we do not want to reveal. For this reason we need to interpret them and turn them into useful messages. However this can be a challenge as, unlike the three previous approaches, here we cannot be sure exactly what entity or property each error is linked to.

Below is an example that traps two types SQL server error. The first, 516, is a general constraint error that can be triggered by a number of issues, for instance when the user tries to delete a row that another entity links to by a non-nullable foreign key. Because it could happen in a number of ways it is difficult to come up with a good user error message. The second, 2601, is a much more specific error about a duplicate key so we can do a much better job with the error message.

These two sql errors show some of the challenges of decoding database errors and turning them into useful user feedback. There are multiple ways we could improve the code above to provide better error messages but as good project leaders or developers we need to look for the best effort/reward. I would suggest that sometime it would be easier to add duplicate EF checks before we commit, like the test for uniqueness of the Slug in the ValidateEntity section, as these have more information with which to produce a useful message.

Why should I bother passing EF errors back to the UI?

Most of the examples I see of using EF with ASP.NET MVC normally just lets DbContext‘s SaveChanges() throw an exception and show a generic ‘there was an error’ message. However in my applications I catch the EF errors and show them to the user. That way the user gets more meaningful error message that they can act on. This is more work for the developer, so let me make the case as to why you should bother.

Firstly, if you add tests using DbContext‘s ValidateEntity then your specific error messages will be fed back to the user. In our case a message saying “The Slug on tag ‘xxx’ must be unique.” is a pretty important message for the user to see.

However the main reason I pass back EF error messages is because I often have to use Data Transfer Objects (DTO) to ‘reshape’ data between the database and the user. The side effect  of using DTOs is that some errors are only found when EF tries to update the database. In that case I need to make sure the useful error messages are not lost, but passed to the user.

DTOs are a common way of solving the mismatch between the database orbusiness objects and the user interface. This isn’t the place for a full explanation of system design patterns and DTOs (see Dino Esposito article on DTOs by Dino Esposito http://msdn.microsoft.com/en-us/magazine/ee236638.aspx). Instead I will give you an example that I hope shows you why DTOs are useful and why they mean some errors are only caught by EF.

Below is a class definition of the Post database entry. This post class has links to two other data classes: The Blogger of the post, which is the Author, and the Tags which have been assigned to the Post.

Now if we want to allow an authorized user to edit the Blogger and Tags properties of a Post then we need to add some more information, and maybe hide the Blogger and Tags properties from the UI as they can’t be displayed. We therefore  create another class like I have listed below. This type of class is often called a DTO.

As you can see, we have taken the original Post class and added two new properties, Bloggers and UserChosenTags, and removed the Post’s Blogger and Tags properties. The Bloggers list is filled in with all the possible authors and the UserChosenTags list is filled in with all the possible tags. Once the user has chosen the Blogger and Tags the DTO converts these back to the Post’s BlogId and Tags properties and then EF can save the changed Post entry.

This is a very typical operation in any application which needs a DTOs or similar adapter to link the user interface to what is in the database. However the effect is that the actual Post class is not seen by the user interface so any data errors inside the Post class, like ensuring there is a valid author attached, will only be caught by EF when it write the data to the database.

In fact for the analytical applications I work on I would say 95% of my data goes through DTOs. This very often means there are properties that exist in the data classes that are never seen by the user interface layer, so they cannot be validated at the user level. Therefore the first check on those properties is done by EF, and it might find a problem. This is why I always pass EF validation errors back to the user interface so that the user gets a meaningful error message they can act on.

Passing EF errors back to the user interface

Having made the case for passing EF errors back up to the user I will now describe one way this can be implemented.  BeBelow I show a new method inside DbContext called SaveChangesWithValidation, which returns a status. This needs two methods, the SaveChangesWithValidation and a supporting class I have called EfStatus. The code is as follows.

And a the new method SaveChangesWithValidation inside your DbContext

As you can see, the EfStatus stores the errors in the standard ValidationResult format. I use this format because data annotations and IValidateableObject produces ValidationResult, so your application most likely uses this type already. In my ASP.NET MVC applications I add  a method that copies the errors to the model state of the class to be reshown to the user so that the EF errors appear on their screen.

Summary

It is important to check that the data that you want to write to the database is correct. EF provides a number of independent features for making comprehensive checks on data before it is written to the database. Now you know what options you have for data validation in EF, you might like to consider what data would be a problem if it was written to the database. When you’ve determined what should be prevented from reaching the database, you can choose the appropriate approach to stop that vulnerability. A properly-written SQL Server database will have its own system of constraints as well, but this would never be a substitute for what I’ve described. Both together provide defence in depth against bad data.

I believe I have made a case for passing the error messages that are generated by EF’s validation back up to the user interface. If you are working on an existing project, then  that might be hard to retrofit: On the other hand a little thought should allow you to design this into  new projects using methods like those listed in this article.

You may be interested in visiting a live ASP.NET MVC5 web site at http://samplemvcwebapp.net/ from which most of these examples were taken. It contains some fictitious blog posts you can edit and includes links to the application code on GitHub.  This is part of an open-source project that I am currently building to help developers quickly build applications with good error checking and user feedback.