Set-based Constraint Violation Reporting in SQL Server

When you're importing data into an RDBMS in bulk and an exception condition is raised because of a constraint violation, you generally need to fix the problem with the data and try again. The error won't tell you which rows are causing the violation. What if you've thousands of rows to search when it happens? There are solutions, writes William Sisson.

Introduction

Many of us have experience on more than one SQL-DBMS, but it is almost inevitable that you will become a specialist on one product or another. All SQL-DBMSs are, of course, based on the relational model as originally defined by Edgar Codd (how well SQL follows the relational model is another issue, which we won’t go into here). Though the basic principles of all products are the same, sometimes the details of particular implementations can lead to difficulties when you make the move from one product to another. I have worked with Sybase and Oracle in the past, but today I make my living with Microsoft SQL Server, so I am most up to date with that product and have the deepest knowledge of it.

It can be interesting to look at the way the different products solve the same problems, and to observe what features may be implemented in one and not in the other. One of these differences is the way that Oracle and SQL Server handle the reporting of multiple constraint violations from a single statement. The Oracle approach is elegant and straightforward, in SQL Server we have to put a lot of effort into devising a workaround.  

Oracle doesn’t always do better  in this sort of comparison; SQL Server specialists are often puzzled by the dire warning issued against using triggers by many Oracle experts. To those of you are familiar with SQL Server but not with Oracle, triggers in Oracle can be used effectively but it is considerably more challenging to use them correctly [1].

On the other hand Oracle specialists may be equally puzzled by what I am about to say about SQL Server.

The Problem

For the most part SQL Server deals fairly well with set based operations, but with constraints in general, some serious shortcomings quickly become apparent. Any SQL insert, update or delete command will cause an exception condition to be raised that  returns one constraint violation message, no matter how many constraints are violated, how many different types of constraint are violated,  or how many of the affected rows violate the constraint.

Before we go any further, let’s have a closer look at the kind of problems that SQL Server’s standard constraint violation reporting leads to.

We will use the following small database design for the examples.

We now attempt the following insert:

Resulting in the following message:

Let’s say we now add the primary key reference to fix this problem:

Attempting the insert on sales_order_line again, we once again get a foreign key violation constraint message, but for a different foreign key.

 

Only one key violation is reported at a time. The situation gets even worse if we are inserting multiple rows.  First let’s fix the foreign key violation on product:

Now we attempt to insert several rows into sales_product_line, some of which are valid and some of which will violate foreign key constraints:

We observe that the insert values contain not just foreign key violations, but also primary key violations.

When issuing the insert command, as usual, only one of the constraint violations is reported:

If your database is at the size of this demonstration, it would seem absurd to consider that this is a difficulty. However, with the enormous amounts of data being routinely imported into, or captured by, databases, a constraint violation exception can lead to a great deal of subsequent forensic work to find the rows that did it.

Our Oracle user friends would probably say at this point that they don’t understand what the problem is. We just use the LOG ERRORS clause on the insert. This feature has been available since Oracle 10g Version 2.

I created our test database in Oracle and issued the following commands:

This creates a table to hold the errors with the same columns as sales_order_line and some additional columns to hold the error information.

We then issue the insert with the LOG ERRORS clause added at the end (the syntax has been adapted as Oracle does not support multiple value lines in the insert statement, also I had to shorten the constraint names as Oracle will only accept 30 characters):

This gives us the following result:

A Solution

There are ways of avoiding this problem in SQL Server. Some workarounds are:

  • Using Stored procedures or Table-valued functions. Inserts and updates will be done by the application using a stored procedure. This would allow us to do whatever validation and checking is appropriate before the insertion is done. 
  • Doing inserts a row at a time in a try-catch block This solution has been described here on Simple Talk in Handling Constraint Violations and Errors in SQL Server
  • The use of an INSTEAD OF trigger would allow us to find a way of executing this INSERT statement  so that we get a report of all the constraint violations before we actually apply it.

The first two solutions will work well if you are in control of the import process. However, this isn’t always the case. By using the INSTEAD OF trigger, users can use their existing ways of inserting, deleting or updating data. In addition we can get much closer to the behaviour of the elegant Oracle solution. Clearly this cannot be achieved in a standard trigger on the table, because the database update has already taken place before the trigger has run and, if any constraints are violated, then the  trigger code will never be executed.

Here is the INSTEAD OF insert trigger to report all the constraint violations for the primary key and the two foreign key constraints in a global temporary table.

We now do the insert again:

We now get this result:

We can now execute the query from the RAISERROR message:

Which now gives us all the rows that violated constraints and the corresponding constraint name.

I have deliberately used temporary tables here as the we are probably only concerned with the results during the current session and we can save ourselves a lot of housekeeping by not holding the data permanently. Of course triggers for update and delete would also be required.

Now I can think of a number of objections to this approach.

  • Maintenance  First of all, maintaining these triggers looks like a lot of work. We can ease the burden by generating the triggers automatically. The trigger code above was generated by a query (you can find the code for the query at the end of the article). Furthermore, the trigger could be regenerated in a DDL trigger whenever an ALTER TABLE is issued against the table. This way the trigger will always be in step with the constraints defined in the database.
  • Scalability. If we do a mass update and every row violates every constraint on the table then we could be faced with a massive number of output rows from the constraint reporting. It would be useful to have some way we could switch off or limit the number of rows returned. We could hard code a limit into the triggers but this seems a little arbitrary and may lead to misleading results.

Another refinement would be to have the INSTEAD OF triggers on views rather than on the tables themselves. If the view was updated then all the constraint violations would be reported, a direct update on the table would only report the first constraint on the first row that violated that constraint. Of course this would mean maintaining a whole set of views with the same structure as the underlying tables, but here again DDL triggers could come to our aid to modify the views automatically whenever the table structure changes (we would need a completely consistent naming convention for the views to achieve this).

Conclusion

As we have seen, different SQL-DBMS products can vary considerably in how they handle particular issues. Though it can be hard enough to keep up to date on the details of our “home” product, it can be very interesting to have a heads up from time to time and look at how the other half lives. Sometimes the grass on the other side of the fence is greener, but sometimes the concrete is grayer.

I think we can be fairly hopeful that Microsoft will eventually bring a similar kind of constraint reporting into the core product, thus matching the Oracle functionality, but until that time I hope that this article has provided some ideas about how a workaround could be developed.

References

[1] Oracle: Triggers Considered Harmful, Considered Harmful

Acknowledgments

Though  I have been thinking about this problem for some time, I was prompted into action by the following article on Fabian Pascal’s dbdebunk website :
Weak Entities, Referential Constraints & Normalization

Appendix: Code to Generate Insert Trigger

Here is a test example: