Robyn Page’s SQL Server Data Validation Workbench

Robyn Page provides essential techniques for ensuring the validity of the data being entered into your SQL Server tables.

Checking your data on entry

Contents

Introduction

If you are completely confident about the data that is being inserted or updated in a table, then you won’t need constraints. Even with a complete logical interface of stored procedures you’ll still require them for development work because the impossible always seems to happen.

In the real world, however, and particularly if the real world includes programmers using ‘Dynamic SQL’, or data feeds into tables and so on, then they are a vital defence for the integrity of your data.

There are several different ways of putting constraints and checks on data, and the purpose of this workshop is just to point out the odd quirk that might be of practical help, whilst running through the list.

Rules, defaults, indexes, and triggers are used for:

  • Requirement integrity – a column must have data in every row, no nulls
  • Check or Validity – Is the data valid? a valid Email address for example. Triggers or rules are generally used.
  • Uniqueness – no two table rows can have the same non-null values for the selected columns. Unique Indexes are used for this.
  • Referential integrity – data inserted into a table column must already have matching data in another table column or another column in the same table.

I have a certain difficulty with this workbench, as the SQL Server team have given advanced notice of its intention to replace Rules and Defaults, probably with ANSI Domains. However, until that time comes, they are the most effective way to implement certain business rules in the database.

One could argue that they shouldn’t be used because they are deprecated, but the fact is that there is no realistic substitute in SQL 2005 for the functionality, and safety mechanisms, they provide.

Would you refuse to wear a seatbelt, on hearing that a better design was due in four years time?

Rules

Rules are the best way of ensuring that business rules, especially those that are liable to change, are implemented in one place only.

As mentioned above, Rules are considered deprecated in SQL Server 2005. However there is nothing yet that replaces the functionality they provide until Domains are properly implemented. One should be aware of possible problems when upgrading. BOL gives the mysterious advice to use Check constraints instead, as if the two devices had the same functionality.

Rules are used to check data that is being inserted or updated in columns. If you need to perform a fairly complex check, and it has to be done in more than one table, then a rule is very handy. One rule can be referenced from any number of tables or user-defined data types in the database. You can bind a rule to to a user-defined data type, so it will then apply to any column where the datatype is used. This is even more convenient and safe.

The process is…

  1. Create the rule using CREATE RULE.
  2. Bind the rule to a column or user-defined datatype using sp_bindrule.
  3. Test the bound rule by inserting or updating data.

You can create a rule easily in EM or SMSS but don’t even think about altering the rule if you have bound it to an object. The interface was not implemented with any enthusiasm.

Imagine that you have a database in which you wish to handle postcodes. Normally, you’d want to put the whole messy business of checking that the format is valid into a stored procedure, as well as requiring that the front-end applications do all the appropriate filtering. However, this isn’t always possible. Whatever happens, complex code like this must be kept in one place and heavily documented.

Postcode validation criteria are as follows (slightly simplified):

  • The entire length must be between 6 and 8 characters
  • A space must be included
  • The local (inward) code, to the right of the space, always 3 chars
  • The first character of the local code must be numeric
  • The second and third characters of the local code must be alpha
  • The Sorting Office (outward) code the left of the gap, can be between
  • 2 and 4 characters
  • The first character of the Sorting Office (outward) code must be alpha

We put these criteria, which will define our RULE, in a test harness to develop it and test it as much as possible, as they are not the easiest of objects to change.

With that done we can then bind the rule to a column of a table. You can bind a rule to as many columns as you wish within the database: it keeps everything neat without duplication of the implementation of business rules.

Now, could we use a function in a rule? This would make it so much easier to develop and we can do a bit more processing.

Sadly no. A missed opportunity.

Not what you’d expect; it accepts a silly postcode. There is, sadly, no checking of a rule on variable with a user-Defined datatype.

Defaults

Defaults and rules seem to have attracted the displeasure of the SQL Standards committee.

Defaults are deprecated by SQL Server 2005 but are actually rather useful because they can be ‘bound’ to a user-defined Data Type as well as any column in the entire database. There is no other way of doing this.

Take a common example. I believe that every table should have a column that gives the insertion date. You can, of course laboriously set up a default constraint on every table but why not do the following…

Triggers

Triggers can be quite complex, as Pop Rivett explains elsewhere on this site with his excellent ‘Auditing’ trigger. We’ll only discuss triggers as data constraints.

Imagine we want to check addresses that are being placed in an address table. We want to ensure that we’ve done reasonable sanity checks to the address. For a start, we believe that an address should contain at least four words and alphanumeric characters with just a sprinkling of punctuation

Here is one of Phil Factor’s routines (thanks Phil).

Integrity Constraints

Integrity Constraints, like rules, limit or ‘constrain’ you in what you can put in a table or column. Unlike triggers, integrity constraints cannot cascade changes through related tables in the database, enforce complex restrictions by referencing other database objects, perform “what if” analyses or roll back the current transaction as a result of enforcing data integrity. (With triggers, you can either roll back or continue the transaction, depending on how you handle referential integrity.) Nevertheless they are the most commonly used constraint and simple to define and alter.

There are three types of Table-level constraint:

  1. Unique and primary key constraints ( no two rows in a table have the same values in the specified columns – with a primary key constraint it can’t be null either).
  2. Referential integrity constraints (enforces the rule that, for a specific column, there must already be matching data in the column it references).
  3. Check constraints limit the values of data inserted into columns.

We’re most concerned with Check constraints in this article. Like a rule, the check has to be an expression that would fit in a WHERE or IF Clause, but can’t involve a subquery. You can apply several constraints on the one column.

Imagine we want to check that an email address is valid.

So we see that the table-level check has done a check based on both columns.

A unique index permits no two rows to have the same index value, including NULL. A unique index checks that all data in the column is unique when it is created and also when an insert or update is attempted.

You cannot create a unique index on a column that contains null values in more than one row – an error is, obviously, triggered.

You can use the unique keyword on composite indexes.

The use of referential constraints is beyond the scope of this article, as it is difficult to talk about them without going on to write the entire chapter of a book, touching on the complexities of referential data models, cascading deletes and so on.

This workshop started off when I was staring at the tree view of Enterprise Manager, and suddenly realising I’d never used Rules and wondering what they were for! I hope you’ve learned something new too.

Just a few questions just to check …

  1. When would you choose a Rule as the best way of checking data going into your database
  2. What are the advantages of defaults over default constraints?
  3. Why would you choose a table-level constraint over a column level constraint?
  4. How do you ensure that the value entering a table is unique to the column?
  5. When would you choose a trigger to check on data over either a rule or default constraint?
  6. What is the point of putting a default on a user-defined datatype?
  7. What happens if you create a Unique index on a column that has two null values in it? Why?
  8. how would you check in SQL whether a default or rule is bound to a user-Defined datatype?
  9. What is the easiest way of ensuring that a column containing foreign keys actually reference valid primary keys in another table.
  10. What are Rules and Defaults likely to be replaced by in future?