Checking your data on entry
Contents
Introduction
If you are completely confident about the data that is being inserted or updated in a table, then you won’t need constraints. Even with a complete logical interface of stored procedures you’ll still require them for development work because the impossible always seems to happen.
In the real world, however, and particularly if the real world includes programmers using ‘Dynamic SQL’, or data feeds into tables and so on, then they are a vital defence for the integrity of your data.
There are several different ways of putting constraints and checks on data, and the purpose of this workshop is just to point out the odd quirk that might be of practical help, whilst running through the list.
Rules, defaults, indexes, and triggers are used for:
- Requirement integrity – a column must have data in every row, no nulls
- Check or Validity – Is the data valid? a valid Email address for example. Triggers or rules are generally used.
- Uniqueness – no two table rows can have the same non-null values for the selected columns. Unique Indexes are used for this.
- Referential integrity – data inserted into a table column must already have matching data in another table column or another column in the same table.
I have a certain difficulty with this workbench, as the SQL Server team have given advanced notice of its intention to replace Rules and Defaults, probably with ANSI Domains. However, until that time comes, they are the most effective way to implement certain business rules in the database.
One could argue that they shouldn’t be used because they are deprecated, but the fact is that there is no realistic substitute in SQL 2005 for the functionality, and safety mechanisms, they provide.
Would you refuse to wear a seatbelt, on hearing that a better design was due in four years time?
Rules
Rules are the best way of ensuring that business rules, especially those that are liable to change, are implemented in one place only.
As mentioned above, Rules are considered deprecated in SQL Server 2005. However there is nothing yet that replaces the functionality they provide until Domains are properly implemented. One should be aware of possible problems when upgrading. BOL gives the mysterious advice to use Check constraints instead, as if the two devices had the same functionality.
Rules are used to check data that is being inserted or updated in columns. If you need to perform a fairly complex check, and it has to be done in more than one table, then a rule is very handy. One rule can be referenced from any number of tables or user-defined data types in the database. You can bind a rule to to a user-defined data type, so it will then apply to any column where the datatype is used. This is even more convenient and safe.
The process is…
- Create the rule using CREATE RULE.
- Bind the rule to a column or user-defined datatype using sp_bindrule.
- Test the bound rule by inserting or updating data.
You can create a rule easily in EM or SMSS but don’t even think about altering the rule if you have bound it to an object. The interface was not implemented with any enthusiasm.
Imagine that you have a database in which you wish to handle postcodes. Normally, you’d want to put the whole messy business of checking that the format is valid into a stored procedure, as well as requiring that the front-end applications do all the appropriate filtering. However, this isn’t always possible. Whatever happens, complex code like this must be kept in one place and heavily documented.
Postcode validation criteria are as follows (slightly simplified):
- The entire length must be between 6 and 8 characters
- A space must be included
- The local (inward) code, to the right of the space, always 3 chars
- The first character of the local code must be numeric
- The second and third characters of the local code must be alpha
- The Sorting Office (outward) code the left of the gap, can be between
- 2 and 4 characters
- The first character of the Sorting Office (outward) code must be alpha
We put these criteria, which will define our RULE, in a test harness to develop it and test it as much as possible, as they are not the easiest of objects to change.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | --start of the test harness  DECLARE @Postcode VARCHAR(10)  SELECT @Postcode='CO10 7SP'  SELECT CASE WHEN  --start of the validation rules  LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars  AND  LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only!     --local (inward) code, to the right of the space, 3 characters   AND--The second and third characters of the local code must be alpha   SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]'  AND--first character of the Sorting Office (outward) code must be alpha   SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%'  AND--Sorting Office (outward) code the left of the gap, between 2 and 4   CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5  --one number followed by two letters.  --end of the validation rules  THEN 'good' ELSE 'bad' END  --end of the test harness  /* Now we create the RULE  You can use any name for the argument (we chose @Postcode), but the  first character must be "@." Make sure it is something meaningful  You can use built-in functions but anything that references other  database objects is forbidden.    */   IF EXISTS (SELECT * FROM dbo.sysobjects          WHERE id = OBJECT_ID(N'[dbo].[PostcodeValidation]')          AND OBJECTPROPERTY(id, N'IsRule') = 1)  DROP RULE [dbo].[PostcodeValidation]  GO  CREATE RULE PostcodeValidation  AS  LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars  AND  LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only!     --local (inward) code, to the right of the space, 3 characters   AND--The second and third characters of the local code must be alpha   SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]'  AND--first character of the Sorting Office (outward) code must be alpha   SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%'  AND--Sorting Office (outward) code the left of the gap, between 2 and 4   CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5  --one number followed by two letters.  GO  --then we apply the documentation  EXEC   sp_addextendedproperty 'MS_Description',   'length between 6 and 8 characters  Containing one space only   3 characters to the right of the space  of which the first myst be numeric and the other two alpha  Before the space can be between 2 and 4 characters   of which the first must be alpha',   'user', dbo, 'rule', PostcodeValidation  | 
With that done we can then bind the rule to a column of a table. You can bind a rule to as many columns as you wish within the database: it keeps everything neat without duplication of the implementation of business rules.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | CREATE TABLE TestOfRule (MyID INT IDENTITY(1,1), Postcode VARCHAR(10))  /*  You cannot bind a rule to a text, image, or timestamp datatype column.   You cannot bind rules to system tables. Still, who cares?*/  --First bind the rule to the table column  EXEC sp_bindrule PostcodeValidation, 'TestOfRule.Postcode'  --Now let's try out some inserts  INSERT INTO TestOfRule(postcode) SELECT 'CM20 3EQ'--OK  INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--OK  INSERT INTO TestOfRule(postcode) SELECT 'CM 3EQ'--OK  INSERT INTO TestOfRule(postcode) SELECT 'CM2 30EQ'--not so good  INSERT INTO TestOfRule(postcode) SELECT 'CM2  3EQ'--not so good  INSERT INTO TestOfRule(postcode) SELECT 'SQUIFFY'--Not so good  /*---error---!  A column insert or update conflicts with a rule imposed by a previous   CREATE RULE statement. The statement was terminated. The conflict   occurred in database 'master', table 'TestOfRule', column 'Postcode'.  The statement has been terminated.  | 
Now, could we use a function in a rule? This would make it so much easier to develop and we can do a bit more processing.
Sadly no. A missed opportunity.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | CREATE  FUNCTION [dbo].[ufiCleverStuff]   ( @string VARCHAR(8000)) RETURNS INT AS BEGIN RETURN 1 END  ALTER TABLE dbo.TestOfRule ADD    Address VARCHAR(255) NULL  CREATE RULE AddressValidation AS dbo.ufiCleverStuff(@address)=1  --This will cause an error!  /* However, we can bind the rule to a user-defined data type */  EXEC sp_addtype Postcode,'Varchar(10)',NONULL --NONULL=no nulls allowed  EXEC sp_bindrule PostcodeValidation, 'Postcode'  sp_help Postcode --just to chck that the rule is there (Rule_Name)  --And use it wherever we want  CREATE TABLE SecondTestOfRule (MyID INT IDENTITY(1,1), Postcode postcode)  --and try it out  SELECT * FROM SecondTestOfRule  INSERT INTO SecondTestOfRule(postcode) SELECT 'ME3 5EQ'  --OK  INSERT INTO SecondTestOfRule(postcode) SELECT 'AR3 567'  --didn't like the 567 bit  INSERT INTO SecondTestOfRule(postcode) SELECT 'ME2 30RP'  --not so good  INSERT INTO SecondTestOfRule(postcode) SELECT 'ME 3RP'  --OK  INSERT INTO SecondTestOfRule(postcode) SELECT 'DE52  3EQ'  --not so good  INSERT INTO SecondTestOfRule(postcode) SELECT 'SQUIFFY'  --Not so good  UPDATE SecondTestOfRule SET postcode='SE34 2DS' WHERE myid=5  UPDATE SecondTestOfRule SET postcode='squiffy' WHERE myid=5  --good, it chucked it out  /*  Rules bound to columns take precedence over rules bound to user   datatypes.  */  --now what happens when we create a variable of the datatype  DECLARE @MyPostcode Postcode  SELECT @Mypostcode='&^bananas****'  /*  | 
Not what you’d expect; it accepts a silly postcode. There is, sadly, no checking of a rule on variable with a user-Defined datatype.
Defaults
Defaults and rules seem to have attracted the displeasure of the SQL Standards committee.
Defaults are deprecated by SQL Server 2005 but are actually rather useful because they can be ‘bound’ to a user-defined Data Type as well as any column in the entire database. There is no other way of doing this.
Take a common example. I believe that every table should have a column that gives the insertion date. You can, of course laboriously set up a default constraint on every table but why not do the following…
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | --just create a simple default of the current date and time  CREATE DEFAULT DateOfInsertion AS GETDATE()  --now document it  EXEC   sp_addextendedproperty 'MS_Description',   'this is a default supplying the current date',   'user', dbo, 'default', DateOfInsertion  --add a new type called InsertionDate  EXEC sp_addtype InsertionDate,'DateTime',NONULL   --add the default to the type  sp_bindefault DateOfInsertion, "InsertionDate" sp_help InsertionDate --check that the default is there (Default_name)  --now the whole process is a lot simpler  CREATE TABLE EventLog (MyID INT IDENTITY(1,1),                                         TheDescription VARCHAR(100),                                         insertionDate insertionDate)  INSERT INTO Eventlog(theDescription)                                 VALUES ('the martians are attacking')  INSERT INTO Eventlog(theDescription)                                 VALUES ('It is the treens')  INSERT INTO Eventlog(theDescription)                                 VALUES ('Call in Dan Dare')  INSERT INTO Eventlog(theDescription)                                 VALUES ('Dan Dare to the rescue')  SELECT * FROM EventLog--note that the dates have been filled in  /* but there is a feeling that there is something missing.. Why not  have a nice identity field User-Defined Data Type too?         */  EXEC sp_addtype counter,INT,'IDENTITY' -- Sybase only. Sorry  --you'll get an error  | 
Triggers
Triggers can be quite complex, as Pop Rivett explains elsewhere on this site with his excellent ‘Auditing’ trigger. We’ll only discuss triggers as data constraints.
Imagine we want to check addresses that are being placed in an address table. We want to ensure that we’ve done reasonable sanity checks to the address. For a start, we believe that an address should contain at least four words and alphanumeric characters with just a sprinkling of punctuation
Here is one of Phil Factor’s routines (thanks Phil).
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | CREATE FUNCTION [dbo].[ufiWordcountVarchar] --counts the number of   --words in a VARCHAR. We need something like this to check a valid  --address.  (   @string VARCHAR(8000),  @WordStartCharacters VARCHAR(255)='a-z',  @WordCharacters VARCHAR(255)='-a-z'''  )   RETURNS INT  AS   BEGIN   DECLARE @Position INT,   --index of current search         @WhereWeAre INT,--index into string so far         @wordcount INT,--the number of words found so far         @next INT,       --where the next search starts          @size INT        --the total size of the text  SELECT         @WhereWeAre=1,@size=LEN(@string),@Position=1,@wordcount=0  WHILE @Position>0         BEGIN         SELECT @Position=PATINDEX('%['+@wordStartCharacters+']%',                                 SUBSTRING(@string,@whereWeAre,8000))         IF @Position>0                  BEGIN                 SELECT @next=@WhereWeAre+@Position,@wordcount=@Wordcount+1                 SELECT @Position=PATINDEX('%[^'+@wordCharacters+']%',                                 SUBSTRING(@string,@next,8000)+' ')                 SELECT @WhereWeAre=@next+@Position                 END         END  RETURN @wordcount  END  GO  /* so we create a test table and add a trigger to it, putting the two   checks in the trigger */  CREATE TABLE MyAddresses (MyID INT IDENTITY (1,1), Address VARCHAR(100))  GO  CREATE TRIGGER tCheckAddress_MyAddress  ON MyAddresses  FOR INSERT, UPDATE  AS  BEGIN  IF (SELECT MIN(dbo.ufiWordcountVarchar(Address,DEFAULT,DEFAULT))         FROM inserted)<4  OR    EXISTS (SELECT 1 FROM inserted WHERE address LIKE '%[^-a-z0-9''",.()]%')         BEGIN         ROLLBACK TRANSACTION         RAISERROR ('Suspect address inserted into MyAddresses',16,1)         END  END  --not forgetting the documentation  EXEC   sp_addextendedproperty 'MS_Description',   'This checks to see if there are at least four words in the address  and checks for characters you wouldn''t expect to see in an address ',   'user', dbo, 'table', MyAddresses,'trigger', tCheckAddress_MyAddress  INSERT INTO MyAddresses (address)                  VALUES ('32, Acacia Avenue, Goldhay, Berkshire')  INSERT INTO MyAddresses (address)                  VALUES ('12 the Firs, Clive West Way, Slough, Berks')  INSERT INTO MyAddresses (address)                  VALUES ('46, the promenade, felixstowe essex fizzbang<>*')  INSERT INTO MyAddresses (address)                  VALUES ('Buckingham palace')  UPDATE Myaddresses SET address='Aargh!' WHERE MyID=2  SELECT * FROM MyAddresses  | 
Integrity Constraints
Integrity Constraints, like rules, limit or ‘constrain’ you in what you can put in a table or column. Unlike triggers, integrity constraints cannot cascade changes through related tables in the database, enforce complex restrictions by referencing other database objects, perform “what if” analyses or roll back the current transaction as a result of enforcing data integrity. (With triggers, you can either roll back or continue the transaction, depending on how you handle referential integrity.) Nevertheless they are the most commonly used constraint and simple to define and alter.
There are three types of Table-level constraint:
- Unique and primary key constraints ( no two rows in a table have the same values in the specified columns – with a primary key constraint it can’t be null either).
- Referential integrity constraints (enforces the rule that, for a specific column, there must already be matching data in the column it references).
- Check constraints limit the values of data inserted into columns.
We’re most concerned with Check constraints in this article. Like a rule, the check has to be an expression that would fit in a WHERE or IF Clause, but can’t involve a subquery. You can apply several constraints on the one column.
Imagine we want to check that an email address is valid.
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | CREATE TABLE TestConstraint   (      MyID INT IDENTITY(1,1),          MyEmailAddress VARCHAR(50) CONSTRAINT my_check_constraint           CHECK (MyEmailAddress NOT LIKE '%[^a-z-_.@]%'          AND LEN(REPLACE(MyEmailAddress,'@',''))=LEN(MyEmailAddress)-1)  )  EXEC   sp_addextendedproperty 'MS_Description',   'Check to make sure that only valid characters are in the email address  and there is only one @ sign',   'user', dbo, 'table', testconstraint,'constraint', my_check_constraint  INSERT INTO TestConstraint (MyEmailAddress)                                        SELECT 'Robyn.Page@Simple-talk.com'  INSERT INTO TestConstraint (MyEmailAddress)                                        SELECT 'Silly@>@Simple-talk.com'  INSERT INTO TestConstraint (MyEmailAddress)                                        SELECT 'Bill.Gates@Microsoft.com'  UPDATE TestConstraint          SET MyEmailAddress='dodgy@@simple-talk.com' WHERE MyID=1  /* the second one  and fourth will fail. Ideally, one should be able to   bind a user-defined error message to the constraint. Again, sorry -   Sybase only!  if we want the constraint to check other columns, then we need a table  -level constraint rather then a column constraint*/  CREATE TABLE SecondTestConstraint   (      MyID INT IDENTITY(1,1),          MyName VARCHAR(50),         MyDomain VARCHAR(50),  CONSTRAINT my_Second_check_constraint           CHECK (MyName+MyDomain NOT LIKE '%[^a-z-_.@]%'          AND LEN(REPLACE(MyName+MyDomain,'@',''))                                 =LEN(MyName+MyDomain)-1)  )  INSERT INTO SecondTestConstraint (MyName,MyDomain)                          SELECT 'Robyn.Page','@Simple-talk.com'  INSERT INTO SecondTestConstraint (MyName,MyDomain)                          SELECT 'Mad <andBad>','@Simple-talk.com'  | 
So we see that the table-level check has done a check based on both columns.
A unique index permits no two rows to have the same index value, including NULL. A unique index checks that all data in the column is unique when it is created and also when an insert or update is attempted.
You cannot create a unique index on a column that contains null values in more than one row – an error is, obviously, triggered.
You can use the unique keyword on composite indexes.
| 1 2 3 4 5 6 7 8 | CREATE UNIQUE INDEX idxUniqueEmail ON TestConstraint(MyEmailAddress)  INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com'  INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Kubla@Kahn.com'  INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Robyn@Kahn.com'  INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Anyone@Kahn.com'  INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Admin@Kahn.com'  INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com'  --yes, they all go in except for the last one which triggers an error  | 
The use of referential constraints is beyond the scope of this article, as it is difficult to talk about them without going on to write the entire chapter of a book, touching on the complexities of referential data models, cascading deletes and so on.
This workshop started off when I was staring at the tree view of Enterprise Manager, and suddenly realising I’d never used Rules and wondering what they were for! I hope you’ve learned something new too.
Just a few questions just to check …
- When would you choose a Rule as the best way of checking data going into your database
- What are the advantages of defaults over default constraints?
- Why would you choose a table-level constraint over a column level constraint?
- How do you ensure that the value entering a table is unique to the column?
- When would you choose a trigger to check on data over either a rule or default constraint?
- What is the point of putting a default on a user-defined datatype?
- What happens if you create a Unique index on a column that has two null values in it? Why?
- how would you check in SQL whether a default or rule is bound to a user-Defined datatype?
- What is the easiest way of ensuring that a column containing foreign keys actually reference valid primary keys in another table.
- What are Rules and Defaults likely to be replaced by in future?
 
         
        
Load comments