Getting rid of SQL Code

Joe becomes intrigued by the way that experts make errors in any area of technology, and suggests that the problem is more that of mindsets than lack of knowledge. He illustrates the point with SQL Development by means of the "Britney Spears, Automobiles and Squids" table, and the tangled Stored procedure, and shows ways of getting rid of both procedural and non-procedural code by adopting a different programming mindset.

Errors in complex technology

The book “The Logic of Failure: Recognizing and Avoiding Error in Complex Situations” by Dietrich Dörner (ISBN-13: 978-0201479485; 1997) explains the mindsets of people who were trying to solve a problem and got it wrong. He goes into details of the action of the operators of the Chernobyl reactor, who were all trained experts. They weren’t incompetent.  They ignored safety standards because they knew what they were doing. Dörner identifies the mindsets that cause failures, and we can apply this insight to programming.

First mindset problem:

Once you have started in one direction, you stick to it.  But the Turkish proverb is right: “No matter how far you have travelled down the wrong road, turn around”. 

Second mindset problem:

Most programmers started with procedural languages and still carry that mindset into declarative SQL.  This is another proverb: “To a child with a hammer, everything looks like a nail.”

In SQL, I usually find this error in attempts to save bad DDL with TRIGGERs, CURSORs, stored procedures and application code.  Those are the ways to get procedural code into SQL. 

Third mindset problem:

People prefer static mental models, rather than a dynamic model.  This was also the model for traditional programming languages – when you compile a program, you get the same executable code every time.  That does not work in SQL; this is why we have query optimizers rather than procedural code in RDBMS. 

An example of that mindset was a small company that did medical lab reports over the internet.  They had to deliver 24 hour or less turn-around of accurate data.  The clumsy schema required convoluted queries and lots of procedural code to validate the data.  Most of the data validation could have been done declaratively much faster. Hey, it works, so why change it?   My personal favourite was keeping numeric data in strings, then casting both blanks and NULLs to zero. 

The business was successful and the data volume started growing.  But the queries were growing slower at a faster rate than the company, so you could see the train wreck coming.  Management’s solution was to buy more hardware.  This is like adding more rails to the end of the track; if the bridge is wrecked this does not solve the fundamental problem. 

Fourth mindset problem: 

People have a strong tendency to focus on immediately problems and not ask what will happen next.  This will then lead to problems caused by the solutions.  This is also known as the Law of Unintended Consequences.  There is a wonderful book “The Systems Bible: The Beginner’s Guide to Systems Large and Small (Third Edition of SYSTEMANTICS)” by John Gall (ISBN 0-9618251-7-0; 2003) that details this phenomena. 

In IT, there is a proverb: “Nothing is more permanent than a ‘temporary patch’ in software”; I will let the reader fill in his favourite horror story here – we all have one.

We’ll illustrate a few of these mindset problems in some code written by a competent expert.  

Getting Rid of Procedural Code

This is from a real email I got.  I was asked if I could see any problems with the stored procedure below and if I had any tips regarding naming conventions or a better approach.   

There is a lot of stuff that we don’t do in SQL here.  It looks more like a COBOL or BASIC program written in T-SQL.

  1. The “Employees” table ought to be named “Personnel” because it is a set.  But if you are still writing sequential file processing, your mindset only lets you see the trees, one at a time, instead of the forest. 
    Can you figure out why he would use “A” as an alias for the Employees table?  Alphabetical order and sequential thinking again!  In later code he used “B” for the second alias, etc.

  2. I’d suggest that Pascal and camelCase is bad for maintaining code. 

  3. What do the OUTPUT values do?  Well, @Employee_Id returns an IDENTITY value, so we have no verification or validation possible for this data element.  You need better ids than a count of physical insertion attempts to the hardware.
    @Employee_Created and @Employee_Modified are audit meta-data and should not be here at all.  You never put audit data in the table being audited.  But even here, he used getdate()  instead CURRENT_TIMESTAMP just to be proprietary. 
    @UsernameExists is a local flag, just like you would write in Assembly language.  Remember those days?  Make a test and look at a register!  He does not understand that SQL would use a predicate. 

  4. Look at more assembly language programming with @CommandStatus

  5. Why write “NOT EXISTS (SELECT TOP 1 ..)“?  This is silly, proprietary and it doesn’t even have an ORDER BY clause.  Now look at “WHERE NOT A.employee_id = @employee_id” and think about trying to maintain the code. 

  6. @CommandStatus is an example of “flag coupling”, one of the worst kinds of coupling in Software Engineering.  This procedure depends on that flag, this module will do completely different things. Look at what happens when you add a new guy with (@CommandStatus = 3); insert him into the Employees table, flag his orders for deletion and finally report an error with invented codes.  ARRGH! 

When you write older procedural code, the application and the data are coupled in the same module.  You have to do everything in one place.  There were no log files for audit data.  Files had no constraints.  The closest thing to limiting user access was a write-protect ring on a reel of magnetic tape (Google it if you are too young to remember). 

SQL uses DDL, DML and DCL sublanguages to assure data integrity and DDL is the most important line of defense. Let’s put all of those IF-THEN-ELSE control flows from the original procedure into constraints:

Now back to a coherent module that does one job in itself.

Pick your favorite error handling method here.  There is the traditional @@ERROR and RAISERROR or the new TRY-CATCH method. You will get the name of the constraint for display in the error message.

There is no need to invent a new @EmployeeStatus code to pass along to calling programs.  That was COBOL, 40 years ago.  The only serious problem is that SQL Server does not support the Standard SQLSTATE yet. 

The audit data is being done with a third party tool outside the table; deleting a row will not destroy the audit trail anymore.  

For this problem, I would put the update of an existing employee in a separate “CREATE PROCEDURE UpdateEmployee()” so that I can add some DCL on it so nobody can change other people’s data.  But if you want to have this in one module, you can use the new MERGE statement:

Remember to catch errors, of course. The advantages of the MERGE over all that procedural code are huge because

  1.  It is shorter and easier to read than line after line of procedural code.  

  2. It is Standard SQL!  It will port.  Other SQL programmers can read it and maintain it. 

  3. It is one statement, so there is no worry about isolation levels and other sessions.  Your error messages are for just one SQL statement rather than several different ones in a multi-statement block of code. 

  4. It is one statement, so it can be optimized.  Procedural language compilers do their optimizations in multiple passes and have elaborate symbol tables and other structures to get a good executable module. T-SQL is a simple one-pass compiler.  That is why you have to use the “@” prefix.  The prefix tells the compiler about the nature of the variable so it does not have to use a symbol table. 

Getting Rid of Non-Procedural Code

One common problem in SQL Server (but not other SQLs) is that it overly sensitive to referential cycles.  If Table A references Table B twice or more, you get an error message.  This makes it hard to have a data element play two or more roles in a table. 

But most of the time, the problem is a design flaw.  Here is an actual skeleton schema from a Newsgroup posting.  The poster asked what the design flaw(s) were.

SQL Server will return an error when you run this code:

The immediate flaws are improper data element names, which is easy to fix with a global text replacement, but let’s get this out of the way. 

  1. The Employees table should be named Personnel.  It models a set and not one record at a time.  Even Microsoft gave up on camelCase, so the constraint names should be changed, too. 

  2. The “works_for” column is a bad data element name.  It is a verb that describes a relationship and not the name of an attribute.  We will get to that problem shortly.  It is serious and fundamental.

  3. “dept_code” is not an identifier any more that “shipping_code” would be.  A code is a different kind of attribute than a tag number or name. It would be something like “unfunded”, “over budget”, etc. 

  4. Being the department boss is not a property of a department, but of an employee’s job assignment.  Is the driver a part of the automobile? 

  5. Likewise, “dept_boss” is a role played by some emp_ssn.  It is not some special attribute of a department. 

In short, you have built “Britney Spears, Automobiles and Squids” tables.  You have used improper data element names.  Let’s fix the data element names first and consolidate the table declaration into one declarative statement per table. 

While it is a good idea to name constraints, remember that you do it so that you can see those names in errors messages or use them to alter the table.  Naming a PRIMARY KEY is a bit redundant, isn’t it?  The error message will tell you about such violations.

Let’s next get Britney Spears and the Squids into their own tables by splitting out entities and relationships. 

For this example, let’s not bother with the organizational hierarchy and just assume that we have a nested sets model in the rest of the table.  I also made a two-column the key just to make the next table more interesting. 

The DRI cycle problem is now gone.