VALUES() and Long Parameter Lists – Part II

The use of a comma-separated list of parameters to a SQL routine, which Phil Factor calls the 'comedy-limited list', is a device that makes seasoned SQL database developers wince. The best choice of design for passing variable numbers of parameters or tuples to SQL routines varies according to the importance to you of SQL Standards. Joe Celko discusses the pros and cons of the front-runners.

In the first article of this series, we looked at some of the common ways that beginners to SQL mistakenly apply techniques that served them well in a procedural language. About once a month, there is a request posted to a newsgroup or forum asking how to pass a CSV (comma separated values) list in a string as a parameter to a stored procedure. They miss their arrays and other variable-sized data structures and want to mimic them in SQL. They also have no idea what First Normal Form (1NF) and the Information Principle are, and why they are the foundations of RDBMS.

Every time those of us who give advice on newsgroups and forums see this type of question, we refer the poster to the definitive articles at Erland Sommarskog’s website:

http://www.sommarskog.se/arrays-in-sql-2005.html#manyparameters

http://www.sommarskog.se/arrays-in-sql-2008.html

The approaches to this problem can be classified as (1) iterative algorithms in T-SQL; (2) an auxiliary series table to replace looping; (3) non-SQL tools, such as XML and CLR routines; (4) the long parameter list; and, most recently, (5) table-valued parameters, or TVPs.

The iterative, XML and CLR solutions are purely procedural and are not really SQL. Aside from religious objections on the grounds of declarative programming, there are the practical grounds of increased maintenance costs and a lack of optimization, which make them ugly. And I am too old to learn CLR languages.

The use of the series table in a single query and the table-valued parameter are declarative SQL. Of those two approaches, the table-valued parameter is the better. But both of them are extra work.

As an aside, when Erland tested the long parameter list, he got an execution time of 19ms for 2,000 elements, which was better than any other method that he tested. But when he ran the procedure from PHP, the total time jumped to over 550ms, which was horrible. I have no idea what PHP is doing, and I have had no problems with other host languages.

All but the table-valued parameter approach require that you scrub the input string, if you want data integrity. If the data is already in a table, then you know it is good to go.

Data scrubbing is hard enough in T-SQL, but when you have to worry about external language syntax differences, it is a serious problem. Quickly, which languages accept ‘1E2’ versus ‘1.0E2’ versus ‘E2.0’ for floating point notation? Which ones cast a string like ‘123abc’ to integer 123 and which blow up?
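Even inside T-SQL, the rules are quirky; a quick experiment shows it:

-- exponent notation casts to FLOAT ...
SELECT CAST('1E2' AS FLOAT);          -- 100

-- ... but the same string will not cast to DECIMAL
SELECT CAST('1E2' AS DECIMAL(10,2));  -- error: cannot convert varchar to numeric

-- and trailing garbage blows up an integer cast with Msg 245
SELECT CAST('123abc' AS INTEGER);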

The simplest answer is to use a long parameter list to construct lists and derived tables inside the procedure body. SQL Server can handle up to 2,100 parameters, which should be more than enough for practical purposes. SQL Server is actually a wimp in this regard; DB2 can pass 32K parameters, and Oracle can have 64K. In practice, I have seen lists of 64 (chessboard), 81 (Sudoku solver) and 361 (Go board) simple parameters. Most of the time, 100 parameters is a very safe upper limit.

Part I of this series listed the main advantages of the long parameter lists. These are:

  1. The code is highly portable to any Standard SQL. It can take advantage of any improvements in the basic SQL engine.
  2. The optimizer will treat the members of the long parameter list like any other parameter. They can be sniffed. You can use a RECOMPILE option and get the best performance possible each time the procedure runs.

This point gets lost on people. It means that I get the same error messages, testing and logic as any other parameter. When I use a home-made or external parser, I do not. Frankly, I have never seen anyone who used one of the other techniques write RAISERROR() calls to return the same error messages as the compiler.
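For example, a hypothetical search procedure can demand a fresh plan on every call, so that the optimizer sniffs the actual arguments in the long parameter list:

CREATE PROCEDURE SearchOrders
(@in_sku_1 INTEGER = NULL,
 @in_sku_2 INTEGER = NULL,
 @in_sku_3 INTEGER = NULL)
WITH RECOMPILE  -- re-optimize on every call, sniffing the actual parameters
AS
SELECT order_nbr, sku, item_qty
  FROM OrderDetails
 WHERE sku IN (@in_sku_1, @in_sku_2, @in_sku_3);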

The example in my first article was a procedure to create a basic Order header and its details in one procedure call. A skeleton for this insertion will use the price list and join it to the SKU codes on the order form. Here is the skeleton.
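Something like this, with hypothetical Orders, OrderDetails and PriceList tables, and only three SKU/quantity pairs to keep it short:

CREATE PROCEDURE CreateOrder
(@in_order_nbr INTEGER,
 @in_sku_1 INTEGER = NULL, @in_qty_1 INTEGER = NULL,
 @in_sku_2 INTEGER = NULL, @in_qty_2 INTEGER = NULL,
 @in_sku_3 INTEGER = NULL, @in_qty_3 INTEGER = NULL)
AS
BEGIN
-- the order header
INSERT INTO Orders (order_nbr, order_date)
VALUES (@in_order_nbr, CURRENT_TIMESTAMP);

-- the details: build the order form rows with VALUES()
-- and join the SKU codes to the price list
INSERT INTO OrderDetails (order_nbr, sku, item_qty, unit_price)
SELECT @in_order_nbr, Shopping.sku, Shopping.item_qty, PriceList.unit_price
  FROM (VALUES (@in_sku_1, @in_qty_1),
               (@in_sku_2, @in_qty_2),
               (@in_sku_3, @in_qty_3)) AS Shopping (sku, item_qty)
       INNER JOIN PriceList
       ON PriceList.sku = Shopping.sku;
END;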

Do you see the idea? An entire program is being built from the long parameter list in a single declarative statement that uses no special features. I have used this same pattern to create routines for adding subordinates to a nested set tree, given the root of the new subtree. Other options are to use COALESCE(@in_qty_x, 1) on each row, on the assumption that the customer wanted at least one item if they typed in a SKU, or to remove the SKUs that are NULL before the JOIN to the price list.

But the real difference between the long parameter list and a home-made parser is in the exception handling. Here is a table of various inputs and what happens to them.

[Table: sample inputs and the resulting behavior of the long parameter list versus the CSV parser]

Legend:

  1. Msg 245, Level 16, State 1, Line x: Conversion failed when converting the varchar value <<string>> to data type int
  2. Msg 102, Level 15, State 1, Line x: Incorrect syntax near <<token>>

Notes:

  1. The CSV parser gives more serious errors, but the difference between level 15 and 16 is not much.
  2. CSV fails to do some valid conversions.
  3. Date testing has its own problems in CSV. ISDATE() and CAST() do not work the same way, so you need extra code in the CSV parser that does not depend on that function. Microsoft expects an ISO-8601 date in the string for the CAST(), but not for their proprietary ISDATE().

  4. OUTPUT parameters are impossible with the CSV parser; see the sketch after these notes.

  5. The NULL should produce an empty table only if we prune out NULLs, as it does. But the empty string should not become a zero. This is a string conversion error as far as I can tell; at least it is consistent, though.
  6. You can pass local variables with the long parameter list, but not with CSV; again, see the sketch after these notes.
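Both points are easy to demonstrate with a toy procedure; a minimal sketch, using a hypothetical SumThree with an OUTPUT parameter:

CREATE PROCEDURE SumThree
(@in_val_1 INTEGER, @in_val_2 INTEGER, @in_val_3 INTEGER,
 @out_total INTEGER OUTPUT)
AS
SET @out_total = COALESCE(@in_val_1, 0)
               + COALESCE(@in_val_2, 0)
               + COALESCE(@in_val_3, 0);
GO

-- local variables pass straight through the long parameter list;
-- a CSV string would have to be concatenated by hand in the host program
DECLARE @a INTEGER, @total INTEGER;
SET @a = 42;
EXEC SumThree @a, 2, 3, @total OUTPUT;
SELECT @total;  -- 47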

A handy example of the technique is to insert a set of children under a parent in a Nested Set model. Let’s assume we have a basic Nested Set tree table.
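Any table with the usual (node_name, lft, rgt) columns will do; a minimal version might be:

CREATE TABLE Tree
(node_name VARCHAR(20) NOT NULL PRIMARY KEY,
 lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
 rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
 CHECK (lft < rgt));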

Now add the root node to get things started.
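With that table in place, the root is a single-row VALUES() insertion:

INSERT INTO Tree (node_name, lft, rgt)
VALUES ('Root', 1, 2);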

The procedure shown here only does 10 children, but it can be easily extended to 100 or 1000 if needed.

First, the procedure finds the parent node of the new subtree.

Next, it puts the children into a temporary table called Kindergarten; I just had to be cute.

Notice that we can build complete rows with the VALUES() clause immediately without looping or other procedural code.

Then it uses the size of the kindergarten to make a gap in the tree.

Finally, it inserts the kindergarten all at once, as a set.
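Putting those steps together, the whole procedure might look like this, assuming the Tree table above and the hypothetical name InsertChildren:

CREATE PROCEDURE InsertChildren
(@in_root_name VARCHAR(20),
 @in_child_1 VARCHAR(20) = NULL, @in_child_2 VARCHAR(20) = NULL,
 @in_child_3 VARCHAR(20) = NULL, @in_child_4 VARCHAR(20) = NULL,
 @in_child_5 VARCHAR(20) = NULL, @in_child_6 VARCHAR(20) = NULL,
 @in_child_7 VARCHAR(20) = NULL, @in_child_8 VARCHAR(20) = NULL,
 @in_child_9 VARCHAR(20) = NULL, @in_child_10 VARCHAR(20) = NULL)
AS
BEGIN
DECLARE @root_rgt INTEGER, @gap INTEGER;

-- find the parent node of the new subtree
SELECT @root_rgt = rgt
  FROM Tree
 WHERE node_name = @in_root_name;

-- put the children into kindergarten, pruning out empty slots;
-- each child gets a (lft, rgt) pair just inside the parent's rgt
SELECT child_name,
       @root_rgt + (2 * seq) - 2 AS lft,
       @root_rgt + (2 * seq) - 1 AS rgt
  INTO #Kindergarten
  FROM (SELECT child_name, ROW_NUMBER() OVER (ORDER BY slot) AS seq
          FROM (VALUES ( 1, @in_child_1), ( 2, @in_child_2),
                       ( 3, @in_child_3), ( 4, @in_child_4),
                       ( 5, @in_child_5), ( 6, @in_child_6),
                       ( 7, @in_child_7), ( 8, @in_child_8),
                       ( 9, @in_child_9), (10, @in_child_10))
               AS Slots (slot, child_name)
         WHERE child_name IS NOT NULL) AS Kids;

-- use the size of the kindergarten to make a gap in the tree
SELECT @gap = 2 * COUNT(*) FROM #Kindergarten;

UPDATE Tree
   SET lft = CASE WHEN lft >= @root_rgt THEN lft + @gap ELSE lft END,
       rgt = CASE WHEN rgt >= @root_rgt THEN rgt + @gap ELSE rgt END
 WHERE lft >= @root_rgt OR rgt >= @root_rgt;

-- insert kindergarten all at once as a set
INSERT INTO Tree (node_name, lft, rgt)
SELECT child_name, lft, rgt FROM #Kindergarten;
END;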

Here are some tests you can run to see that it works. The sibling order is the same as the parameter list order. You could also add text edits and other validation and verification code to the statement that builds Kindergarten.
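For example, assuming the objects sketched above:

EXEC InsertChildren 'Root', 'Chip', 'Dale', 'Huey';
EXEC InsertChildren 'Dale', 'Dewey', 'Louie';

-- the siblings come back in parameter list order
SELECT node_name, lft, rgt
  FROM Tree
 ORDER BY lft;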

Could I have done this with a table-valued parameter? Sure. But there are downsides to that method. The code would not port, because T-SQL does not agree with the other vendors. The Microsoft model is that this is a type and not a structure; that is an important difference.

I have to declare the table as a type outside the procedure. In other SQL products, I would declare it in the parameter list, where I can see the structure. I then have to load it outside the procedure body, and then pass it to a local table inside the procedure body with the same structure.

The skeleton for this pattern is: create the data type; create a table using the type; load the table; and, finally, pass the table to the procedure.
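Fleshed out with hypothetical names, the four steps might look like this:

-- create the data type
CREATE TYPE OrderItems AS TABLE
(sku INTEGER NOT NULL PRIMARY KEY,
 item_qty INTEGER NOT NULL CHECK (item_qty > 0));
GO

-- a procedure that takes the type; the parameter must be READONLY
CREATE PROCEDURE CreateOrderFromItems (@in_my_table_parm OrderItems READONLY)
AS
SELECT sku, item_qty
  FROM @in_my_table_parm;  -- a real body would join to the price list, etc.
GO

-- create a table using the type, and load it
DECLARE @order_items OrderItems;
INSERT INTO @order_items (sku, item_qty)
VALUES (1001, 2), (1002, 1);

-- pass the table to the procedure, finally
EXEC CreateOrderFromItems @in_my_table_parm = @order_items;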

Remember that the parameter has to be declared READONLY, which prevents changing the rows of @in_my_table_parm inside the procedure. You can get a quick introduction to the technique at http://www.sqlteam.com/article/sql-server-2008-table-valued-parameters.

One advantage of the table-valued parameter is that it is a full table, with constraints for keys and so forth. You check the data before it goes to the procedure, in a separate step, while the long parameter list checks the data after it goes to the procedure, in the EXEC statement.