Sometimes, when writing TSQL code in functions or procedures, it is tempting to do iterations, or even worse, a cursor, when it isn’t really necessary. Cursors and iterations are both renowned for slowing down Transact SQL code. SQL Server just isn’t designed for it.
However, there is usually a way to do such operations in a set-based way. If you do so, then your routines will run a lot faster, with speed at least doubling. There are a lot of tricks to turning a problem that seems to require an iterative approach into a set-based operation, and we wish we could claim we’d invented one of them. Probably the most useful technique involves that apparently useless entity, the ‘helper’ table. This workshop will concentrate on this, because it is probably the most widely used.
The most common Helper table you’ll see is a table with nothing but the numbers in a sequence from 1 upwards. These tables have a surprising number of uses. Once you’ve understood the principles behind helper tables, then you’ll think of many more. We’ll be providing several examples where a helper table suddenly makes life easier. The objective is to show the principles so that you’ll try out something similar the next time you have to tackle a tricky operation in TSQL. As always, you’re encouraged to load the example script into Query Analyser or Management Studio, and experiment!
Our examples include:
Splitting Strings into table-rows, based on a specified delimiter
Encoding and decoding a string
Substituting values into a string
Extracting individual words from a string into a table
Extracting all the numbers in a string into a table
Removing all text between delimiters
Scrabble score
Moving averages
Getting the ‘Week beginning’ date in a table
Calculating the number of working days between dates.
Note. These examples use VARCHAR(8000)
just so they compile on both SQL Server 2000 and 2005. If you are using SQL Server 2005, you’ll probably want to change them to VARCHAR(MAX)
.
Before we start, we’ll need a helper table of numbers. Our examples aren’t going to require high numbers, but we’ve parameterised the size of the table that the routine creates
Creating the helper table
Here is a routine that checks to see if such a ‘helper’ table called numbers
exists, and, if not, creates it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
CREATE PROCEDURE spMaybeBuildNumberTable @size INT=10000 AS BEGIN SET NOCOUNT ON IF NOT EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[Numbers]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1) BEGIN CREATE TABLE [dbo].[Numbers]( [number] [int], CONSTRAINT [Index_Numbers] PRIMARY KEY CLUSTERED ( [number] ASC ) ON [PRIMARY] ) ON [PRIMARY] DECLARE @ii INT SELECT @ii=1 WHILE (@ii<=@size) BEGIN INSERT INTO NUMBERS(NUMBER) SELECT @II SELECT @II=@II+1 END END END |
Once you have one of these tables, which we’ve seen described as the Transact SQL developer’s ‘Swiss Army Knife’, you will not want to be without it.
Splitting Strings into table-rows, based on a specified delimiter
Imagine you have a string which you want to break into words to make into a table.
1 2 3 |
DECLARE @ordinal VARCHAR(255) SELECT @Ordinal= 'first second third fourth fifth sixth seventh eighth ninth tenth' |
This can be done very simply and quickly through the following single SQL Select statement:
(Make sure you have executed the spMaybeBuildNumberTable
procedure first!)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
SELECT SUBSTRING(@Ordinal+' ', number, CHARINDEX(' ', @Ordinal+' ', number) - number) FROM Numbers WHERE number <= LEN(@Ordinal) AND SUBSTRING(' ' + @Ordinal, number, 1) = ' ' ORDER BY number RETURN /*----with the result first second third fourth fifth sixth seventh eighth nineth tenth (10 row(s) affected) |
You can then enshrine this principle into a table function that will take any delimiter to split a string. (We believe that the credit for this clever routine should go to Anith Sen).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE FUNCTION [dbo].[uftSplitString] ( @String VARCHAR(8000), @Delimiter VARCHAR(255) ) RETURNS @Results TABLE ( SeqNo INT IDENTITY(1, 1), Item VARCHAR(8000) ) AS BEGIN INSERT INTO @Results (Item) SELECT SUBSTRING(@String+@Delimiter, number, CHARINDEX(@Delimiter, @String+@Delimiter, number) - number) FROM Numbers WHERE number <= LEN(REPLACE(@String,' ','|')) AND SUBSTRING(@Delimiter + @String, number, LEN(REPLACE(@delimiter,' ','|'))) = @Delimiter ORDER BY number RETURN END |
This is the fastest means I have come across in TSQL to split a string into its components as rows. Try this, which give you a table with the integer and the nominal name.
1 2 3 4 5 |
SELECT * FROM dbo.uftSplitString( 'one,two,three,four,five,six,seven,eight,nine,ten',',') --or this, which SELECT * FROM dbo.uftSplitString( 'Monday--Tuesday--Wednesday--thursday--friday--saturday--sunday','--') |
Encoding and decoding a string
You can use the same principle for a lot of string operations.
Here is one that will URLencode
a string so it can be used in a POST
or GET HTTP
operation. The string is converted into a table with one character per row and, after being operated on, it is re-assembled.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
CREATE FUNCTION ufsURLEncoded ( @String VARCHAR(MAX) ) RETURNS VARCHAR(MAX) BEGIN DECLARE @URLEncodedString VARCHAR(MAX) SELECT @URLEncodedString='' SELECT @URLEncodedString=@URLEncodedString+ CASE WHEN theChar LIKE '[A-Za-z0-9()''*-._!]' THEN theChar ELSE '%' + SUBSTRING ('0123456789ABCDEF', (ASCII(theChar) / 16)+1,1) + SUBSTRING ('0123456789ABCDEF', (ASCII(theChar) % 16)+1,1) END FROM ( SELECT [theChar]=SUBSTRING(@string,number,1) FROM numbers WHERE number <= LEN(@String) ) Characterarray -- Return the result of the function RETURN @URLEncodedString END |
This sort of routine is a lot less complex than the iterative methods and is, as one would expect, a lot faster. Just to show that this is no fluke, here is the reverse function to decode a URL Query string.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
CREATE FUNCTION ufsURLDecoded ( @String VARCHAR(MAX) ) RETURNS VARCHAR(MAX) BEGIN SELECT @string= REPLACE(@string,escapeString, TheCharacter) FROM (SELECT [escapeString]=SUBSTRING(@string,number,3), [theCharacter]=CHAR( (CHARINDEX( SUBSTRING(@string,number+1, 1), '0123456789ABCDEF')-1)*16 +CHARINDEX( SUBSTRING(@string,number+2, 1), '0123456789ABCDEF')-1) FROM numbers WHERE number <= LEN(@String) AND SUBSTRING(@string,number,1) = '%' )f WHERE CHARINDEX(escapeString,@string)>0 RETURN @String END |
Here is a simple function that acts as an int to hex converter.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE FUNCTION ufsIntToHex ( @integer INT ) RETURNS VARCHAR(20) AS BEGIN DECLARE @result VARCHAR(80) SELECT @result='' IF @integer>0 SELECT @result=CASE WHEN @integer>0 THEN SUBSTRING('0123456789ABCDEF',(@integer%16)+1,1) ELSE '' END+@result, @integer=@integer/16 FROM numbers WHERE number<20 ELSE SELECT @result=0 RETURN '0x'+@result END |
It works this way:
1 |
SELECT dbo.ufsIntToHex(1024) |
And what about TSQL that converts hex to int?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
CREATE FUNCTION ufiHexToInt ( @HexString VARCHAR(50) ) RETURNS INT AS BEGIN DECLARE @result INT SELECT @result=0 SELECT @Result=(@result*16)+ (CHARINDEX(SUBSTRING(@HexString,number,1),'0123456789ABCDEF')-1) FROM numbers WHERE number<=LEN(@HexString) AND SUBSTRING(@HexString,number,1) LIKE '[0-9ABCDEF]' RETURN @Result END |
And now we can test it out:
1 2 3 |
SELECT COUNT(*) FROM numbers WHERE dbo.ufiHexToInt(dbo.ufsIntToHex(number))<>number |
Substituting values into a string
Next, we have a function that uses these principles to do macro substitution of values into a string. It will work for replacing XHTML placeholders with a value, or producing error messages in a variety of languages. In fact, uses keep popping up for this sort of function. In this version, one can specify what strings are used for substitutions (the default is %1, %2, %3 etc,) and what you use as the delimiter of your list of macros and values.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
CREATE FUNCTION [dbo].[ufsSubstitute] ( @Template VARCHAR(8000), @macroList VARCHAR(8000), @valueList VARCHAR(8000), @Delimiter VARCHAR(255) ) RETURNS VARCHAR(8000) AS BEGIN DECLARE @macros TABLE (MyID INT IDENTITY(1,1),variable VARCHAR(80)) DECLARE @values TABLE (MyID INT IDENTITY(1,1),[value] VARCHAR(8000)) --get all the variables INSERT INTO @macros (variable) SELECT SUBSTRING(@MacroList+@Delimiter, number, CHARINDEX(@Delimiter, @MacroList+@Delimiter, number) - number) FROM Numbers WHERE number <= LEN(@MacroList) AND SUBSTRING(@Delimiter + @MacroList, number, LEN(@delimiter)) = @Delimiter ORDER BY number INSERT INTO @values ([value]) SELECT SUBSTRING(@ValueList+@Delimiter, number, CHARINDEX(@Delimiter, @ValueList+@Delimiter, number) - number) FROM Numbers WHERE number <= LEN(@ValueList) AND SUBSTRING(@Delimiter + @ValueList, number, LEN(@delimiter)) = @Delimiter ORDER BY number SELECT @Template= REPLACE(@Template,COALESCE(variable, '%'+CAST(v.MyID AS VARCHAR)), [value]) FROM @values v LEFT OUTER JOIN @macros m ON v.MyID=m.MyID WHERE CHARINDEX(COALESCE(variable,'%'+CAST(v.MyID AS VARCHAR)) ,@Template)>0 RETURN (@Template) END |
There are several ways that we can use this routine in practical applications. Try out these and see what happens!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SELECT dbo.ufsSubstitute (NULL,NULL,'',',') SELECT dbo.ufsSubstitute ('','','',',') SELECT dbo.ufsSubstitute ( ' views','','6',',') SELECT dbo.ufsSubstitute (' Dear $1 $2, It has come to our attention that your $3 account is $4 to the extent of £$5. Please phone our adviser, $6 $7 on $8 who will inform you of the various actions that need to be taken', '$1,$2,$3,$4,$5,$6,$7,$8', 'Mrs,Prism,current,overdrawn,5678,Mr,Grabbitas,04585 725938', ',') SELECT dbo.ufsSubstitute ('To @Destination; Report dated @Date The @table table is now @rows long. please @action' ,'@Destination|@Date|@Table|@rows|@action', 'Phil Factor|12 Apr 2007|Log|1273980|truncate it at once','|') SELECT dbo.ufsSubstitute ( 'I thought that your present of a %1 was %2. Thank you very much. The %1 will come in handy for %3' ,'' ,'trowel|absolutely wonderful|gardening','|') |
Extracting individual words from a string into a table
One can do rather cleverer things than this. For example, one can extract all the words from a string into a table, a row for each
word.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
CREATE FUNCTION [dbo].[uftWords] ( @String VARCHAR(8000) ) RETURNS @Results TABLE ( SeqNo INT IDENTITY(1, 1), Word VARCHAR(8000) ) AS BEGIN INSERT INTO @Results(word) SELECT [word]=LEFT(RIGHT(@string,number), PATINDEX('%[^a-z]%',RIGHT(@string,number)+' ')-1) FROM Numbers WHERE number <= LEN(@String) AND PATINDEX('%[a-z]%',RIGHT(@string,number))=1 AND PATINDEX('%[^a-z]%',RIGHT(' '+@string,number+1))=1 ORDER BY number DESC RETURN END --and you can get the words (we use it for inversion indexes) SELECT * FROM dbo.uftWords ('One can do rather cleverer things than this. <>! For example, one can extract all the words from a string into a table, a row for each word.') --or a word count SELECT COUNT(*) FROM dbo.uftWords ('It is extraordinary how easy it is to get a wordcount using this ') |
Extracting all the numbers in a string into a table
Even more useful than this is a function that picks out all the numbers from a string into a table. You can therefore easily pick out the third or fourth string simply, because the table has the order as well as the number itself. Were it not for the unary minus operator, this would have been a delightfully simple function.
If you are using this routine, you’ll want to cast these numbers into the number type of your choice. We supply them as strings.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
CREATE FUNCTION [dbo].[uftNumbers] ( @String VARCHAR(8000) ) RETURNS @Results TABLE ( SeqNo INT IDENTITY(1, 1), number VARCHAR(100) ) AS BEGIN INSERT INTO @Results(number) SELECT CASE LEFT(RIGHT(' '+@String,number),1) WHEN '-' THEN '-' ELSE '' END+ SUBSTRING( RIGHT(@String,number-1),1, PATINDEX('%[^0-9.]%', RIGHT(' '+@String,number-1)+' ')-1) FROM Numbers WHERE number <= LEN(REPLACE(@String,' ','!'))+1 AND PATINDEX('%[^0-9.][0-9]%',RIGHT(' ' +@String,number))=1 ORDER BY number DESC RETURN END --So we try out a few examples just to see. It removes anything --that doesn't look kile a number SELECT * FROM dbo.uftNumbers('there are numbers like 34.56, -56, 67.878, maybe34; possibly56, and a few others like <789023>') SELECT * FROM dbo.uftNumbers('23,87986,56.78,67,09,23,30') SELECT * FROM dbo.uftNumbers('') SELECT * FROM dbo.uftNumbers(' <div>') SELECT * FROM dbo.uftNumbers('there are numbers like 34.56, -56, 67.878, maybe34; possibly56, and a few others like <789023>') /* SeqNo number ----------- ------------ 1 34.56 2 -56 3 67.878 4 34 5 56 6 789023 (6 row(s) affected) */ |
Removing all text between delimiters
This is a handy little routine for looking at the strings in HTML code, but seems to earn its keep in a lot of other ways. You specify the opening and closing delimiter. At the moment, only single-character delimiters are allowed. Can anyone re-write it to allow multi-character delimiters?.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
CREATE FUNCTION [dbo].[ufsRemoveDelimited] ( @String VARCHAR(8000), @OpeningDelimiter CHAR(1), @ClosingDelimiter CHAR(1) ) RETURNS VARCHAR(8000) AS BEGIN DECLARE @newString VARCHAR(8000) IF @OpeningDelimiter = @ClosingDelimiter BEGIN RETURN NULL END IF @OpeningDelimiter+@ClosingDelimiter+@String IS NULL BEGIN RETURN @String END SELECT @NewString='' SELECT @newString =@newString +SUBSTRING(@String,number,1) FROM numbers WHERE number<=LEN (REPLACE(@string,' ','|')) AND CHARINDEX (@OpeningDelimiter,@string+@OpeningDelimiter,number) < CHARINDEX (@ClosingDelimiter,@string+' '+@closingDelimiter,number) AND number <> CHARINDEX (@OpeningDelimiter,@string,number) RETURN @NewString END |
So we can try it out with brackets:
1 2 |
SELECT dbo.ufsRemoveDelimited( 'this will appear(but not this),)will this?(','(',')') |
Or if you want to take out tags:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
SELECT dbo.ufsRemoveDelimited( 'this will appear <div>and this </div>and this','<','>') --or this SELECT dbo.ufsRemoveDelimited( '<?xml version="1.0" encoding="us-ascii"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>A poem</title> </head> <body> <div style="float left: width:300px;"> <h2> Weather forecast </h2> <p> The rain it raineth every day<br /> upon the just and unjust fellah<br /> but mainly on the just, because,<br /> the unjust pinched the just''s umbrella<br /> </p> </div> </body> </html> ','<','>') |
Scrabble Score
And as a slightly silly example of the sort of chore that crops up occasionally when analysing strings, character by character, here is a way of scoring a string for Scrabble, assuming it is all on ordinary squares!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE FUNCTION ufiScrabbleScore ( @String VARCHAR(100) ) RETURNS INT AS BEGIN DECLARE @Total INT SELECT @total=0 SELECT @Total=@total+ CASE WHEN SUBSTRING(@String,number,1) NOT LIKE '[a-z]' THEN NULL ELSE CAST(SUBSTRING('00000000001122223333347799', CHARINDEX(SUBSTRING(@String,number,1), 'EAIONRTLSUDGBCMPFHVWYKJXQZ') ,1) AS INT)+1 END FROM numbers WHERE number <=LEN(@String) RETURN @Total END |
And now we try it out!
1 2 |
SELECT dbo.ufiScrabbleScore('Quiz') --22 SELECT dbo.ufiScrabbleScore('Robyn') --10 |
Now we find out which are the highest scorers, assuming an illegal quantity of tiles. We use the WordList from Phil’s Blog on the Fireside Fun of Decapitations.
Don’t try running this without the WordList!
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT dbo.ufiScrabbleScore(word),word FROM wordlist ORDER BY dbo.ufiScrabbleScore(word) DESC /* 49 razzamatazz 48 razzmatazz 45 pizzazz 43 quizzically 39 squeezeboxes 38 quizzical 38 psychoanalyzing 37 psychoanalyzed 37 squeezebox ..etc.... |
As well as slicing and dicing strings, once one has one’s helper table,
suddenly time-interval based reporting becomes much easier.
Moving averages
How about moving averages? Here is a simple select
statement that gives you the moving average (over a week) of the number of log entries for every day for a year. This can be adapted to give weighted and exponential moving averages over arbitrary time periods. You use this technique for ironing out ‘noise’ from a graph in order to accentuate the underlying trend.
To execute this, you will need a table to try it on.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
DROP TABLE #cb CREATE TABLE #cb (insertionDate datetime)--quantity --for once, we need to iterate to shuffle the pack DECLARE @ii INT SELECT @ii=0 WHILE @ii<20000 BEGIN INSERT INTO #cb(insertionDate) SELECT DATEADD(Hour,RAND()*8760,'1 jan 2006') SELECT @ii=@ii+1 END --and put an index on it CREATE CLUSTERED INDEX idxInsertionDate ON #cb(insertionDate) |
Now we can try out a moving average!
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SELECT start,[running average]=COUNT(*)/7 FROM ( SELECT [order]=number, [start]=DATEADD(DAY,number,'1 Jan 2006'), [end]=DATEADD(DAY,number+7,'1 Jan 2006') FROM numbers WHERE DATEADD(DAY,number,'1 Jan 2006') BETWEEN '1 Jan 2006' AND '1 Jan 2007')f LEFT OUTER JOIN [#cb] ON [#cb].insertionDate BETWEEN f.start AND f.[end] GROUP BY start ORDER BY start |
Getting the ‘Week beginning’ date in a table
Here is a UDF that lists all the Mondays (or whatever you want) between two dates.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE FUNCTION uftDatesOfWeekday ( @Weekday VARCHAR(10), @StartDate datetime, @EndDate DateTime ) RETURNS TABLE AS RETURN ( SELECT [start]=DATEADD(DAY,number-1,@StartDate) FROM numbers WHERE DATEADD(DAY,number-1,@StartDate)< @EndDate AND DATENAME(dw,DATEADD(DAY,number-1,@StartDate))=@Weekday ) /* And you can try it out by finding how many mondays there are between the first of January and 1st june this year. */ SELECT * FROM dbo.uftDatesOfWeekday('monday','1 Jan 2007','1 Jun 2007') |
Number of Working Days between two dates
Or, how about a UDF that tells you the number of working days between two dates? (you can alter it if Saturday and Sunday are not your days off!)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE FUNCTION ufiWorkingDays ( @StartDate datetime, @EndDate DateTime ) RETURNS INT AS BEGIN RETURN (SELECT COUNT(*) FROM numbers WHERE DATEADD(DAY,number-1,@StartDate)< @EndDate AND DATENAME(dw,DATEADD(DAY,number-1,@StartDate)) NOT IN ('saturday','sunday')) END |
So how many working days until Christmas?
1 2 |
SELECT dbo.ufiWorkingDays( GETDATE(),'25 Dec '+DATENAME(YEAR,GETDATE())) |
We can go on for ever with example of using a numeric Helper table but we won’t because most of you will have wandered off even before you reach this point. We hope that you’ll now take over and create some more examples, try them out against iterative solutions that do the same thing. We guarantee you’ll be pleased with the result!
See also other Workbenches at Simple-Talk
Robyn Page’s SQL Server DATE/TIME Workbench, Robyn Page
Date calculation and formatting in SQL Server can be surprisingly tricky. Robyn Page’s “hands-on” workbench will lead you through the minefield.
Robyn Page’s SQL Server String Manipulation Workbench, Robyn Page
String searching and manipulation in SQL Server can be error-prone and tedious… Unless you’re armed with the techniques described in Robyn’s string manipulation workbench…
SQL Server Error Handling Workbench, Grant Fritchey
Grant Fritchey steps into the workbench arena, with an example-fuelled examination of catching and gracefully handling errors in SQL 2000 and 2005, including worked examples of the new TRY..CATCH
capabilities.
Robyn Page’s SQL Server Cursor Workbench, Robyn Page
The topic of cursors is the ultimate “hot potato” in the world of SQL Server. Everyone has a view on when they should and mainly should not be used. By example and testing Robyn Page proves that, when handled with care, cursors are not necessarily a “bad thing”.
Robyn Page’s SQL Server Data Validation Workbench, Robyn Page
Robyn Page provides essential techniques for ensuring the validity of the data being entered into your SQL Server tables.
Robyn Page’s Excel Workbench, Robyn Page and Phil Factor
The need to produce Excel reports from SQL Server is very common. Here, Robyn Page and Phil Factor present practical techniques for creating and manipulating Excel spreadsheets from SQL Server, using linked servers and T-SQL. The pièce de résistance is a stored procedure that uses OLE Automation…
Robyn Page’s SQL Server Security Workbench, Robyn Page and Phil Factor
Robyn Page and Phil Factor present practical T-SQL techniques for controlling access to sensitive information within the database, and preventing malicious SQL injection attacks.
Load comments