Writing CLR-based functions and procedures for SQL Server isn’t necessarily a complex task. It doesn’t even need Visual Studio. It is possible to get things up and running very quickly. For this example, I’ve chosen to revisit the hoary old example of the RegEx package. There is nothing heroic about doing so since it is a tiny smear of code that merely accesses the .NET Framework regular expression engine. I decided on the CLR RegEx package because I’ve never found a complete example: The sample code that is readily available generally leaves something important out.
I decided to use VB.NET, so as to provide something fresh, and thought it might be interesting to do it without using Visual Studio at all, just to prove it could be done with just SQLExpress. For the SQL programmer, there is no need to even look at the source code as it can be installed merely from the CREATE ASSEMBLY script which I’ve included. The assembly-creation script can be executed in the same way as any SQL.
The CLR assembly
The Visual Basic Source is here. It was too long to place in the article itself. You don’t need it if you just to try out the package, only if you want to modify or extend it.
This assembly gives you a reasonably full palette of RegEx functions within SQL Server such as…
- RegExReplace
- replaces the substring matching the expression with a result string that can include the matches. This works the same way as the SQL Server REPLACE function, except it takes a full RegEx instead of the LIKE expression, and allows substitution of the matches in the replacement string using $0, $1…$n.
- RegExReplaceX
- like RegExReplace, except that the parameters are in the more normal order of pattern, input, replacement, options. The options parameter allows the RegEx to be used with a number of different options
- RegExSplit
- split a string into a table basing each row-delimiter on the substring that matches the expression, returns the table.
- RegExMatch
- returns the first substring that matches the expression
- RegExIsMatch
- returns non-zero if there is a match of the RegEx within the string
- RegExMatches
- returns a table of all the substring that match the expression within the string
- RegExEscape
- returns the input string with all characters that could be interpreted as part of a RegEx expression ‘escaped’. This allows strings to be inserted into RegExes
- RegExIndex
- returns the index into the string of any substring that matches the RegEx. It works the same way as PatIndex
- RegExOptionEnumeration
- returns a bitmap of the options you select that controls the way that the RegEx is used.
Although this looks like a comprehensive list, it isn’t complete. The NET RegEx package allows you to specify the maximum number of matches or splits to be returned. I didn’t include it because it seemed less useful, and it would be a simple feature to add. I also avoided returning capturing groups, to RegExMatch and RegExMatches. This would have meant returning a de-normalized table. There are uses for it but the source of such a CLR routine it is already in the SQL Server sample code if you need it.
SQL Server Assemblies can be written in any language that can produce a .NET assembly, though C# and VB .net are most often used. They are loaded into SQL Server but are not directly visible to the database developer. Once they are in SQL Server, they can be easily distributed, saved, and loaded into other servers using TSQL. Once you have them installed in SQL Server, you can get to the source very easily using .NET Reflector with an add-in, so it becomes very easy to modify them. You have no need for Visual Studio either. You can merely use the tools that are already provided within the .NET Framework to create and alter the code for these assemblies.
Creating and using a simple CLR in SQL Server
We’ll create a simple RegEx CLR function without Visual Studio and SSMS just to prove that it isn’t at all complex. Let’s take a very simple example first. We then have something we can expand on once we have it working. Here is a simple, but useful, implementation of the RegEx.IsMatch function from the .NET Regular Expression (RegEx) class. First, a bit of Visual Basic.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
Imports System Imports System.Data.Sql Imports Microsoft.SqlServer.Server Imports System.Data.SqlTypes Imports System.Runtime.InteropServices Imports System.Text.RegularExpressions Imports System.Collections 'the IEnumerable interface is here Namespace SimpleTalk.Phil.Factor Public Class RegularExpressionFunctions 'RegExIsMatch function <SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _ Public Shared Function RegExIsMatch( _ ByVal pattern As SqlString, _ ByVal input As SqlString, _ ByVal Options As SqlInt32) As SqlBoolean If (input.IsNull OrElse pattern.IsNull) Then Return SqlBoolean.False End If Dim RegExOption As New System.Text.RegularExpressions.RegExOptions RegExOption = Options Return RegEx.IsMatch(input.Value, pattern.Value, RegExOption) End Function End Class ' End Namespace |
This just used the IsMatch method of the RegEx . It indicates whether the regular expression finds a match in the input string, using the regular expression specified in the pattern parameter and the matching options supplied in the options parameter. The real script is much longer and can be downloaded at the bottom of the article.
We simply create an assembly from the command line (just put it into your programmer’s editor as a ‘tool’- I did all the development work for this project in EditPad Pro.)
1 |
%PATH_TO_FRAMEWORK%\vbc.exe /t:library "%FILE%" |
Where “%FILE%” is the full path and name of the file, and %PATH_TO_FRAMEWORK%\ is the path to the framework you are using.
Then a simple bit of SQL will have you up and running (this can be run by SQLCMD) Remember to substitute the file and path of the assembly for %FILE%
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
sp_configure 'clr enabled', 1 RECONFIGURE WITH OVERRIDE IF EXISTS ( SELECT 1 FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.RegExIsMatch') ) DROP FUNCTION dbo.RegExIsMatch go IF EXISTS ( SELECT 1 FROM sys.assemblies asms WHERE asms.name = N'RegExFunction ' ) DROP ASSEMBLY [RegExFunction] CREATE ASSEMBLY RegExFunction FROM '%FILE%' GO CREATE FUNCTION RegExIsMatch ( @Pattern NVARCHAR(4000), @Input NVARCHAR(MAX), @Options int ) RETURNS BIT AS EXTERNAL NAME RegExFunction.[SimpleTalk.Phil.Factor.RegularExpressionFunctions].RegExIsMatch GO --a few tests ---Is this card a valid credit card? SELECT dbo.RegExIsMatch ('^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$','4241825283987487',1) --is there a number in this string SELECT dbo.RegExIsMatch( '\d','there is 1 thing I hate',1) --Verifies number Returns 1 DECLARE @pattern VARCHAR(255) SELECT @pattern ='[a-zA-Z0-9]\d{2}[a-zA-Z0-9](-\d{3}){2}[A-Za-z0-9]' SELECT dbo.RegExIsMatch (@pattern, '1298-673-4192',1), dbo.RegExIsMatch (@pattern,'A08Z-931-468A',1), dbo.RegExIsMatch (@pattern,'[A90-123-129X',1), dbo.RegExIsMatch (@pattern,'12345-KKA-1230',1), dbo.RegExIsMatch (@pattern,'0919-2893-1256',1) |
Running the full project
So now we are ready for the full project. You can either compile the DLL from the source code, or you can execute the CREATE ASSEMBLY from the build script if you are just interested in taking the code for a trip around the block.
To compile the full project, download the VB source and compile it using the following command in the command-line
1 |
%PATH_TO_FRAMEWORK%\vbc.exe /t:library "%FILE%" |
Where “%FILE%” is the full path and name of the file, and %PATH_TO_FRAMEWORK%\ is the path to the framework you are using.
Then, from SSMS or SQLCMD run the …
InstallRegExSample.sql file,
…which can be downloaded from the speech bubble
If you want to avoid compiling the DLL from the VB source, or you want to run the CLR assembly on a remote computer, run the build script,( which can be downloaded from the speech bubble)
followed by, from SSMS or SQLCMD, the …
InstallRegExAssembly.sql file,
…which can be downloaded from the speech bubble
You can view the two files, the InstallRegExSample.sql and InstallRegExAssembly.sql files. Amongst other things, they will tell you what parameters are passed, and in what order, for every SQLCLR function.
A Trip Around the Block.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 |
/* Probably the simplest routines for a SQL programmer is the RegExReplace and the RegExIndex. This is because they work the same way as does the REPLACE() and PatIndex() */ Select Replace ('this is a revolting view','revolting', 'stunning') Select dbo.RegExReplace('this is a revolting view','revolting', 'stunning') /* Both give the same result. However, with RegExReplace, we can use any RegEx pattern instead of a string We'll create an insert script from a comma-delimited list*/ select dbo.RegExReplace( 'Sprocket,6.26,Paris widget,2.476,London Bucket,8.25,New Orleans', '^("[^"\r\n]*"|[^,\r\n]*),("[^"\r\n]*"|[^,\r\n]*),("[^"\r\n]*"|[^,\r\n]*)', 'insert into MyTable select ''$1'',$2,''$3'' --$0' ) /* which gives the result... insert into MyTable select 'Sprocket',6.26,'Paris' --Sprocket,6.26,Paris insert into MyTable select 'widget',2.476,'London' --widget,2.476,London insert into MyTable select 'Bucket',8.25,'New Orleans' --Bucket,8.25,New Orleans We have a powerful device here. We can put the string that matches the pattern into the result string using the $1, $2 $3 ... convention. We can do exactly the same thing with RegExReplacex, which gives us a finer control over how the RegEx is executed. */ select dbo.RegExReplacex('^("[^"\r\n]*"|[^,\r\n]*),("[^"\r\n]*"|[^,\r\n]*),("[^"\r\n]*"|[^,\r\n]*)', 'Sprocket,6.26,Paris widget,2.476,London Bucket,8.25,New Orleans', 'insert into MyTable select ''$1'',$2,''$3'' --$0', dbo.RegExOptionEnumeration(1,1,0,0,0,0,0,0,0) ) --remove repeated words in text SELECT dbo.RegExReplace('Sometimes I cant help help help stuttering','\b(\w+)(?:\s+\1\b)+', '$1') --find a #comment and add a TSQL -- SELECT dbo.RegExReplace(' # this is a comment first,second,third,fourth','#.*','--$&',1) --replace a url with an HTML anchor SELECT dbo.RegExReplacex( '\b(https?|ftp|file)://([-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|])', 'There is this amazing site at http://www.simple-talk.com', '<a href="$2">$2</a>',1) --strip all HTML elements out of a string SELECT dbo.RegExReplace('<a href="http://www.simple-talk.com">Simle Talk is wonderful</a><!--This is a comment --> we all love it','<(?:[^>''"]*|([''"]).*?\1)*>', '') /*But there are a whole lot of other things we can do. Let's return the first number in a string*/ SELECT dbo.RegExMatch('\d+', ' somewhere there is a number 4567 and then more ',1) -- 4567 --escape a literal string so it can be part of a regular expression SELECT dbo.RegExEscape(' I might need to search for [*\\\*]') /* this would become useful if you wanted to insert a literal string into a RegEx. How about, for example, you want to search a string for a substring where two words are near each other (at most four words apart) in either order.*/ Declare @String1 varchar(80), @String2 varchar(80), @RegEx Varchar(200) Select @String1= dbo.RegExEscape('often'), @String2= dbo.RegExEscape('wrong') Select @RegEx= '\b(?:'+@String1+'(?:\W+\w+){0,4}?\W+'+@String2+'|'+@String2+'(?:\W+\w+){0,4}?\W'+@String1+')\b' SELECT dbo.RegExMatch(@RegEx,'A RegEx expression can often be wrong but it is usually possible to put it right.',1) --split a string into words SELECT * FROM dbo.RegExSplit('\W+','this is an exciting regular expression',1) --Find if the words 'Simple' and 'Talk' are within three words distant Select dbo.RegExIsMatch('\bsimple(?:\W+\w+){0,3}?\W+talk\b', 'It is simple to say that there is talk of a wonderful website called Simple Talk',1) --Find the words 'Simple' and 'Talk' within three words distant Select dbo.RegExIndex('\bsimple(?:\W+\w+){0,3}?\W+talk\b', 'It is simple to say that there is talk of a wonderful website called Simple Talk',1) /* we can return a table of every repeating word in a string (along with the index intyo the string and the length of the match) */ select * from RegExMatches( '\b(\w+)\s+\1\b',--match any repeated word 'i have had my ups and downs but wotthehell wotthehell yesterday sceptres and crowns fried oysters and velvet gowns and today i herd with bums but wotthehell wotthehell i wake the world from sleep as i caper and sing and leap when i sing my wild free tune wotthehell wotthehell under the blear eyed moon i am pelted with cast off shoon but wotthehell wotthehell',3) --get valid dates and convert to SQL Server format SELECT DISTINCT CONVERT(DATETIME,match,103) FROM dbo.RegExMatches ('\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20?[0-9]{2})\b',' 12/2/2006 12:30 <> 13/2/2007 32/3/2007 2-4-2007 25.8.2007 1/1/2005 34/2/2104 2/5/2006',1) /* There are a number of ways we can use the RegExIsMatch function. Here are a few simple examples */ --is there a repeated word? SELECT dbo.RegExIsMatch('\b(\w+)\s+\1\b','this has has been repeated',1)--1 SELECT dbo.RegExIsMatch('\b(\w+)\s+\1\b','this has not been repeated',1)--0 --Is the word 'for' and 'last' up to 2 words apart) SELECT dbo.RegExIsMatch('\bfor(?:\W+\w+){0,2}?\W+last\b', 'You have failed me for the last time, Admiral',1)--1 SELECT dbo.RegExIsMatch('\bfor(?:\W+\w+){1,2}?\W+last\b', 'You have failed me for what could be the last time, Admiral',1)--0 --is this likely to be a valid credit card? SELECT dbo.RegExIsMatch('^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|3(?:0 [0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13}|(?:2131|1800)\d{11})$','4953129482924435',1) --IS this a valid ZIP code SELECT dbo.RegExIsMatch('^[0-9]{5,5}([- ]?[0-9]{4,4})?$','02115-4653',1) --is this a valid Postcode? SELECT dbo.RegExIsMatch('^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha -hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))) ) {0,1}[0-9][A-Za-z]{2})$','RG35 2AQ',1) --is this a valid European date? SELECT dbo.RegExIsMatch('^((((31\/(0?[13578]|1[02]))|((29|30)\/(0?[1,3-9]|1[0-2])))\/(1[ 6-9]|[2-9]\d)?\d{2})|(29\/0?2\/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])|((16 |[2468][048]|[3579][26])00))))|(0?[1-9]|1\d|2[0-8])\/((0?[1-9])|(1[0-2]))\/((1[6-9]|[2 -9]\d)?\d{2})) (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$','12/12/2007 20:15:27',1) --is this a valid currency value (dollar)? SELECT dbo.RegExIsMatch('^\$(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?$','$34,000.00',1) --is this a valid currency value (Sterling)? SELECT dbo.RegExIsMatch('^\£(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?$', '£34,000.00',1) --A valid email address? SELECT dbo.RegExIsMatch('^(([a-zA-Z0-9!#\$%\^&\*\{\}''`\+=-_\|/\?]+(\.[a-zA-Z0-9!#\$%\^& \*\{\}''`\+=-_\|/\?]+)*){1,64}@(([A-Za-z0-9]+[A-Za-z0-9-_]*){1,63}\.)*(([A-Za-z0-9]+[A -Za-z0-9-_]*){3,63}\.)+([A-Za-z0-9]{2,4}\.?)+){1,255}$','Phil.Factor@simple-Talk.com',1) |
The RegEx Options
RegEx functions can work in different ways by setting the option bitmap which is passed to the various functions. The most important of these is the flag that tells the RegEx to do case-insensitive searches. This project has a special function that translates these options into a bitmap. An example of its use was given in the ‘Trip around the block’, above
1 2 3 4 5 6 7 |
select dbo.RegExReplacex('^("[^"\r\n]*"|[^,\r\n]*),("[^"\r\n]*"|[^,\r\n]*),("[^"\r\n]*"|[^,\r\n]*)', 'Sprocket,6.26,Paris widget,2.476,London Bucket,8.25,New Orleans', 'insert into MyTable select ''$1'',$2,''$3'' --$0', dbo.RegExOptionEnumeration(1,1,0,0,0,0,0,0,0) ) |
And the function that implements it it is..
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
CREATE FUNCTION RegExOptionEnumeration ( @IgnoreCase bit, @MultiLine bit, @ExplicitCapture bit, @Compiled bit, @SingleLine bit, @IgnorePatternWhitespace bit, @RightToLeft bit, @ECMAScript bit, @CultureInvariant bit ) returns int AS EXTERNAL NAME RegExFunction.[SimpleTalk.Phil.Factor.RegularExpressionFunctions].RegExOptionEnumeration go |
These flags, in order, are:
- IgnoreCase
- By default, the regular expression is case sensitive. In the examples we used, we set it to ignore case
- The Multiline property
- The caret and dollar only match at the very start and very end of the subject string by default. If your subject string consists of multiple lines separated by line breaks, you can make the caret and dollar match at the start and the end of those lines by setting the Multiline property to True. (there is no option to make the dot match line break characters).
- Explicit Capture
- Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>). This allows unnamed parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:).
- Compiled
- Specifies that the regular expression is compiled to an assembly. This yields faster execution but increases startup time.
- Singleline
- Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except\n).
- IgnorePatternWhitespace
- Eliminates unescaped white space from the pattern and enables comments marked with #.
- RightToLeft
- Specifies that the search will be from right to left instead of from left to right
- ECMAScript
- Enables ECMAScript-compliant behaviour for the expression. This flag can be used only in conjunction with the IgnoreCase, Multiline, and Compiled flags. The use of this flag with any other flags results in an exception.
- CultureInvariant
- Specifies that cultural differences in language is ignored
So what next?
I’m hoping that this project can be appreciated on several levels. Firstly, it gives what I hope is a working implementation of a SQLCLR RegEx library of functions that is immediately useful. Secondly, I want to show how easy it is to develop CLR routines in your favourite .NET language. Thirdly, it would be nice to think that it could be developed and extended to support more complex RegEx operations, such as those involving matching groups .
If you are going to make extensive use of the package, then you’ll find everything you need to know in the
- function definitions within the InstallRegex*.SQL files
- Documentation for the System.Text.RegularExpressions Namespace
- VB Source of the CLR assembly that accompanies the article.
I’d be fascinated to hear of any improvements or bugfixes you make.
Load comments