CLR Assembly RegEx Functions for SQL Server by Example

Phil Factor presents a simple CLR Assembly to allow SQL Server users to access all the powerful RegEx library methods in .NET. In the spirit of 'get something up and running', he takes a hands-on approach and demonstrates that you needn't be a C# hotshot or Visual Studio expert to develop a CLR assembly for SQL Server

Writing CLR-based functions and procedures for SQL Server isn’t necessarily a complex task. It doesn’t even need Visual Studio. It is possible to get things up and running very quickly.  For this example, I’ve chosen to revisit the hoary old example of the RegEx package. There is nothing heroic about doing so since it is a tiny smear of code that merely accesses the .NET Framework regular expression engine.  I decided on the CLR RegEx package because I’ve never found a complete example: The sample code that is readily available generally leaves something important out.

I decided to use VB.NET, so as to provide something fresh, and thought it might be interesting to do it without using Visual Studio at all, just to prove it could be done with just SQLExpress.  For the SQL programmer, there is no need to even look at the source code as it can be installed merely from the CREATE ASSEMBLY script which I’ve included. The assembly-creation script can be executed in the same way as any SQL.

The CLR assembly

The Visual Basic Source is here. It was too long to place in the article itself. You don’t need it if you just to try out the package, only if you want to modify or extend it.

This assembly gives you a reasonably full palette of RegEx functions within SQL Server such as…

RegExReplace 
replaces the substring matching the expression with a result string that can include the matches. This works the same way as the SQL Server REPLACE function, except it takes a full RegEx instead of the LIKE expression, and allows substitution of the matches in the replacement string using $0, $1…$n.
RegExReplaceX 
like RegExReplace, except that the parameters are in the more normal order of  pattern, input, replacement, options. The options parameter allows the RegEx to be used with a number of different options
RegExSplit
split a string into a table basing each row-delimiter on the substring that matches the expression, returns the table.
RegExMatch
returns the first substring that matches the expression
RegExIsMatch
returns non-zero if there is a match of the RegEx within the string
RegExMatches  
returns a table of all the substring that match the expression within the string
RegExEscape
returns the input string with all characters that could be interpreted as part of a RegEx expression ‘escaped’. This allows strings to be inserted into RegExes
RegExIndex
returns the index into the string of any substring that matches the RegEx. It works the same way as PatIndex
RegExOptionEnumeration
returns a bitmap of the options you select that controls the way that the RegEx is used.

Although this looks like a comprehensive list, it isn’t complete. The NET RegEx package allows you to specify the maximum number of matches or splits to be returned. I didn’t include it because it seemed less useful, and it would be a simple feature to add. I also avoided returning capturing groups, to RegExMatch and RegExMatches. This would have meant returning a de-normalized table. There are uses for it but the source of such a CLR routine it is already in the SQL Server  sample code if you need it.

SQL Server Assemblies can be written in any language that can produce a .NET assembly, though C# and VB .net are most often used.  They are loaded into SQL Server but are not directly visible to the database developer. Once they are in SQL Server, they can be easily distributed, saved, and loaded into other servers using TSQL.  Once you have them installed in SQL Server, you can get to the source very easily using .NET Reflector with an add-in, so it becomes very easy to modify them. You have no need for Visual Studio either. You can merely use the tools that are already provided within the .NET Framework to create and alter the code for these assemblies.

Creating and using a simple CLR in SQL Server

We’ll create a simple RegEx CLR function without Visual Studio and SSMS just to prove that it isn’t at all complex. Let’s take a very simple example first. We then have something we can expand on once we have it working. Here is a simple, but  useful, implementation of the  RegEx.IsMatch function from the .NET Regular Expression (RegEx) class. First, a bit of Visual Basic.

This just used the IsMatch method of the RegEx . It indicates whether the regular expression finds a match in the input string, using the regular expression specified in the pattern parameter and the matching options supplied in the options parameter. The real script is much longer and can be downloaded at the bottom of the article.

We simply create an assembly from the command line (just put it into your programmer’s  editor as a ‘tool’-  I did all the development work for this project in EditPad Pro.)

Where “%FILE%” is the full path and name of the file, and %PATH_TO_FRAMEWORK%\ is the path to the framework you are using.

Then a simple bit of SQL will have you up and running (this can be run by SQLCMD) Remember to substitute the file and path of the assembly for %FILE%

Running the full project

So now we are ready for the full project. You can either compile the DLL from the source code, or you can execute the CREATE ASSEMBLY from the build script if you are just interested in taking the code for a trip around the block.

To compile the full project, download the VB source and compile it using the following command in the command-line

Where “%FILE%” is the full path and name of the file, and %PATH_TO_FRAMEWORK%\ is the path to the framework you are using.

Then, from SSMS or SQLCMD run the …
        InstallRegExSample.sql file,
…which can be downloaded from the speech bubble

If you want to avoid compiling the DLL from the VB source, or you want to run the CLR assembly on a remote computer, run the build script,( which can be downloaded from the speech bubble)

followed by, from SSMS or SQLCMD, the …
        InstallRegExAssembly.sql file,
…which can be downloaded from the speech bubble

You can view the two files, the  InstallRegExSample.sql  and  InstallRegExAssembly.sql files. Amongst other things, they will tell you what parameters are passed, and in what order, for every SQLCLR function.

A Trip Around the Block.

The RegEx Options

RegEx functions can work in different ways by setting the option bitmap which is passed to the various functions. The most important of these is the flag that tells the RegEx to do case-insensitive searches. This project has a special function that translates these options into a bitmap. An example of its use was given in the ‘Trip around the block’, above

And the function that implements it  it is..

These flags, in order, are:

IgnoreCase
By default, the regular expression is case sensitive. In the examples we used, we set it to ignore case

The Multiline property
The caret and dollar only match at the very start and very end of the subject string by default. If your subject string consists of multiple lines separated by line breaks, you can make the caret and dollar match at the start and the end of those lines by setting the Multiline property to True. (there is no option to make the dot match line break characters).

Explicit Capture
Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>). This allows unnamed parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:).

Compiled
Specifies that the regular expression is compiled to an assembly. This yields faster execution but increases startup time.

Singleline
Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except\n).

IgnorePatternWhitespace
Eliminates unescaped white space from the pattern and enables comments marked with #.
RightToLeft
Specifies that the search will be from right to left instead of from left to right
ECMAScript
Enables ECMAScript-compliant behaviour for the expression. This flag can be used only in conjunction with the IgnoreCase, Multiline, and Compiled flags. The use of this flag with any other flags results in an exception.
CultureInvariant
Specifies that cultural differences in language is ignored

 

So what next?

I’m hoping that this project can be appreciated on several levels. Firstly, it gives what I hope is a working implementation of a SQLCLR RegEx library of functions that is immediately useful. Secondly, I want to show how easy it is to develop  CLR routines in your favourite .NET language. Thirdly, it would be nice to think that it could be developed and extended to  support more complex RegEx operations, such as those involving matching groups .

If you are going to make extensive use of the package, then you’ll find everything you need to know in the

I’d be fascinated to hear of any improvements or bugfixes you make.