{"id":327,"date":"2007-11-27T00:00:00","date_gmt":"2007-11-27T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/tsql-regular-expression-workbench\/"},"modified":"2021-09-29T16:22:14","modified_gmt":"2021-09-29T16:22:14","slug":"tsql-regular-expression-workbench","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/tsql-regular-expression-workbench\/","title":{"rendered":"TSQL Regular Expression Workbench"},"content":{"rendered":"<p>This Workbench is about using Regular expressions with SQL Server via TSQL. It doesn&#8217;t even attempt to teach how regular expressions work or how to pull them together. There are plenty of such resources on the Web. The aim is to demonstrate a few possibilities and try to persuade you to experiment with them if you don&#8217;t already use Regex with SQL Server.<\/p>\n<p>We suggest that, if you are an ordinary mortal like Phil or I, without special powers, you should use an application such as <a href=\"http:\/\/www.regexbuddy.com\/\">RegexBuddy <\/a>to form, edit and interpret Regular expressions. It makes learning them a lot easier. In order that people with access only to SQL Server 2000 can use the workbench, we&#8217;ll use OLE in the examples, but they are readily adapted to CLR. As always, the source code is available to download at the bottom of the article.<\/p>\n<h3>Contents<\/h3>\n<ol id=\"contents\">\n<li><a href=\"#first\">Introduction<\/a><\/li>\n<li><a href=\"#second\">The OLE Functions<\/a>\n<ol>\n<li><a href=\"#third\">The OLE Regex Match function<\/a><\/li>\n<li><a href=\"#fourth\">The OLE Regex Replace function<\/a><\/li>\n<li><a href=\"#fifth\">The OLE Regex Find (Execute) function<\/a><\/li>\n<\/ol>\n<\/li>\n<li><a href=\"#seventh\">Combining two Regexes<\/a><\/li>\n<li><a href=\"#eighth\">OLE Regex Performance<\/a><\/li>\n<\/ol>\n<p>Regular Expressions can be very useful to the Database programmer, particularly for data validation, data feeds and data transformations. A lot of the time, tools such as grep and awk or Funduc&#8217;s S&amp;R will be the most suitable way of using regular expressions, but just occasionally, it is handy to be able to use them in TSQL as we&#8217;ll try to show.<\/p>\n<p>Regular Expressions are not regular in the sense that there is any common dialect of expression that is understood by all Regex engines. On the contrary, regular expresssions aren&#8217;t always portable and there are many common, similar but incompatible, dialects in use, such as Perl 5.8, Java.util.regex, .NET, PHP, Python, Ruby, ECMA Javascript, PCRE, Apache, vi, Shell tools TCL ARE, POSIX BRE, Funduc and JGsoft.<\/p>\n<p>Regular Expressions were never developed to be easy to understand. They are a condensed shorthand that, on preliminary inspection, looks as if someone has repeatedly sat on the keyboard. Even when interpreted, the logic isn&#8217;t always easy to follow.<\/p>\n<p>Probably the best tutorial on the web for Regular Expressions is on <a href=\"http:\/\/www.regular-expressions.info\">www.regular-expressions.info<\/a> but it is also worth reading <a href=\"http:\/\/www.simple-talk.com\/dotnet\/.net-framework\/implementing-real-world-data-input-validation-using-regular-expressions\/\">Implementing Real-World Data Input Validation using Regular Expressions by Francis Norton<\/a> for an introduction to regular expressions.<\/p>\n<p>A great deal can be done using commandline applications that work with regular expressions such as GREP and AWK. However, there are times where it is handy to use Regex directly from TSQL. There are two Regex engines available to SQL Server. These are:<\/p>\n<ul>\n<li>The .NET Regex which is in the system.text.regularexpression module<\/li>\n<li>The ECMA Regex from VBScript.RegExp which is distributed with the IE browser and is used by Javascript and JScript.<\/li>\n<\/ul>\n<p>Both of these are excellent standard implementations. Both work well in TSQL.<\/p>\n<p><strong>The .NET Regex<\/strong> requires the creation of CLR functions to provide regular expressions, and works only with SQL Server 2005, (and 2007) <a href=\"http:\/\/www.sqlservercentral.com\/articles\/Development\/clrintegration\/1967\/\">See CLR Integration by Christoffer Hedgate<\/a>.<\/p>\n<p><strong>The ECMA Regex<\/strong> can be used via VBScript.RegExp, which are available to SQL Server 2000 as well. The regex is compatible with Javascript.<\/p>\n<p>The advantage of using CLR is that the regular expressions of the NET framework are very good, and performance is excellent. However, the techniques are well-known, whereas some of the more powerful uses of VBScript.RegExp have hardly ever been published, so this workbench will concentrate on the latter<\/p>\n<h3 id=\"second\">The OLE functions<\/h3>\n<p>There are various properties to consider in these functions:<\/p>\n<dl>\n<dt>IgnoreCase<\/dt>\n<dd>By default, the regular expression is case sensitive. In the following functions, we have set the IgnoreCase property to True to make it case insensitive.<\/dd>\n<dt>Multiline property<\/dt>\n<dd>The caret and dollar only match at the very start and very end of the subject string by default. If your subject string consists of multiple lines separated by line breaks, you can make the caret and dollar match at the start and the end of those lines by setting the Multiline property to True. (there is no option to make the dot match line break characters).<\/dd>\n<dt>Global property<\/dt>\n<dd>If you want the RegExp object to return or replace all matches instead of just the first one, set the Global property to True.<\/dd>\n<\/dl>\n<p>Only the IgnoreCase is relevant in the first function but we&#8217;ve &#8216;hardcoded&#8217; it to 1 as case-sensitive searches are a minority interest.<\/p>\n<h3 id=\"third\">The OLE Regex Match function<\/h3>\n<p>Let&#8217;s start off with something simple, a function for testing a string against a regular expression:<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">IF OBJECT_ID (N'dbo.RegexMatch') IS NOT NULL\r\n   DROP FUNCTION dbo.RegexMatch\r\nGO\r\n    \r\nCREATE FUNCTION dbo.RegexMatch\r\n    (\r\n      @pattern VARCHAR(2000),\r\n      @matchstring VARCHAR(MAX)--Varchar(8000) got SQL Server 2000\r\n    )\r\nRETURNS INT\r\n<\/pre>\n<p>The RegexMatch returns <code>True<\/code> or <code>False<\/code>, indicating if the regular expression matches (part of) the string. (It returns <code>null<\/code> if there is an error). When using this for validating user input, you&#8217;ll normally want to check if the entire string matches the regular expression. To do so, put a caret at the start of the regex, and a dollar at the end, to anchor the regex at the start and end of the subject string.<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">    \r\nAS BEGIN\r\n    DECLARE @objRegexExp INT,\r\n        @objErrorObject INT,\r\n        @strErrorMessage VARCHAR(255),\r\n        @hr INT,\r\n        @match BIT\r\n    \r\n    \r\n    SELECT  @strErrorMessage = 'creating a regex object'\r\n    EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern\r\n        --Specifying a case-insensitive match\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1\r\n        --Doing a Test'\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring\r\n    IF @hr &amp;lt;&amp;gt; 0\r\n        BEGIN\r\n            RETURN NULL\r\n        END\r\n    EXEC sp_OADestroy @objRegexExp\r\n    RETURN @match\r\n   END\r\nGO\r\n<\/pre>\n<p>Now, with this routine, we can do some complex input validation.<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">--IS there a repeating word\r\nSELECT dbo.RegexMatch('\\b(\\w+)\\s+\\1\\b','this has has been repeated')--1\r\nSELECT dbo.RegexMatch('\\b(\\w+)\\s+\\1\\b','this has not been repeated')--0\r\n    \r\n\r\n    \r\n--find a word near another word (in this case 'for' and 'last' 1 or 2 words apart)\r\nSELECT dbo.RegexMatch('\\bfor(?:\\W+\\w+){1,2}?\\W+last\\b',\r\n           'You have failed me for the last time, Admiral')--1\r\nSELECT dbo.RegexMatch('\\bfor(?:\\W+\\w+){1,2}?\\W+last\\b',\r\n           'You have failed me for what could be the last time, Admiral')--0\r\n    \r\n    \r\n--is this likely to be a valid credit card\r\nSELECT dbo.RegexMatch('^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6011[0-9]{12}|3(?:0\r\n[0-5]|[68][0-9])[0-9]{11}|3[47][0-9]{13}|(?:2131|1800)\\d{11})$','4953129482924435')          \r\n    \r\n    \r\n--IS this a valid ZIP code\r\nSELECT dbo.RegexMatch('^[0-9]{5,5}([- ]?[0-9]{4,4})?$','02115-4653')\r\n    \r\n    \r\n--is this a valid Postcode\r\nSELECT dbo.RegexMatch('^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha\r\n-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))\r\n) {0,1}[0-9][A-Za-z]{2})$','RG35 2AQ')\r\n    \r\n\r\n--is this a valid European date\r\nSELECT dbo.RegexMatch('^((((31\\\/(0?[13578]|1[02]))|((29|30)\\\/(0?[1,3-9]|1[0-2])))\\\/(1[\r\n6-9]|[2-9]\\d)?\\d{2})|(29\\\/0?2\\\/(((1[6-9]|[2-9]\\d)?(0[48]|[2468][048]|[13579][26])|((16\r\n|[2468][048]|[3579][26])00))))|(0?[1-9]|1\\d|2[0-8])\\\/((0?[1-9])|(1[0-2]))\\\/((1[6-9]|[2\r\n-9]\\d)?\\d{2})) (20|21|22|23|[0-1]?\\d):[0-5]?\\d:[0-5]?\\d$','12\/12\/2007 20:15:27')\r\n    \r\n\r\n    \r\n--is this a valid currency value (dollar)\r\nSELECT dbo.RegexMatch('^\\$(\\d{1,3}(\\,\\d{3})*|(\\d+))(\\.\\d{2})?$','$34,000.00')\r\n    \r\n\r\n    \r\n--is this a valid currency value (Sterling)\r\nSELECT dbo.RegexMatch('^\\&amp;pound;(\\d{1,3}(\\,\\d{3})*|(\\d+))(\\.\\d{2})?$',\r\n'&amp;pound;34,000.00')\r\n    \r\n\r\n    \r\n--A valid email address?\r\nSELECT dbo.RegexMatch('^(([a-zA-Z0-9!#\\$%\\^&amp;\\*\\{\\}''`\\+=-_\\|\/\\?]+(\\.[a-zA-Z0-9!#\\$%\\^&amp;\r\n\\*\\{\\}''`\\+=-_\\|\/\\?]+)*){1,64}@(([A-Za-z0-9]+[A-Za-z0-9-_]*){1,63}\\.)*(([A-Za-z0-9]+[A\r\n-Za-z0-9-_]*){3,63}\\.)+([A-Za-z0-9]{2,4}\\.?)+){1,255}$','Phil.Factor@simple-Talk.com')\r\n    <\/pre>\n<p>With this function, the passing back of errors is rudimentary. If an OLE error occurs, then a <code>null<\/code> is passed back.<\/p>\n<p>There are two other basic Regex functions available. With them, you can use regular expressions in all sorts of places in TSQL without having to get to direct grips with the rather awkward OLE interface.<\/p>\n<h3 id=\"fourth\">The OLE Regex Replace function<\/h3>\n<pre class=\"lang:tsql theme:ssms2012\">IF OBJECT_ID(N'dbo.RegexReplace') IS NOT NULL\r\n    DROP FUNCTION dbo.RegexReplace\r\nGO\r\n    \r\nCREATE FUNCTION dbo.RegexReplace\r\n    (\r\n      @pattern VARCHAR(255),\r\n      @replacement VARCHAR(255),\r\n      @Subject VARCHAR(MAX),\r\n      @global BIT = 1,\r\n     @Multiline bit =1\r\n    )\r\nRETURNS VARCHAR(MAX)\r\n    <\/pre>\n<p>The <code>RegexReplace<\/code> function takes three string parameters. The pattern (the regular expression) the replacement expression, and the subject string to do the manipulation to. The replacement expression is one that can cause difficulties. You can specify an empty string <code>''<\/code> as the <code>@replacement<\/code> text. This will cause the <code>Replace<\/code> method to return the subject string with all regex matches deleted from it (see &#8220;strip all HTML elements out of a string&#8221; below). To re-insert the regex match as part of the replacement, include <code>$&amp;<\/code> in the replacement text. (see &#8220;find a #comment and add a TSQL &#8211;&#8221; below). If the regexp contains capturing parentheses, you can use backreferences in the replacement text. <code>$1<\/code> in the replacement text inserts the text matched by the first capturing group, <code>$2<\/code> the second, etc. up to <code>$9<\/code>. (e.g. see import delimited text into a database below) To include a literal dollar sign in the replacements, put two consecutive dollar signs in the string you pass to the <code>Replace<\/code> method.<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">AS BEGIN\r\n    DECLARE @objRegexExp INT,\r\n        @objErrorObject INT,\r\n        @strErrorMessage VARCHAR(255),\r\n        @Substituted VARCHAR(8000),\r\n        @hr INT,\r\n        @Replace BIT\r\n    \r\n\r\n    \r\n    SELECT  @strErrorMessage = 'creating a regex object'\r\n    EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT\r\n    IF @hr = 0\r\n        SELECT  @strErrorMessage = 'Setting the Regex pattern',\r\n                @objErrorObject = @objRegexExp\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern\r\n    IF @hr = 0 \/*By default, the regular expression is case sensitive. Set the IgnoreCase property to True to make it case insensitive.*\/\r\n        SELECT  @strErrorMessage = 'Specifying the type of match'\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'Global', @global\r\n    IF @hr = 0\r\n        SELECT  @strErrorMessage = 'Doing a Replacement'\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OAMethod @objRegexExp, 'Replace', @Substituted OUT,\r\n            @subject, @Replacement\r\n\r\n     \/*If the RegExp.Global property is False (the default), Replace will return the @subject string with the first regex match (if any) substituted with the replacement text. If RegExp.Global is true, the @Subject string will be returned with all matches replaced.*\/  \r\n\r\n    IF @hr &amp;lt;&amp;gt; 0\r\n        BEGIN\r\n            DECLARE @Source VARCHAR(255),\r\n                @Description VARCHAR(255),\r\n                @Helpfile VARCHAR(255),\r\n                @HelpID INT\r\n            EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT,\r\n                @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT\r\n            SELECT  @strErrorMessage = 'Error whilst '\r\n                    + COALESCE(@strErrorMessage, 'doing something') + ', '\r\n                    + COALESCE(@Description, '')\r\n            RETURN @strErrorMessage\r\n        END\r\n    EXEC sp_OADestroy @objRegexExp\r\n    RETURN @Substituted\r\n   END\r\nGO\r\n    \r\n--remove repeated words in text\r\nSELECT  dbo.RegexReplace('\\b(\\w+)(?:\\s+\\1\\b)+', '$1',\r\n                         'Sometimes I cant help help help stuttering',1, 1)\r\n\r\n    \r\n--find a #comment and add a TSQL --\r\nSELECT  dbo.RegexReplace('#.*','--$&amp;','\r\n# this is a comment\r\nfirst,second,third,fourth',1,1)\r\n    \r\n\r\n    \r\n--replace a url with an HTML anchor\r\nSELECT  dbo.RegexReplace(\r\n        '\\b(https?|ftp|file):\/\/([-A-Z0-9+&amp;@#\/%?=~_|!:,.;]*[-A-Z0-9+&amp;@#\/%=~_|])',\r\n        '&lt;a href=\"$2\"&gt;$2&lt;\/a&gt;',\r\n         'There is  this amazing site at http:\/\/www.simple-talk.com',1,1)\r\n    \r\n\r\n    \r\n--strip all HTML elements out of a string\r\nSELECT  dbo.RegexReplace('&lt;(?:[^&gt;''\"]*|([''\"]).*?\\1)*&gt;',\r\n   '','&lt;a href=\"http:\/\/www.simple-talk.com\"&gt;Simle Talk is wonderful&lt;\/a&gt;&lt;!--This is a comment --&gt; we all love it',1,1)\r\n    \r\n\r\n    \r\n--import delimited text into a database, converting it into insert statements\r\nSELECT  dbo.RegexReplace(\r\n '([^\\|\\r\\n]+)[|\\r\\n]+([^\\|\\r\\n]+)[|\\r\\n]+([^\\|\\r\\n]+)[|\\r\\n]+([^\\|\\r\\n]+)[|\\r\\n]+',\r\n 'Insert into MyTable (Firstcol,SecondCol, ThirdCol, Fourthcol)\r\nselect $1,$2,$3,$4\r\n','1|white gloves|2435|24565\r\n2|Sports Shoes|285678|0987\r\n3|Stumps|2845|987\r\n4|bat|29862|4875',1,1)\r\n<\/pre>\n<h3 id=\"fifth\">The OLE Regex Find (Execute) function<\/h3>\n<p>This is the most powerful function for doing complex finding and replacing of text. As it passes back detailed records of the hits, including the location and the backreferences, it allows for complex manipulations.<\/p>\n<p>This is written as a table function. The Regex Routine actually passes back a collection for each &#8216;hit&#8217;. In the relational world, you&#8217;d normally represent this in two tables, so we&#8217;ve returned a left outer join of the two logical tables so as to pass back all the information. This seems to cater for all the uses we can think of. We also append an error column, which should be blank!<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">    \r\nIF OBJECT_ID(N'dbo.RegexFind') IS NOT NULL\r\n    DROP FUNCTION dbo.RegexFind\r\nGO\r\n    \r\nCREATE FUNCTION RegexFind(\r\n    @pattern VARCHAR(255),\r\n    @matchstring VARCHAR(MAX),\r\n    @global BIT = 1,\r\n   @Multiline bit =1)\r\nRETURNS\r\n    @result TABLE\r\n        (\r\n        Match_ID INT,\r\n          FirstIndex INT ,\r\n          length INT ,\r\n          Value VARCHAR(2000),\r\n          Submatch_ID INT,\r\n          SubmatchValue VARCHAR(2000),\r\n         Error VARCHAR(255)\r\n        )\r\n\r\nAS -- columns returned by the function\r\n   BEGIN\r\n    DECLARE @objRegexExp INT,\r\n        @objErrorObject INT,\r\n        @objMatch INT,\r\n        @objSubMatches INT,\r\n        @strErrorMessage VARCHAR(255),\r\n       @error VARCHAR(255),\r\n        @Substituted VARCHAR(8000),\r\n        @hr INT,\r\n        @matchcount INT,\r\n        @SubmatchCount INT,\r\n        @ii INT,\r\n        @jj INT,\r\n        @FirstIndex INT,\r\n        @length INT,\r\n        @Value VARCHAR(2000),\r\n        @SubmatchValue VARCHAR(2000),\r\n        @objSubmatchValue INT,\r\n        @command VARCHAR(8000),\r\n        @Match_ID INT\r\n        \r\n    DECLARE @match TABLE\r\n        (\r\n          Match_ID INT IDENTITY(1, 1)\r\n                       NOT NULL,\r\n          FirstIndex INT NOT NULL,\r\n          length INT NOT NULL,\r\n          Value VARCHAR(2000)\r\n        )    \r\n    DECLARE @Submatch TABLE\r\n        (\r\n          Submatch_ID INT IDENTITY(1, 1),\r\n          match_ID INT NOT NULL,\r\n          SubmatchNo INT NOT NULL,\r\n          SubmatchValue VARCHAR(2000)\r\n        )\r\n    \r\n    SELECT  @strErrorMessage = 'creating a regex object',@error=''\r\n    EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT\r\n    IF @hr = 0\r\n        SELECT  @strErrorMessage = 'Setting the Regex pattern',\r\n                @objErrorObject = @objRegexExp\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern\r\n    IF @hr = 0\r\n        SELECT  @strErrorMessage = 'Specifying a case-insensitive match'\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'Global', @global\r\n    IF @hr = 0\r\n        SELECT  @strErrorMessage = 'Doing a match'\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OAMethod @objRegexExp, 'execute', @objMatch OUT,\r\n            @matchstring\r\n    IF @hr = 0\r\n        SELECT  @strErrorMessage = 'Getting the number of matches'    \r\n    IF @hr = 0\r\n        EXEC @hr= sp_OAGetProperty @objmatch, 'count', @matchcount OUT\r\n    SELECT  @ii = 0\r\n    WHILE @hr = 0\r\n        AND @ii &amp;lt; @Matchcount\r\n        BEGIN\r\n\/*\r\n    \r\nThe Match object has four read-only properties.\r\nThe FirstIndex property indicates the number of characters in the string to the left of the match.\r\nThe Length property of the Match object indicates the number of characters in the match.\r\nThe Value property returns the text that was matched.\r\n\r\n*\/\r\n    \r\n            SELECT  @strErrorMessage = 'Getting the FirstIndex property',\r\n                    @command = 'item(' + CAST(@ii AS VARCHAR) + ').FirstIndex'    \r\n            IF @hr = 0\r\n                EXEC @hr= sp_OAGetProperty @objmatch, @command,\r\n                    @Firstindex OUT\r\n            IF @hr = 0\r\n                SELECT  @strErrorMessage = 'Getting the length property',\r\n                        @command = 'item(' + CAST(@ii AS VARCHAR) + ').Length'    \r\n            IF @hr = 0\r\n                EXEC @hr= sp_OAGetProperty @objmatch, @command, @Length OUT\r\n            IF @hr = 0\r\n                SELECT  @strErrorMessage = 'Getting the value property',\r\n                        @command = 'item(' + CAST(@ii AS VARCHAR) + ').Value'    \r\n            IF @hr = 0\r\n                EXEC @hr= sp_OAGetProperty @objmatch, @command, @Value OUT\r\n            INSERT  INTO @match\r\n                    (\r\n                      Firstindex,\r\n                      [Length],\r\n                      [Value]\r\n                    )\r\n                    SELLECT  @firstindex + 1,\r\n                            @Length,\r\n                            @Value\r\n            SELECT  @Match_ID = @@Identity        \r\n    \r\n\/*The SubMatches property of the Match object is a collection of strings. It will only hold values if your regular expression has capturing groups. The collection will hold one string for each capturing group. The Count property (returned as SubmatchCount) indicates the number of string in the collection. The Item property takes an index parameter, and returns the text matched by the capturing group.\r\n    \r\n*\/\r\n    \r\n            IF @hr = 0\r\n                SELECT  @strErrorMessage = 'Getting the SubMatches collection',\r\n                        @command = 'item(' + CAST(@ii AS VARCHAR)\r\n                        + ').SubMatches'    \r\n            IF @hr = 0\r\n                SELECT  @strErrorMessage = 'Getting the number of submatches'    \r\n            IF @hr = 0\r\n                EXEC @hr= sp_OAGetProperty @objSubmatches, 'count',\r\n                    @submatchCount OUT\r\n            SELECT  @jj = 0\r\n            WHILE @hr = 0\r\n                AND @jj &amp;lt; @submatchCount\r\n                BEGIN\r\n                    IF @hr = 0\r\n                        SELECT  @strErrorMessage = 'Getting the submatch value property',\r\n                                @command = 'item(' + CAST(@jj AS VARCHAR)\r\n                                + ')' ,@submatchValue=NULL  \r\n                    IF @hr = 0\r\n                        EXEC @hr= sp_OAGetProperty @objSubmatches, @command,\r\n                            @SubmatchValue OUT\r\n                    INSERT  INTO @Submatch\r\n                            (\r\n                              Match_ID,\r\n                              SubmatchNo,\r\n                              SubmatchValue\r\n                            )\r\n                            SELECT  @Match_ID,\r\n                                    @jj+1,\r\n                                    @SubmatchValue\r\n                    SELECT  @jj = @jj + 1\r\n                END  \r\n                EXEC @hr= sp_OAGetProperty @objmatch, @command,\r\n                    @objSubmatches OUT    \r\n            SELECT  @ii = @ii + 1\r\n        END\r\n    IF @hr &amp;lt;&amp;gt; 0\r\n        BEGIN\r\n            DECLARE @Source VARCHAR(255),\r\n                @Description VARCHAR(255),\r\n                @Helpfile VARCHAR(255),\r\n                @HelpID INT\r\n  \r\n            EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT,\r\n                @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT\r\n            SELECT  @Error = 'Error whilst '\r\n                    + COALESCE(@strErrorMessage, 'doing something') + ', '\r\n                    + COALESCE(@Description, '')\r\n        END\r\n    EXEC sp_OADestroy @objRegexExp\r\n     EXEC sp_OADestroy        @objMatch\r\n     EXEC sp_OADestroy        @objSubMatches\r\n\r\nINSERT INTO @result\r\n          (Match_ID,\r\n          FirstIndex,\r\n          [length],\r\n          [Value],\r\n          Submatch_ID,\r\n          SubmatchValue,\r\n         error)\r\n\r\n\r\n    SELECT  m.[Match_ID],\r\n           [FirstIndex],\r\n           [length],\r\n           [Value],[SubmatchNo],\r\n           [SubmatchValue],@error\r\n  FROM    @match m\r\n    LEFT OUTER JOIN   @submatch s\r\n    ON m.match_ID=s.match_ID    \r\nIF @@rowcount=0 AND LEN(@error)&amp;gt;0\r\nINSERT INTO @result(error) SELECT @error\r\nRETURN\r\nEND\r\nGO\r\n\r\n--showing the context where two words 'for' and 'last' are found in proximity\r\nDECLARE @sample VARCHAR(2000)\r\nSELECT @Sample='You have failed me for the last time, Admiral.\r\nWe have not long to wait for your last gasp'\r\nSELECT '...'+SUBSTRING(@Sample,Firstindex-8,length+16)+'...'\r\n    FROM dbo.RegexFind ('\\bfor(?:\\W+\\w+){0,3}?\\W+last\\b',\r\n           @sample,1,1)\r\n\r\n--finding repeated words, showing the repetition and the repeated word\r\nSELECT [repetition]=value, [word]=SubmatchValue FROM dbo.RegexFind ('\\b(\\w+)\\s+\\1\\b',\r\n'this this is is a repeated word word word',1,1)\r\n\r\n--Split lines based on a regular expression\r\nSELECT value FROM dbo.regexfind('[^\\r\\n]*(?:[\\r\\n]*)',\r\n'\r\nThis is the second line\r\nThis is the third\r\nand the fourth',1,1) WHERE length&amp;gt;0\r\n\r\n--break up all words in a string into separate table rows\r\nSELECT value FROM dbo.RegexFind ('\\b[\\w]+\\b',\r\n'Hickory dickory dock, the mouse ran up the clock',1,1)\r\n--split text into keywords and values\r\nSELECT Match_ID,\r\n[keyword]=MAX (CASE WHEN submatch_ID=1 THEN  submatchValue ELSE '' END),\r\n[value]=MAX (CASE WHEN submatch_ID=2 THEN  submatchValue ELSE '' END)\r\n  FROM dbo.RegexFind ('(\\w+)\\s*=\\s*(.*)\\s*',\r\n'firstname=Phil\r\nLastname=Factor\r\nSalary=$200,000\r\nage=unknown to us\r\nPost=DBA',1,1) GROUP BY Match_ID\r\n\r\nSELECT * FROM dbo.RegexFind ('([^\\|\\r\\n]+[\\|\\r\\n]+)',\r\n'1|white gloves|2435|24565\r\n2|Sports Shoes|285678|0987\r\n3|Stumps|2845|987\r\n4|bat|29862|4875',1,1)\r\n\r\n--get valid dates and convert to SQL Server format\r\nSELECT DISTINCT CONVERT(DATETIME,value,103) FROM dbo.RegexFind ('\\b(0?[1-9]|[12][0-9]|3[01])[- \/.](0?[1-9]|1[012])[- \/.](19|20?[0-9]{2})\\b','\r\n12\/2\/2006 12:30 &lt;&gt; 13\/2\/2007\r\n32\/3\/2007\r\n2-4-2007\r\n25.8.2007\r\n1\/1\/2005\r\n34\/2\/2104\r\n2\/5\/2006',1,1)\r\n<\/pre>\n<h3 id=\"seventh\">Combining two Regexes<\/h3>\n<p>Once you&#8217;ve experimented with the regex calls we&#8217;ve provided, you&#8217;ll realise that you can create some really cool functions and procedures that combine regexes. Here we have a procedure that does a &#8216;google-style&#8217; search on text to find the words you specify. It returns the &#8216;context&#8217; in that it quotes the substring where the match occurred. You can specify how close the words need to be to specify a &#8216;hit&#8217;.<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">IF OBJECT_ID(N'dbo.FindWordsInContext') IS NOT NULL\r\n    DROP FUNCTION dbo.FindWordsInContext\r\nGO\r\nCREATE FUNCTION dbo.FindWordsInContext\r\n    (\r\n      @words VARCHAR(255),--list of words you want searched for\r\n      @text VARCHAR(MAX),--the text you want searched\r\n      @proximity INT--the maximum distance in words between specified words\r\n    )\r\nRETURNS @proximityList TABLE\r\n    (\r\n      Hit INT IDENTITY(1, 1),\r\n      context VARCHAR(2000)\r\n    )\r\nAS BEGIN\r\n    DECLARE @Pattern VARCHAR(512)\r\n    SELECT  @Pattern = COALESCE(@pattern + '(?:\\W+\\w+){0,'\r\n                                + CAST(@proximity AS VARCHAR(5)) + '}?\\W+',\r\n                                '\\b') + value\r\n    FROM    dbo.RegexFind('\\b[\\w]+\\b', @words, 1, 1)\r\n    INSERT  INTO @ProximityList ( context )\r\n            SELECT  '...' + SUBSTRING(@text, Firstindex - 8, length + 16)\r\n                    + '...'\r\n            FROM    dbo.RegexFind(@pattern+'\\b', @text, 1, 1)\r\n    RETURN\r\n   END\r\n    \r\nGO\r\n\r\nSELECT * FROM dbo.FindWordsInContext('sadness farewell embark',\r\n'Sunset and evening star,\r\nAnd one clear call for me!\r\nAnd may there by no moaning of the bar,\r\nWhen I put out to sea,\r\n\r\nBut such a tide as moving seems asleep,\r\nToo full for sound and foam,\r\nWhen that which drew from out the boundless deep\r\nTurns again home.\r\n    \r\nTwilight and evening bell,\r\nAnd after that the dark!\r\nAnd may there be no sadness of farewell,\r\nWhen I embark;\r\n\r\nFor tho'' from out our bourne of Time and Place\r\nThe flood may bear me far,\r\nI hope to see my Pilot face to face\r\nWhen I have crost the bar.\r\n',8)\r\n<\/pre>\n<h3 id=\"eighth\">OLE Regex performance<\/h3>\n<p>Whereas the use of the OLE <code>VBScript.RegExp<\/code> to scan large chunks of text is fine, it is good for complex validation, and it makes a great testbed for regexes, These OLE functions are too slow for use in queries. The overhead of making the calls is just too high because the performance of OLE in TSQL is not great. See Zach Nichter&#8217;s excellent article on the subject <a href=\"https:\/\/www.sqlservercentral.com\/articles\/writing-to-a-file-using-the-sp_oacreate-stored-procedure-and-osql-the\">&#8216;Writing to a File Using the sp_OACreate Stored Procedure and OSQL&#8217;<\/a>.<\/p>\n<p>Here is an example, scanning a databases of nearly 50,000 names of public houses from out XML Jumpstart Workbench.<\/p>\n<pre class=\"lang:tsql theme:ssms2012\">SELECT COUNT(*) FROMpublichouses.dbo.publichouses WHERE dbo.RegexMatch ('\\bred\\b',name)=1\r\n--5 minutes 28 secs\r\nSELECT COUNT(*) FROM publichouses.dbo.publichouses WHERE name LIKE '%red %'\r\n--less than 50 ms\r\n    \r\n\/*You can reduce the overhead to a quarter of what it was by using a function like this and creating the Regex object before you do the call. This means the Regex Object does not get repeatedly created and destroyed on every call.*\/\r\n    \r\nIF OBJECT_ID(N'dbo.OARegexMatch') IS NOT NULL\r\n    DROP FUNCTION dbo.OARegexMatch\r\nGO\r\nCREATE FUNCTION dbo.OARegexMatch \/* very simple Function Wrapper around the call *\/\r\n    (\r\n     @objRegexExp INT,\r\n      @matchstring VARCHAR(MAX)\r\n    )\r\nRETURNS INT\r\nAS BEGIN\r\n    DECLARE @objErrorObject INT,\r\n        @hr INT,\r\n        @match BIT\r\n        EXEC @hr= sp_OAMethod @objRegexExp, 'Test', @match OUT, @matchstring\r\n    IF @hr &amp;lt;&amp;gt; 0\r\n        BEGIN\r\n            RETURN NULL\r\n        END\r\n    RETURN @match\r\n   END\r\nGO\r\n\/* and now embed the SQL Query within the life-cycle of the Regex object *\/\r\n    \r\n    \r\nDECLARE @objRegexExp INT,\r\n        @objErrorObject INT,\r\n        @strErrorMessage VARCHAR(255),\r\n        @hr INT,\r\n        @match BIT\r\n    \r\n    SELECT  @strErrorMessage = 'creating a regex object'\r\n    EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp,'pattern', '\\bred\\b'\r\n        --Specifying a case-insensitive match\r\n    IF @hr = 0\r\n        EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1\r\n        --Doing a Test'\r\n    IF @hr = 0\r\n       SELECT COUNT(*)\r\n           FROM publichouses.dbo.publichouses\r\n           WHERE dbo.OARegexMatch (@objRegexExp,name)=1\r\n    IF @hr &amp;lt;&amp;gt; 0\r\n        BEGIN\r\n            DECLARE @Source VARCHAR(255),\r\n                @Description VARCHAR(255),\r\n                @Helpfile VARCHAR(255),\r\n                @HelpID INT\r\n            EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT,\r\n                @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT\r\n            SELECT  @strErrorMessage = 'Error whilst '\r\n                    + COALESCE(@strErrorMessage, 'doing something') + ', '\r\n                    + COALESCE(@Description, '')\r\n            RAISERROR( @strErrorMessage,16,1)\r\n        END\r\n    EXEC sp_OADestroy @objRegexExp\r\n--1 minute 28 secs\r\n<\/pre>\n<p>It is no consolation for those who are stuck with SQL Server 2000, but the CLR functions are a lot quicker for this sort of usage.<\/p>\n<p class=\"note\">We&#8217;ve used a range of regex patterns from a number of sources in this workbench. Like a lot of programmers, we collect up snippets we come across, almost always forgetting to record the original author. We therefore apologise in advance for not crediting the source and original author, of regex patterns. As you can guess, they often take a long time and effort to develop. If you spot a regex which we should have cited, please add a comment and let us all know who originally wrote it!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Robyn and Phil start by writing a gentle introduction to using Regular expressions for validation, data cleaning and data import in TSQL, and finally end up with a routine for doing google-style searches that show the context of hits. It&#8217;s all done in the spirit of &#8216;try it and see&#8230;&#8217;&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4150,4252,4190,4834,4460],"coauthors":[6813,6814],"class_list":["post-327","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-sql","tag-t-sql-programming","tag-tsql","tag-tsql-exaple-code-regular-expressions-regex-sequel-sql","tag-workbench"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=327"}],"version-history":[{"count":9,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/327\/revisions"}],"predecessor-version":[{"id":84234,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/327\/revisions\/84234"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=327"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}