PATINDEX Workbench

The PATINDEX function of SQL Server packs powerful magic, but it is easy to get it wrong. Phil Factor returns to the Workbench format to give a tutorial of examples, samples and cookbook ideas to demonstrate the ways that this underrated function can be of practical use. It is intended to be pasted into SSMS and used as a basis for experiment

The PATINDEX function is a treasure, and it is the obvious way in SQL Server to locate text in strings. It uses the same wildcard pattern as the LIKE operator, works as fast, but gives back more information. It is closest in functionality to CHARINDEX. People worry that it does not allow a full RegEx, but that is missing the point. It will do useful string searching, as I’ll show you, and it is optimized for performance in a database. A RegEx search performs a different service and is slower at doing those routine jobs that PATINDEX is intended for.

It is great for quick checks on your data. Here, we do an elementary check on a stored IP address.

So here, the % ‘wildcard’ character has a special meaning, which is ‘any number of any character’. BOL puts this as ‘Any string of zero or more characters.’ For some reason, most developers seem to think that there is a rule that you can only use them at the start and/or end of a pattern. Not true. You can use them anywhere, as many as you like, but not adjacent as all but the first are ignored. The meaning ‘One of any character’ is denoted by the underscore wildcard character, ‘_’. So what does the IP check we have shown you comprise? Nothing more than counting that there are three dots in the string. We can do more but it gets rather more complicated.

Note that, because we are just testing for existence, and aren’t going to extract it, we could also use LIKE

Here is another simple example of its use

Here are a few other more practical examples of using the ‘%’ wildcard.

See what we’ve done? We’ve added a default at the end so that we don’t have to cope with passing back a 0 from the PATINDEX when it hits a string without the correctly formatted number in it. We look for the transition between a character that isn’t numeric, to one that is. Then we look for three valid consecutive numbers followed by a character that isn’t a number.

The same technique can be used where you want to trim off whitespace before or after a string. You might think that RTRIM and LTRIM do this but they are slightly broken, in that they only trim off the space character. What about linebreaks or tabs?

We’ll now show how we can use this to implement a proper Left-Trim function

Let’s try something a bit trickier, and closer to a real chore. Let’s find a UK postcode. (apologies to all other nations who are reading this)

The validation rules are that the length must be between 6 and 8 characters of which one is a space. This divides the three-character local code to the right of the space from the sorting-office code to the left of the space. The local characters are always a numeric character followed by two alphabetic characters. The Sorting Office code the left of the gap, can be between 2 and 4 characters and the first character must be alpha.

Before you get too excited, I must point out the the postcode validation is more complex. We can’t use it because PATINDEX uses only wildcards and hasn’t the OR expression or the iterators. We can do quite well though..

What if you wanted to do the more common chore of extracting the postcode from an address-line and putting it in its own field? This is where you have to stop using LIKE as it won’t cut the mustard. If you are of a nervous disposition in your SQL-writing please turn away now.

…and then we’ll populate it with 100,000 rows via SQL Data Generator (we’ll use a RegEx to fill the address column in).

1291-before.PNG

Hopefully, I’ll remember to put it in the downloads at the bottom of the article for the other SQLDG freaks. Then we are going to pull out the postcode information, place the modified address without the postcode in a second column, and put the extracted postcode into its own column so we can subsequently do lightning searches based on postcode. This whole messy process runs in five seconds on my test machine. If you did a neat cursor-based process, it would take minutes.

1291-after.PNG

So here’s a puzzle to end off with. You have a field with an email address somewhere in it, and you need to extract it. Here’s one way of pulling it out. It looks a bit complicated, but it is fast.

What are we doing here? The principle is simple. We look for the embedded ‘@’ sign, and then run a check forwards to get the end of the string containing the ‘@’ character. Then we reverse the start of the string and look for the beginning. When we have these, it is just a simple matter of assembling the email address. The SQL looks laborious, but looks can deceive, so it always pays to test it out. Let’s test it out by recreating our addresses table, and stocking the address column with an additional email record.

So there we have it. In summary, when using PATINDEX:

  • Specify the collation if you use character ranges
  • If a problem seems tricky, see if you can detect transitions between character types
  • Use the angle brackets to ‘escape’ wildcard characters.
  • Think of the % wildcard as meaning ‘any number of any character’, or ‘Any string of zero or more characters.’
  • Remember that you can specify the wildcard parameter as a column as well as a literal or variable.
  • You can use cross joins to do multiple searches to simulate the OR condition of RegExes.
  • Experiment wildly when you get a spare minute. Occasionally, you’ll be surprised.