SQL String User Function Workbench: part 2

In which Robyn and Phil continue with their popular series on TSQL String User-functions. In this final episode, they pull together the themes from their TSQL String Array Workbench and String User Function workbench, to provide a simple TSQL string-handling package.

The Return of the Killer Tuples

This workbench finishes of what has been a three-part series of string functions. In it, we introduce the idea of using XML to provide a very simple array for doing string handling. This allows us to use functions for searching and splitting strings that will be familiar to users of procedural languages such as PHP and Python.

The first part, TSQL String Array Workbench showed how the basics worked and then demonstrated how it could be used with a PHP-style string function. What inspired us to write this workshop was when Phil had to endure a PHP programmer sounding off about how much better PHPs string handling as than TSQL. He then made the discovery that it was actually possible to pass a string in an XML parameter, detect the fact and convert it to a single-item list, so as to emulate the facility of PHP and Python to pass either lists or single strings.

Of course, this principle could be extended to arrays and matrices. We don’t handle the representation of lists, arrays and matrices in any standard way, as this would be a distraction at this stage, and multi-dimensional lists of arrays aren’t used much for strings.

We then got rather diverted by the Python string functions (now string methods) and so wrote the SQL String User Function Workbench: part 1 that emulated all the python string functions that didn’t have lists as parameters so we didn’t use the XML array mechanism. Unfortunately there were a whole group that did, so here to round things up are the…

SQL String User Functions (from Python) that can use lists

Contents

  1. Split function
  2. SplitLines
  3. Within (not from python)
  4. EndsWith
  5. StartsWith
  6. Contains (not from python)
  7. Join
  8. Parts (not from python)
  9. Partition
  10. RPartition

Split Function

Return an array of the words in the string, using @delimiter as a delimiter. If @maxsplit is given, at most @maxsplit splits are done. (thus, the list will have at most maxsplit+1 elements). If @maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made). Consecutive delimiters are not grouped together and are deemed to delimit empty strings. The sep argument may consist of several characters. If @Delimiter is not specified or is None, a different splitting algorithm is applied. First, whitespace characters spaces, tabs, newlines, returns, and formfeeds) are stripped from both ends. Then, words are separated by arbitrary length strings of whitespace characters. Consecutive whitespace delimiters are treated as a single delimiter. Splitting an empty string or a string consisting of just whitespace returns an empty list. P.S. We took this second ‘splitting algorithm’ to mean that a list of the words was required. Our solution is only tested for English and will need fine tuning for other languages. Phil swore ages ago that Hell would freeze over before he ever published yet another string-splitting algorithm. We may have hit on a solution to global warming here.

Now we simply output the temporary table variable as XML using our standard string-array format.

So now we test it out (The real test rig is longer and more boring).

SplitLines string Function

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true. p.s. This is such a simple modification to ‘Split’ that you wonder why they bothered.

Note: in SQL Server 2016, there is a built-in function to do this called STRING_SPLIT

within string Function

Return non-zero if the string contains the specified substring, otherwise return False. Suffix can also be a list of substrings to look for. With the optional start parameter, the test should begin at that position. With the optional end,the test should stop comparing at that position. P.S. This isn’t a Python method, but it underpins the Startswith, and EndsWith routines. We add the contains function to hide the wildcard.

EndsWith string Function

Return non-zero if the string ends with the suffix, otherwise return False. The suffix can also be a list of suffixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

StartsWith string Function

Return non-zero if the string starts with the prefix, otherwise return False. Prefix can also be a list of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

Contains string Function

Return non-zero if the string contains the substring, otherwise returns 0. substring can also be a list of substrings to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

Join string Function

Joins together the given array as a string with the @separator as separator:

Parts string Function

Split the string at the first occurrence of sep, and return an array containing the part before the separator, the separator itself, and the part after the separator. IF the separator is not found, return an array containing the string itself, followed by two empty strings. P.S. This is not part of the Python suite. It is used to support Partition and RPartition. Again, Phil required calming down before he knuckled down to write this, since he once swore he would never publish another string splitting routine.

Partition string Function

Split the string at the first occurrence of sep, and return an array containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return an array containing the string itself, followed by two empty strings.

RPartition string Function

Split the string at the last occurrence of sep, and return an array containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return an array containing the string itself, followed by two empty strings.

So, just to summarise:

…and so on, and so forth. The combinations and possibilities are endless. We find that having the functions there will speed development. We always say that, for speed-critical sections of code, we’ll re-code using the built-in functions; but it is surprising how seldom this is actually required.

Please remember that the code is, as ever, available to download below. Just to make things simpler, we’ve added the code for the previous string User-Function workbenches too.