De-duplicating Delimited Lists in SQL Server

Comments 0

Share to social media

In a recent article ‘Removing Duplicates from Strings in SQL Server’, Aaron Bertrand comes up with the interesting problem of removing duplicates from lists of strings. The example he gives is …

… and you want to make each word unique in the list…

He goes on to describe the conventional ‘number-table’ solution.

It made me wonder if it would be possible to do an alternative solution that didn’t require functions but would also work on a whole set of lists. Meybe, even a single expression.

Normally, these lists will be quite long. There is no place in a relational database for lists. However, there are times when we are forced to resort to them, if only when importing data from other systems. There is a subtle problem with them. Sometimes, lists not only describe the members of a set but also their order in the set.

In this case, it is the order of the first occurrence of the item in the list. To do the latter, the solution is slightly more complicated.

Let’s start with a set of lists. Just for simplicity, we’ll use the same delimiter for each though you’ll probably spot how to specify the delimiter and even how to allow multi-character delimiters. I’m leaving all that stuff out to keep all this code as simple as possible

So there you have it. With XML tricks and window functions, we have more opportunity for kicking out any need for functions. To use this code, you’d just swap out the select statement that supplied my samples to the routine, for the lists that you want to deduplicate. I am intrigued that there are such different ways of doing a solution for this task in SQL server. Are there yet other ways of doing it?

  1. 22-Jan-2016 Fixed a bug spotted by Aaron Bertrand)
  2. 28-Jan-2016 Fixed an XML escaping bug spotted by xcelsius
  3. 29-Jan-2016 Added fast version using XQuery and flwor
  4. 12-Feb-2016 Added a new faster solution based on that of xcelsius

Load comments

About the author

Phil Factor

See Profile

Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 40 years of experience with database-intensive applications. Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career. See also :

Phil Factor's contributions