{"id":70004,"date":"2017-02-23T19:52:10","date_gmt":"2017-02-23T19:52:10","guid":{"rendered":"https:\/\/www.simple-talk.com\/?p=70004"},"modified":"2021-09-29T16:21:17","modified_gmt":"2021-09-29T16:21:17","slug":"fuzzy-searches-sql-server","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/fuzzy-searches-sql-server\/","title":{"rendered":"Doing Fuzzy Searches in SQL Server"},"content":{"rendered":"<p>When an application searches for specific text data, you can generally rely on it to get the search term right. For example, if it needs to find \u2018sausages\u2019, you won\u2019t expect to receive a search on \u2018sossyjez\u2019 , however, When people search your website or application, they have grown to expect it.<\/p>\n<p>In this article, I\u2019ll be explaining some of the strategies that you can adopt to make the searches that are made by users more satisfactory for them. Although they are well-tried in the industry, It is rare to hear of them being used in SQL.<\/p>\n<h1>General principles<\/h1>\n<p>When you create your application, you will need to have an \u2018inversion table\u2019 that lists all the words that are legitimately \u2018searchable\u2019. This technique <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/searching-for-strings-in-sql-server-databases\/\">is described here<\/a>. If, for example you are selling widgets, the inversion table would contain a list of widgets, and the widget spares, repairs, advice, instructions and so on. At this stage, we\u2019ll stick to a single language site, but if your site is multi-language, then the structure of the related tables is rather different.<\/p>\n<p>This word list will be referenced, by a foreign key constraint, from a large narrow table that records where the various entities that are associated with the word are stored, so that relevant matches, maybe a link, a phrase, a description or an image, can be returned to the user. This second table usually records at least the location of the text and the sequence number of the word in the text, where it occurs. It is the inversion table, containing the list of words, that we\u2019ll focus on in this article.<\/p>\n<h1>The task<\/h1>\n<p>When users search the site, even if they are using a single language, they will misspell or use national spellings. They will also use dialect words, or specialist words, for the widget, and even then they may get the spelling wrong. They will use argot or even textspeak (\u2018I H8 U 4 UR HCC\u2019).<\/p>\n<p>In short, your &#8220;fuzzy search&#8221; algorithm ought to be able to cope with a lot of creative ways to search for the same thing, for example:<\/p>\n<ul>\n<li><strong>dialect differences<\/strong> \u2013 crayfish, crawfish, \u00e9crevisse, crawdads, yabbies<\/li>\n<li><strong>national spelling differences<\/strong> \u2013 yoghourt, yogurt and yoghurt<\/li>\n<li><strong>national word differences<\/strong> \u2013 pants and trousers<\/li>\n<li><strong>multiple valid spellings<\/strong> \u2013 encyclopaedia, encyclopedia<\/li>\n<li><strong>misspellings<\/strong> \u2013 definitely often rendered as definitly, definately or defiantly<\/li>\n<li><strong>mistyping<\/strong> \u2013 computer as copmuter or comptuer.<\/li>\n<\/ul>\n<p>We\u2019ll use three basic techniques to cope with all this: the \u2018edit difference\u2019, the hardcoded \u2018common misspelling\u2019 lookup, and the sound-alike. Although some of the ways that words become difficult to match are dealt with by the lookup, and the \u2018edit difference\u2019 approach is good for interpreting mistyping, in truth, both that and the sound-alike technique work together to catch a wide variety of errors.<\/p>\n<h1>The Test-bed<\/h1>\n<p>To set this up, we\u2019ll have our inversion table. We\u2019ll also need a function to generate metaphones for when we get to the \u2018sound-alike\u2019 techniques<\/p>\n<p>All we need to start is a list of all the common words in the English language. To this we can then add the common mistypings and \u2018aliases\u2019(by which I mean dialect words or national differences). The list of common words can be found all over the internet but is here in the <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/the-sql-of-scrabble-and-rapping\/\">SQL of Scrabble and Rapping<\/a> or the <a href=\"https:\/\/www.simple-talk.com\/blogs\/the-fireside-fun-of-decapitations\/\">Fireside Fun of Decapitations<\/a>. Common Mistypings and common misspellings are available in the public domain. Dialect words and national differences are dependent on your location. You\u2019ll also need to load the code for the metaphone function from here<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">CREATE TABLE words\r\n  (word VARCHAR(40) PRIMARY KEY,\r\n  CanonicalVersion VARCHAR(40),\r\n  Metaphone VARCHAR(10) NULL, \r\n  Word_id int IDENTITY UNIQUE\r\n  ) \r\n  ALTER TABLE dbo.Words    \r\n  ADD CONSTRAINT FK_canonical FOREIGN KEY (CanonicalVersion)     \r\n      REFERENCES dbo.words(Word)    \r\n  ;    \r\n  GO  \r\n  \/* this is just a temporary expedient *\/\r\n  CREATE TABLE #words\r\n  (word VARCHAR(40)\r\n  ) \r\n  \/* we unpack the wordlist and put it in a directory\r\n  on the server hosting our SQL Server instance *\/\r\n  INSERT INTO #words(word)\r\n         EXECUTE master..xp_cmdshell \r\n                 'type OurDirectory\\wordlist.txt'\r\n  DELETE FROM #words WHERE word IS NULL\r\n  insert INTO words (word,Metaphone)\r\n    SELECT  word, dbo.metaphone(word) FROM #words\r\n  -- now we need an index on the metaphone\r\n  CREATE INDEX idxMetaphone ON words (metaphone)\r\n  DROP TABLE #words<\/pre>\n<p>You can then insert your secondary \u2018alias\u2019 words like this<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">INSERT INTO words(word, Metaphone)\r\n  SELECT good, dbo.metaphone(good)\r\n   FROM (VALUES\r\n  ('across','accross' ),\r\n  ('aggressive','agressive'),\r\n  ('aggression','agression'),\r\n  ('apparently','apparantly'),\r\n  ('appearance','appearence'),\r\n  ('argument','arguement'),<\/pre>\n<p>\u2026and so on through to<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">('wherever','whereever'),\r\n  ('which','wich'))f(good,bad)<\/pre>\n<p>Now, you can select both the standard \u2018canonical\u2019 words and the \u2018aliases\u2019<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">    SELECT COALESCE(words.CanonicalVersion, words.word)  FROM dbo.words\r\n        WHERE words.word = @Searchterm;<\/pre>\n<p>Or you may want to just search the canonical words by excluding the aliases by adding the condition<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">AND words.CanonicalVersion IS NULL<\/pre>\n<h1>Edit Difference<\/h1>\n<p>We can use the &#8216;edit difference&#8217; technique to cope with simple mistyping or misspelling errors.<\/p>\n<p>When you compare words, and need a measure of how alike they are, you generally calculate how many single-character edits, deletions, transpositions, replacements and insertions you would need to make to get from one to the other. It is commonly believed that to search for a word, you need to calculate the number of edits required to change your candidate word to any other word in the inversion table. Well, no. This would take at least thirteen seconds on our word list, with the full Damerau-Levenshtein algorithm.<\/p>\n<p>A much simpler and faster technique is to select just the candidates that are one \u2018edit difference\u2019, or Levenshtein Distance, away because you only want candidate matches that are similar.<\/p>\n<p>You just produce all the character strings that are one edit distance away and do an inner join with your table of words to eliminate the vast majority that aren\u2019t legitimate words in your table. This deals with the majority of errors. No fuss at all.<\/p>\n<p>Here is the code<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">IF OBJECT_ID (N'dbo.OneEditDifferenceTo') IS NOT NULL\r\n     DROP FUNCTION dbo.OneEditDifferenceTo\r\n  GO\r\n  CREATE FUNCTION dbo.OneEditDifferenceTo(@Word Varchar(40))\r\n   \/**\r\n  summary:  &gt;\r\n  Returns all common words that are one edit difference away from the input word. The routine generates all the character strings that are one edit distance away and does an inner join with the table of words.\r\n  Author: Phil Factor\r\n  Revision: 1.0\r\n  date: 16\/02\/2017\r\n  example:\r\n  Select * from dbo.OneEditDifferenceTo('osd')\r\n  Select * from dbo.OneEditDifferenceTo('acheeve')\r\n  Select * from dbo.OneEditDifferenceTo('erty')\r\n  returns:  &gt;\r\n   a table containing strings\r\n  Dependency: WORDS. A table of common words\r\n  **\/\r\n  RETURNS @candidates TABLE \r\n  (\r\n  Candidate VARCHAR(40)\r\n  )\r\n  AS\r\n  -- body of the function\r\n  BEGIN\r\n  DECLARE @characters TABLE ([character] CHAR(1) PRIMARY KEY)\r\n  DECLARE @numbers TABLE (number int PRIMARY KEY)\r\n  INSERT INTO @characters([character])\r\n  SELECT character FROM (VALUES \r\n    ('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),\r\n    ('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z'))F(character)\r\n  INSERT INTO @numbers(number)\r\n    SELECT number FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),\r\n      (14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),\r\n      (30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40))F(number)\r\n  INSERT INTO @Candidates(candidate)\r\n    SELECT DISTINCT word FROM \r\n      (--deletes\r\n      SELECT  STUFF(@word,number,1,'') FROM @numbers WHERE number &lt;=LEN(@Word)\r\n      UNION ALL--transposes\r\n      SELECT  STUFF(@word,number,2,SUBSTRING(@word,number+1,1)+SUBSTRING(@word,number,1)) \r\n       FROM @numbers WHERE number &lt;LEN(@Word)\r\n      UNION ALL --replaces\r\n  \tSELECT DISTINCT STUFF(@word,number,1,character) \r\n  \t  FROM @numbers CROSS JOIN @characters WHERE number &lt;=LEN(@Word)\r\n      UNION ALL --inserts \r\n  \tSELECT STUFF(@word,number,0,character) \r\n  \t  FROM @numbers CROSS JOIN @characters WHERE number &lt;=LEN(@Word)\r\n      UNION ALL --inserts at end of string\r\n  \tSELECT @word+character \r\n  \t  FROM @characters \r\n  \t)allcombinations(generatedWord)\r\n      INNER JOIN dbo.words ON generatedWord=word\r\n  \tWHERE words.CanonicalVersion IS null\r\n     RETURN\r\n  END\r\n  GO<\/pre>\n<p>With a test bank of 147 common misspellings, the function failed to come up with the correct version only thirteen times. That is less than a ten percent failure rate. In all, it took 380 ms on a slow test server to find all the one-edit-difference matches for 147 words. That\u2019s roughly 2.6 ms each<\/p>\n<p><img decoding=\"async\" class=\"wp-image-70005\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/02\/hits.jpg\" \/><\/p>\n<p>This might be all you need for some applications, but we can pick up most of the mis-spellings that the edit-difference search failed to find, such as phonetic misspellings, using sound-alike.<\/p>\n<h1>Sound-Alike<\/h1>\n<p>For words that are mis-typed phonetically, there is the Metaphone algorithm. There are more advanced and much more complicated versions of this, but I\u2019m using the early public-domain form. <a href=\"https:\/\/www.simple-talk.com\/blogs\/string-comparisons-sql-metaphone-algorithm\/\">I have a SQL Version here<\/a>. Soundex, which is built-in to SQL, isn\u2019t much use because it was developed for hand-coding, before computers, and isn\u2019t discriminating enough.<\/p>\n<p>The sample word table has the metaphone stored with each word. This means that all you need to do is to find the metaphone for the word and search the metaphone column.<\/p>\n<p>Lets return to our extreme example from the introduction.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">SELECT COALESCE(words.CanonicalVersion,words.word) AS candidate FROM words \r\n  WHERE metaphone =dbo.metaphone('sossyjez')<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"112\" height=\"40\" class=\"wp-image-70006\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/02\/word-image-18.png\" \/><\/p>\n<p>We aren\u2019t always that lucky in getting just one candidate that matches. If we&#8221; by the same technique:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">SELECT COALESCE(words.CanonicalVersion,words.word) AS candidate FROM words \r\nWHERE metaphone =dbo.metaphone('tendancy')<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"104\" height=\"80\" class=\"wp-image-70007\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/02\/word-image-19.png\" \/><\/p>\n<p>We can get the most likely ones listed first in the candidate list by ordering the candidates by their edit difference:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">SELECT COALESCE(words.CanonicalVersion,words.word) AS candidate FROM words \r\n  WHERE metaphone =dbo.metaphone('tendancy')\r\n  ORDER by COALESCE(dbo.DamLev(word,'tendancy',3),4) --order by their ascending edit difference\r\n  --but only up to three edit differences<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"101\" height=\"80\" class=\"wp-image-70008\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2017\/02\/word-image-20.png\" \/><\/p>\n<p>In a simple test run, for 30 common misspellings, a fifth of the misspellings in total, the metaphone comparison showed a different metaphone for the common misspelling than for the correct spelling. Only three of these were misspellings that the edit-difference approach also failed to resolve correctly, but it means that it was quite likely to suggest the wrong word, if used by itself. In other words, the metaphone approach would be no good on its for resolving those common misspellings, but is still a good fall back for when the edit-difference approach fails to deliver.<\/p>\n<p>The general problem with the metaphone approach comes from the language itself in that a great number of common words sound alike if one eliminates the \u2018syllabic peak or nucleus\u2019, by making vowels all alike. As a Metaphone tries to represent the sounds, in some cases it is very discriminating, too discriminating in fact, and in other cases it would come up with fifty matches.<\/p>\n<p>However, if you get too many, then you can fall back on a function that calculates the edit distance, preferably a fast Damerau-Levenshtein such as Steve Hatchett\u2019s version here <a href=\"http:\/\/blog.softwx.net\/2015\/01\/optimizing-damerau-levenshtein_19.html\">Optimizing the Damerau-Levenshtein Algorithm in TSQL<\/a> . You can then order the candidates by their edit difference to get the most likely candidates. The best approach is to limit the search to strings that are two edit differences away; anything further is pretty pointless for finding similar strings because at three edit-differences away, they just aren\u2019t similar! I believe that some applications attempt to find edit differences between the metaphones themselves but I think that is probably taking things too far.<\/p>\n<h1>Hard-coded Common Misspellings, Dialect Words and Aliases<\/h1>\n<p>Sometimes, a common misspelling is more than a single edit away from the intended search term, and won&#8217;t be picked up by either our &#8220;edit difference&#8221; or sound-alike algorithms. In such cases, you can treat it as a hard-coded alias. We also need a special way of catching dialect words and other synonyms that can\u2019t be caught any other way.<\/p>\n<p>Let\u2019s illustrate this simply.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">-- We create a temporary table that holds a table that holds the English words for the\r\n  -- numbers one to ten\r\n  CREATE TABLE #Numberwords\r\n  (word VARCHAR(10) PRIMARY KEY,\r\n  CanonicalVersion VARCHAR(10) NULL,\r\n  ) \r\n  -- We put the numbers one to ten in, leaving the canonicalversion field with a NULL\r\n  -- bwecause these are, if I've typed them right, the canonical version\r\n  INSERT INTO #Numberwords (word)\r\n   VALUES ('one'),('two'),('three'),('four'),('five'),('six'),('seven'),\r\n  ('eight'),('nine'),('ten')\r\n  --now we put in some of the common misspellings, along with the proper 'canonical' spellings\r\n  INSERT INTO #Numberwords (word,CanonicalVersion)\r\n   VALUES('wun','one'),('too','two'),('free','three'),('fower','four'),('thive','five'),\r\n   ('sicks','six'),('sevvern','seven'),('ayt','eight'),('nign','nine'),('tenn','ten')\r\n  --and a few synonyms, just to show that we can cope with both\r\n  INSERT INTO #Numberwords (word,CanonicalVersion)\r\n   VALUES('1','one'),('2','two'),('3','three'),('4','four'),('5','five'),\r\n   ('6','six'),('7','seven'),('8','eight'),('9','nine'),('10','ten')\r\n   --and now we make sure that it always returns the canonical version\r\n  SELECT COALESCE(canonicalversion,word) FROM #Numberwords WHERE word LIKE 'fower'\r\n  --returns 'four'\r\n  SELECT COALESCE(canonicalversion,word) FROM #Numberwords WHERE word LIKE '5'\r\n  --returns 'five'\r\n  --and we can still so fuzzy searches of the canonical versions.\r\n  SELECT word FROM #numberwords WHERE CanonicalVersion IS NULL AND word LIKE 'th%'<\/pre>\n<p>You will see that this technique doesn\u2019t allow an alias or misspelling that is a real \u2018canonical\u2019 word. This means that word confusions can only be spotted in context, and also you\u2019re not going to cope with the difference between \u2018pants\u2019 and \u2018trousers\u2019 this way.<\/p>\n<p>It also doesn\u2019t allow you to misspell an alias. \u2018Crawdads\u2019 need to be spelled right if you are asking for a \u2018crayfish\u2019. However, in this case, you can allow aliases in the initial edit-difference approach. It is a matter of fine-tuning for your application.<\/p>\n<h1>Putting it all together<\/h1>\n<p>By now, if you\u2019ve been following my argument, the best tactic is probably going to be a composite one. The first stage is to see if a word is an alias or a known misspelling. If not, then has it an edit-distance of one to any canonical words in your table? If not then does it share a metaphone with any canonical words in your table? If yes, and there are too many, sort them in ascending edit difference and take the top few. Otherwise, just return them. If no, then it is time to get something at least by getting a limited number of words with an edit distance of two, using Steve Hatchett\u2019s version <a href=\"http:\/\/blog.softwx.net\/2015\/01\/optimizing-damerau-levenshtein_19.html\">of the Damerau-Levenshtein Algorithm<\/a>, specifying that it abandons its work on a particular word once it realises that it is more than two edit distances away.<\/p>\n<p>The general principles are:<\/p>\n<ul>\n<li>Use fast table searches where the word is the clustered primary key wherever possible.<\/li>\n<li>Where you are forced to use a slow algorithm, reduce your candidate set as much as possible beforehand using SQL\u2019s built-in text searching.<\/li>\n<li>Make sure that your search feature can learn, or rather be taught, from the search terms that cause difficulty. This means logging all searches and their success, and use this data to maintain the aliases.<\/li>\n<\/ul>\n<p>Here&#8217;s the complete &#8220;fuzzy search&#8221; algorithm:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">IF OBJECT_ID (N'dbo.FuzzySearchOf') IS NOT NULL\r\n     DROP FUNCTION dbo.FuzzySearchOf\r\n  GO\r\n  CREATE FUNCTION dbo.FuzzySearchOf(@Searchterm VARCHAR(40))\r\n  \/**\r\n  summary:  &gt;\r\n  Returns all candidate words even if the input word is misspelt\r\n  Author: Phil Factor\r\n  Revision: 1.0\r\n  date: 16\/02\/2017\r\n  example:\r\n  Select * from dbo.FuzzySearchOf('sossyjez')\r\n  Select * from dbo.FuzzySearchOf('acheeve')\r\n  Select * from dbo.FuzzySearchOf('deevyate')--does a \r\n  returns:  &gt;\r\n   a table containing words\r\n  Dependency: \r\n  Words: A table of common words\r\n  DamLev http:\/\/blog.softwx.net\/2015\/01\/optimizing-damerau-levenshtein_19.html\r\n  dbo.OneEditDifferenceTo(@word)\r\n  dbo.metaphone(@searchterm)\r\n  **\/\r\n  RETURNS @candidates TABLE(Candidate VARCHAR(40))\r\n  AS\r\n    -- body of the function\r\n    BEGIN\r\n    DECLARE @Rowcount INT;\r\n    \/* The first stage is to see if a word is an alias or a known misspelling.*\/\r\n    INSERT INTO @candidates  (Candidate)\r\n      SELECT COALESCE(words.CanonicalVersion, words.word)  FROM dbo.words\r\n        WHERE words.word = @Searchterm;\r\n    -- If not a known word or an alias, then has it an edit-distance of one to any canonical words \r\n    -- IN the 'Words' table\r\n    IF @@RowCount = 0\r\n      BEGIN\r\n      INSERT INTO @candidates  (Candidate)\r\n        SELECT OneEditDifferenceTo.Candidate FROM dbo.OneEditDifferenceTo(@Searchterm);\r\n      IF @@RowCount = 0\r\n        BEGIN --If not then does it share a metaphone with any words in your table?\r\n        INSERT INTO @candidates  (Candidate)\r\n          SELECT COALESCE(words.CanonicalVersion, words.word) AS candidate\r\n          FROM dbo.words  WHERE words.Metaphone = dbo.Metaphone(@Searchterm);\r\n        SELECT @Rowcount = @@RowCount;\r\n        IF @Rowcount &gt; 5 --If yes, and there are too many, then get what there are and \r\n          BEGIN --take the top few in ascending edit difference.\r\n          DELETE FROM @candidates;\r\n          INSERT INTO @candidates (Candidate)\r\n            SELECT TOP 5 COALESCE(words.CanonicalVersion, words.word) AS candidate\r\n            FROM dbo.words  WHERE words.Metaphone = dbo.Metaphone(@Searchterm)\r\n            ORDER BY COALESCE(dbo.DamLev(words.word, @Searchterm, 3), 4); --just do three levels\r\n          END;\r\n        IF @Rowcount = 0\r\n          BEGIN\r\n          \/* Get a limited number of words with an edit distance of two, using Steve Hatchett\u2019s \r\n  \t\tversion  of the Damerau-Levenshtein Algorithm, specifying that it abandons its work \r\n  \t\ton a particular word once it realises that it is more than two edit distances away*\/\r\n          INSERT INTO @candidates (Candidate)\r\n            SELECT TOP 5 words.word\r\n            FROM dbo.words  WHERE words.CanonicalVersion IS NULL\r\n  \t\t\tAND word LIKE LEFT(@Searchterm,1)+'%'\r\n              AND COALESCE(dbo.DamLev(words.word, @Searchterm, 2), 3) &lt; 3;\r\n          END;\r\n        END;\r\n      END;\r\n    RETURN;\r\n    END;\r\n  GO<\/pre>\n<p>We can now test this with this:<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @timer DateTime2\r\n  SELECT @timer=GETDATE()\r\n  SELECT good, bad, candidate FROM (VALUES\r\n  ('across','accross'),('aggressive','agressive'),('aggression','agression'),('apparently','apparantly'),\r\n  ('appearance','appearence'),('argument','arguement'),('assassination','assasination'),\r\n  ('absence','absense'),('absence','absance'),('absence','absense'),('acceptable','acceptible'),\r\n  ('accidentally','accidentaly'),('accommodate','acommodate'),('achieve','acheive'),('acknowledge','acknowlege'),\r\n  ('acknowledge','aknowledge'),('acquaintance','acquaintence'),('acquaintance','aquaintance'),\r\n  ('acquire','aquire'),('acquire','adquire'),('acquit','aquit'),('acreage','acrage'),('acreage','acerage'),\r\n  ('address','adress'),('adultery','adultary'),('advisable','adviseable'),('advisable','advizable'),\r\n  ('aggression','agression'),('aggressive','agressive'),('allegiance','allegaince'),('allegiance','allegience'),\r\n  ('allegiance','alegiance'),('almost','allmost'),('amateur','amatuer'),('amateur','amature'),\r\n  ('annually','anually'),('annually','annualy'),('apparent','apparant'),('apparent','aparent'),\r\n  ('arctic','artic'),('argument','arguement'),('atheist','athiest'),('awful','awfull'),('awful','aweful'),\r\n  ('basically','basicly'),('beginning','begining'),('believe','beleive'),('believe','belive'),\r\n  ('bizarre','bizzare'),('business','buisness'),('calendar','calender'),('Caribbean','Carribean'),\r\n  ('cemetery','cemeiary'),('chauffeur','chauffer'),('colleague','collegue'),('coming','comming'),\r\n  ('committee','commitee'),('completely','completly'),('conscious','concious'),('curiosity','curiousity'),\r\n  ('definitely','definately'),('dilemma','dilemna'),('disappear','dissapear'),('disappoint','dissapoint'),\r\n  ('ecstasy','ecsiacy'),('embarrass','embarass'),('environment','enviroment'),('existence','existance'),\r\n  ('familiar','familar'),('finally','finaly'),('fluorescent','florescent'),\r\n  ('foreign','foriegn'),('foreseeable','forseeable'),('forty','fourty'),('forward','foward'),\r\n  ('friend','freind'),('further','futher'),('gist','jist'),('glamorous','glamourous'),('government','goverment'),\r\n  ('guard','gaurd'),('happened','happend'),('harass','harrass'),('harassment','harrassment'),('honorary','honourary'),\r\n  ('humorous','humourous'),('idiosyncrasy','idiosyncracy'),('immediately','immediatly'),\r\n  ('incidentally','incidently'),('independent','independant'),('interrupt','interupt'),\r\n  ('irresistible','irresistable'),('knowledge','knowlege'),('liaise','liase'),('liaison','liason'),('lollipop','lollypop'),\r\n  ('millennium','millenium'),('millennia','millenia'),('Neanderthal','Neandertal'),('necessary','neccessary'),\r\n  ('noticeable','noticable'),('occasion','ocassion'),('occasion','occassion'),('occurred','occured'),\r\n  ('occurring','occuring'),('occurrence','occurance'),('occurrence','occurence'),('pavilion','pavilllon'),\r\n  ('persistent','persistant'),('pharaoh','pharoah'),('piece','peice'),('politician','politican'),\r\n  ('Portuguese','Portugese'),('possession','Possession'),('preferred','prefered'),('preferring','prefering'),\r\n  ('propaganda','propoganda'),('publicly','publically'),('really','realy'),('receive','recieve'),\r\n  ('referred','refered'),('referring','refering'),('religious','religous'),('remember','rember'),\r\n  ('remember','remeber'),('resistance','resistence'),('sense','sence'),('separate','seperate'),\r\n  ('siege','seige'),('successful','succesful'),('supersede','supercede'),('surprise','suprise'),\r\n  ('tattoo','tatoo'),('tendency','tendancy'),('therefore','therefor'),('threshold','threshhold'),\r\n  ('tomorrow','tommorow'),('tomorrow','tommorrow'),('tongue', 'tounge'),('truly','truely'),\r\n  ('unforeseen','unforseen'),('unfortunately','unfortunatly'),('until','untill'),('weird','wierd'),\r\n  ('wherever','whereever'),('which','wich')\r\n  )f(good,bad) OUTER APPLY dbo.FuzzySearchOf(bad)<\/pre>\n<h1>Conclusions<\/h1>\n<p>Many times in the past, I\u2019ve had arguments with members of the development teams who, when we are discussing fuzzy searches, draw themselves up to their full height, look dignified, and say that a relational database is no place to be doing fuzzy searches or spell-checking. It should, they say, be done within the application layer. This is nonsense, and we can prove it with a stopwatch.<\/p>\n<p>We are dealing with data. Relational databases do this well, but it just has to be done right. This implies searching on well-indexed fields such as the primary key, and not being ashamed of having quite large working tables. It means dealing with the majority of cases as rapidly as possible. It implies learning from failures to find a match. It means, most of all, a re-think from a procedural strategy.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A series of arguments with developers who insist that fuzzy searches or spell-checking be done within the application rather then a relational database inspired Phil Factor to show how it is done.  When the database must find relevant material from search terms entered by users, the database must learn to expect, and deal with, both expected and unexpected &hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[],"coauthors":[6813],"class_list":["post-70004","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/70004","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=70004"}],"version-history":[{"count":10,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/70004\/revisions"}],"predecessor-version":[{"id":92501,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/70004\/revisions\/92501"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=70004"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=70004"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=70004"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=70004"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}