{"id":69926,"date":"2017-02-14T12:32:05","date_gmt":"2017-02-14T12:32:05","guid":{"rendered":"https:\/\/www.simple-talk.com\/?p=69926"},"modified":"2021-09-29T16:21:17","modified_gmt":"2021-09-29T16:21:17","slug":"string-comparisons-sql-metaphone-algorithm","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/string-comparisons-sql-metaphone-algorithm\/","title":{"rendered":"String Comparisons in SQL: The Metaphone Algorithm"},"content":{"rendered":"<p>Metaphone algorithms are designed to produce an approximate phonetic representation, in ASCII, of regular &#8220;dictionary&#8221; words and names in English and some Latin-based languages. It is intended for indexing words by their English pronunciation. It is one of the more popular of the phonetic algorithms and was published by Lawrence Philips in 1990. A Metaphone is up to ten characters in length.<\/p>\n<p>It is used for fuzzy searches for records where each string to be searched has an index with a Metaphone key. You search for all records with the same or similar metaphone key and then refine the search by some ranking algorithm such as Damerau\u2013Levenshtein distance. Metaphone searches are particularly popular with \u2018ancestor\u2019 sites that search on surnames where spellings vary considerably for the same surname. The current version, Metaphone 3, is actively maintained by Lawrence Philips, developed to account for all spelling variations commonly found in English words, first and last names found in the United States and Europe, and non-English words whose native pronunciations are familiar to English-speakers. The source of Metaphone 3 is proprietary, and Lawrence charges a fee to supply the source.<\/p>\n<p>There is a version of Metaphone that is built into in PHP, and I have used the original public domain source of Metaphone, and modified it slightly \u00a0to get the same result as the PHP version.<\/p>\n<p>I have commented the source as clearly as I can to explain what is going on.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true  \">IF  OBJECT_ID('dbo.Metaphone','FN') IS NOT NULL --drop any existing metaphone function\r\n   DROP FUNCTION dbo.Metaphone\r\ngo\r\nCREATE FUNCTION dbo.Metaphone\r\n\/**\r\nsummary:   &gt;\r\nThe Metaphone  phonetic algorithm was devised by Lawrence Philips in 1990.\r\nIt reduces words to their basic sounds, but produces a more accurate encoding,\r\nthan Soundex for matching words that sound similar. \r\nMetaphone is a built-in operator in a number of systems such as PHP but there\r\nseemed to be no available SQL Version until I wrote this. It is merely\r\na reverse engineering of the original published algorithm but tweaked to ensure\r\nthat it gave the same result as the PHP version.\r\nAuthor: Phil Factor\r\nRevision: 1.0\r\ndate: 21 Jan 2017\r\nexample: &gt;\r\n\tSelect dbo.Metaphone ('opportunities')\r\n\t--OPRTNTS\r\nParameters: \r\n\t-- @String (a word -all punctuation will be stripped out)\r\n  A string representing the Metaphone equivalent of the word. \r\n**\/  \r\n(\r\n\t@String VARCHAR(30)\r\n)\r\nRETURNS VARCHAR(10)\r\nAS\r\nBEGIN\r\nDECLARE  @New BIT, @ii INT, @Metaphone VARCHAR(28), @Len INT, @where INT;\r\nDECLARE @This CHAR, @Next CHAR, @Following CHAR, @Previous CHAR, @silent BIT;\r\n\r\nSELECT @String = UPPER(LTRIM(COALESCE(@String, ''))); --trim and upper case\r\nSELECT @where= PATINDEX ('%[^A-Z]%',@String COLLATE Latin1_General_CI_AI ) \r\nWHILE  @where&gt;0 --strip out all non-alphabetic characters!(Edited. Thanks Ros Presser) \r\n\tBEGIN\r\n\tSELECT @String=STUFF(@string,@where,1,'')\r\n\tSELECT @where=PATINDEX ('%[^A-Z]%',@String COLLATE Latin1_General_CI_AI ) \r\n    END\r\nIF(LEN(@String) &lt; 2) RETURN  @String\r\n\r\n--do the start of string stuff first.\r\n--If the word begins with 'KN', 'GN', 'PN', 'AE', 'WR', drop the first letter.\r\n-- \"Aebersold\", \"Gnagy\", \"Knuth\", \"Pniewski\", \"Wright\"\r\nIF SUBSTRING(@String, 1, 2) IN ( 'KN', 'GN', 'PN', 'AE', 'WR' )\r\n  SELECT @String = STUFF(@String, 1, 1, '');\r\n-- Beginning of word: \"x\" change to \"s\" as in \"Deng Xiaopeng\"\r\nIF SUBSTRING(@String, 1, 1) = 'X'\r\n  SELECT @String = STUFF(@String, 1, 1, 'S');\r\n-- Beginning of word: \"wh-\" change to \"w\" as in \"Whatsoever\"\r\nIF @String LIKE 'WH%'\r\n  SELECT @String = STUFF(@String, 1, 1, 'W');\r\n-- Set up for While loop \r\nSELECT @Len = LEN(@String), @Metaphone = '', -- Initialize the main variable \r\n  @New = 1, -- this variable only used next 10 lines!!! \r\n  @ii = 1; --Position counter\r\n--\r\nWHILE((LEN(@Metaphone) &lt;= 8) AND (@ii &lt;= @Len))\r\n  BEGIN --SET up the 'pointers' for this loop-around }\r\n  SELECT @Previous =\r\n    CASE WHEN @ii &gt; 1 THEN SUBSTRING(@String, @ii - 1, 1) ELSE '' END,\r\n    -- originally a nul terminated string }\r\n    @This = SUBSTRING(@String, @ii, 1),\r\n    @Next =\r\n      CASE WHEN @ii &lt; @Len THEN SUBSTRING(@String, @ii + 1, 1) ELSE '' END,\r\n    @Following =\r\n      CASE WHEN((@ii + 1) &lt; @Len) THEN SUBSTRING(@String, @ii + 2, 1) ELSE\r\n                                                                        '' END\r\n   -- 'CC' inside word \r\n  --SELECT @Previous,@this,@Next,@Following,@New,@ii,@Len,@Metaphone\r\n  \/* Drop duplicate adjacent letters, except for C.*\/\r\n  IF @This=@Previous AND @This&lt;&gt; 'C' \r\n\tBEGIN\r\n\t--we do nothing \r\n\tSELECT @New=0\r\n    END\r\n  \/*Drop all vowels unless it is the beginning.*\/\r\n  ELSE IF @This IN ( 'A', 'E', 'I', 'O', 'U' )\r\n    BEGIN\r\n    IF @ii = 1 --vowel at the beginning\r\n      SELECT @Metaphone = @This;\r\n    \/* B -&gt; B unless at the end of word after \"m\", as in \"dumb\", \"Comb\" *\/\r\n    END;\r\n  ELSE IF @This = 'B' AND NOT ((@ii = @Len) AND (@Previous = 'M'))\r\n         BEGIN\r\n         SELECT @Metaphone = @Metaphone + 'B';\r\n         END;\r\n         -- -mb is silent \r\n \/*'C' transforms to 'X' if followed by 'IA' or 'H' (unless in latter case, it is part of '-SCH-',\r\n in which case it transforms to 'K'). 'C' transforms to 'S' if followed by 'I', 'E', or 'Y'. \r\n Otherwise, 'C' transforms to 'K'.*\/\r\n  ELSE IF @This = 'C'\r\n         BEGIN -- -sce, i, y = silent \r\n         IF NOT (@Previous= 'S') AND (@Next IN ( 'H', 'E', 'I', 'Y' )) --front vowel set \r\n           BEGIN\r\n\t\t\t   IF(@Next = 'I') AND (@Following = 'A')\r\n\t\t\t\t SELECT @Metaphone = @Metaphone + 'X'; -- -cia- \r\n\t\t\t   ELSE IF(@Next IN ( 'E', 'I', 'Y' ))\r\n\t\t\t\t SELECT @Metaphone = @Metaphone + 'S'; -- -ce, i, y = 'S' }\r\n\t\t\t   ELSE IF(@Next = 'H') AND (@Previous = 'S')\r\n\t\t\t\t SELECT @Metaphone = @Metaphone + 'K'; -- -sch- = 'K' }\r\n\t\t\t   ELSE IF(@Next = 'H')\r\n\t\t\t\t BEGIN\r\n\t\t\t\t   IF(@ii = 1) AND ((@ii + 2) &lt;= @Len) \r\n\t\t\t\t\t AND NOT(@Following IN ( 'A', 'E', 'I', 'O', 'U' ))\r\n\t\t\t\t\t   SELECT @Metaphone = @Metaphone + 'K';\r\n\t\t\t\t   ELSE\r\n\t\t\t\t\t SELECT @Metaphone = @Metaphone + 'X';\r\n\t\t\t\t   END\r\n           End  \r\n\t\t ELSE \r\n           SELECT @Metaphone = @Metaphone +CASE WHEN @Previous= 'S' THEN '' else 'K' end;\r\n         \t   -- Else silent \r\n         END; -- Case C }\r\n  \/*'D' transforms to 'J' if followed by 'GE', 'GY', or 'GI'. Otherwise, 'D' \r\n  transforms to 'T'.*\/\r\n  ELSE IF @This = 'D'\r\n         BEGIN\r\n         SELECT @Metaphone = @Metaphone\r\n           + CASE WHEN(@Next = 'G') AND (@Following IN ( 'E', 'I', 'Y' )) --front vowel set \r\n                 THEN 'J' ELSE 'T' END;\r\n         END;\r\n  ELSE IF @This = 'G'\r\n         \/*Drop 'G' if followed by 'H' and 'H' is not at the end or before a vowel. Drop 'G' \r\n if followed by 'N' or 'NED' and is at the end.\r\n 'G' transforms to 'J' if before 'I', 'E', or 'Y', and it is not in 'GG'. \r\n Otherwise, 'G' transforms to 'K'.*\/\r\n    BEGIN\r\n  SELECT @silent = \r\n    CASE WHEN (@Next = 'H') AND (@Following IN ('A','E','I','O','U'))\r\n\tAND (@ii &gt; 1) AND (((@ii+1) = @Len) OR ((@Next = 'n') AND\r\n    (@Following = 'E') AND SUBSTRING(@String,@ii+3,1) = 'D') AND ((@ii+3) = @Len)) \r\n-- Terminal -gned \r\n  AND (@Previous = 'i') AND (@Next = 'n')\r\n  THEN 1 \r\n -- if not start and near -end or -gned.) \r\n  WHEN (@ii &gt; 1) AND (@Previous = 'D')-- gnuw\r\n    AND (@Next IN ('E','I','Y')) --front vowel set \r\n  THEN 1 -- -dge, i, y \r\n  ELSE 0 END\r\n  IF NOT (@silent=1)\r\n    SELECT @Metaphone = @Metaphone \r\n\t+ CASE WHEN (@Next IN ('E','I','Y')) --front vowel set \r\n      THEN  'J' ELSE  'K' END\r\n  END\r\n  \/*Drop 'H' if after vowel and not before a vowel.\r\n  or the second char of  \"-ch-\", \"-sh-\", \"-ph-\", \"-th-\", \"-gh-\"*\/\r\n\r\n  ELSE IF @This = 'H'\r\n    BEGIN\r\n    IF NOT ( (@ii= @Len) OR (@Previous IN  ( 'C', 'S', 'T', 'G' ))) \r\n\t   AND (@Next IN ( 'A', 'E', 'I', 'O', 'U' ) )\r\n     SELECT @Metaphone = @Metaphone + 'H';\r\n         -- else silent (vowel follows) }\r\n    END;\r\n  ELSE IF @This IN --some get no substitution\r\n       ( 'F', 'J', 'L', 'M', 'N', 'R' )\r\n     BEGIN\r\n     SELECT @Metaphone = @Metaphone + @This;\r\n     END;\r\n  \/*'CK' transforms to 'K'.*\/\r\n  ELSE IF @This = 'K'\r\n     BEGIN\r\n     IF(@Previous &lt;&gt; 'C')\r\n       SELECT @Metaphone = @Metaphone + 'K';\r\n     END;\r\n  \/*'PH' transforms to 'F'.*\/\r\n  ELSE IF @This = 'P'\r\n    BEGIN\r\n      IF(@Next = 'H') SELECT @Metaphone = @Metaphone + 'F', @ii = @ii + 1;\r\n      -- Skip the 'H' \r\n      ELSE\r\n        SELECT @Metaphone = @Metaphone + 'P';\r\n      END;\r\n  \/*'Q' transforms to 'K'.*\/\r\n  ELSE IF @This = 'Q'\r\n    BEGIN\r\n      SELECT @Metaphone = @Metaphone + 'K';\r\n    END;\r\n  \/*'S' transforms to 'X' if followed by 'H', 'IO', or 'IA'.*\/\r\n  ELSE IF @This = 'S'\r\n    BEGIN\r\n    SELECT @Metaphone = @Metaphone + \r\n\t  CASE \r\n\t\tWHEN(@Next = 'H')\r\n\t\t OR( (@ii&gt; 1) AND (@Next = 'i') \r\n\t\t  AND (@Following IN ( 'O', 'A' ) )\r\n\t\t  ) \r\n\t\tTHEN 'X' ELSE 'S' END;\r\n     END;\r\n  \/*'T' transforms to 'X' if followed by 'IA' or 'IO'. 'TH' transforms \r\nto '0'. Drop 'T' if followed by 'CH'.*\/\r\n  ELSE IF @This = 'T'\r\n    BEGIN\r\n    SELECT @Metaphone = @Metaphone\r\n      + CASE \r\n\t    WHEN(@ii = 1) AND (@Next = 'H') AND (@Following = 'O') \r\n\t       THEN 'T' -- Initial Tho- }\r\n        WHEN(@ii &gt; 1) AND (@Next = 'i') \r\n\t\t     AND (@Following IN ( 'O', 'A' )) \r\n\t\t  THEN 'X'\r\n        WHEN(@Next = 'H') THEN '0'\r\n        WHEN NOT((@Next = 'C') AND (@Following = 'H')) \r\n\t\t  THEN 'T'\r\n        ELSE '' END;\r\n         -- -tch = silent }\r\n    END;\r\n  \/*'V' transforms to 'F'.*\/\r\n  ELSE IF @This = 'V'\r\n    BEGIN\r\n    SELECT @Metaphone = @Metaphone + 'F';\r\n    END;\r\n  \/*'WH' transforms to 'W' if at the beginning. Drop 'W' if not followed by a vowel.*\/\r\n  \/*Drop 'Y' if not followed by a vowel.*\/\r\n  ELSE IF @This IN ( 'W', 'Y' )\r\n    BEGIN\r\n    IF @Next IN ( 'A', 'E', 'I', 'O', 'U' )\r\n      SELECT @Metaphone = @Metaphone + @This;\r\n     --else silent \r\n     \/*'X' transforms to 'S' if at the beginning. Otherwise, 'X' transforms to 'KS'.*\/\r\n    END;\r\n  ELSE IF @This = 'X'\r\n    BEGIN\r\n      SELECT @Metaphone = @Metaphone + 'KS';\r\n    END;\r\n  \/*'Z' transforms to 'S'.*\/\r\n  ELSE IF @This = 'Z'\r\n     BEGIN\r\n       SELECT @Metaphone = @Metaphone + 'S';\r\n     END;\r\n  ELSE\r\n  RETURN 'error with '''+ @This+ '''';\r\n  -- end\r\n  SELECT @ii = @ii + 1;\r\n  END; -- While \r\nreturn @Metaphone \r\nEND\r\ngo\r\n\/*\r\nCheck against the PHP implementation*\/\r\nSELECT * FROM \r\n(SELECT dbo.Metaphone ('craven') AS Attempt,'craven' AS original ,'KRFN' AS canonical                      \r\nUNION ALL SELECT dbo.Metaphone ('platitudinous'),'platitudinous','PLTTTNS'\r\nUNION ALL SELECT dbo.Metaphone ('woodcarvings'),'woodcarvings','WTKRFNKS'\r\nUNION ALL SELECT dbo.Metaphone ('overlaid'),'overlaid','OFRLT'\r\nUNION ALL SELECT dbo.Metaphone ('solitaries'),'solitaries','SLTRS'\r\nUNION ALL SELECT dbo.Metaphone ('beatific'),'beatific', 'BTFK'\r\nUNION ALL SELECT dbo.Metaphone ('plaza'),'plaza','PLS'\r\nUNION ALL SELECT dbo.Metaphone ('paramilitary'),'paramilitary','PRMLTR'\r\nUNION ALL SELECT dbo.Metaphone ('synod'),'synod','SNT'\r\nUNION ALL SELECT dbo.Metaphone ('marinas'),'marinas','MRNS'\r\nUNION ALL SELECT dbo.Metaphone ('hyperventilation'),'hyperventilation','PRFNTLXN'\r\nUNION ALL SELECT dbo.Metaphone ('celebrant'),'celebrant','SLBRNT'\r\nUNION ALL SELECT dbo.Metaphone ('pipsqueaks'),'pipsqueaks','PPSKKS'\r\nUNION ALL SELECT dbo.Metaphone ('dazzles'),'dazzles', 'TSLS'\r\nUNION ALL SELECT dbo.Metaphone ('bloodbaths'),'bloodbaths','BLTB0S'\r\nUNION ALL SELECT dbo.Metaphone ('lotion'),'lotion','LXN'\r\nUNION ALL SELECT dbo.Metaphone ('agreeable'),'agreeable','AKRBL'\r\nUNION ALL SELECT dbo.Metaphone ('shariah'),'shariah','XR'\r\nUNION ALL SELECT dbo.Metaphone ('direction'),'direction','TRKXN'\r\nUNION ALL SELECT dbo.Metaphone ('constricts'),'constricts','KNSTRKTS'\r\nUNION ALL SELECT dbo.Metaphone ('avowedly'),'avowedly','AFWTL'\r\nUNION ALL SELECT dbo.Metaphone ('exorcisms'),'exorcisms','EKSRSSMS'\r\nUNION ALL SELECT dbo.Metaphone ('starches'),'starches','STRXS'\r\nUNION ALL SELECT dbo.Metaphone ('poses'),'poses','PSS'\r\nUNION ALL SELECT dbo.Metaphone ('levies'),'levies','LFS'\r\nUNION ALL SELECT dbo.Metaphone ('clicks'),'clicks','KLKS'\r\nUNION ALL SELECT dbo.Metaphone ('minstrels'),'minstrels','MNSTRLS'\r\nUNION ALL SELECT dbo.Metaphone ('propounding'),'propounding','PRPNTNK'\r\nUNION ALL SELECT dbo.Metaphone ('opalescent'),'opalescent','OPLSNT'\r\nUNION ALL SELECT dbo.Metaphone ('hotline'),'hotline','HTLN'\r\nUNION ALL SELECT dbo.Metaphone ('soporifically'),'soporifically','SPRFKL'\r\nUNION ALL SELECT dbo.Metaphone ('python'),'python','P0N'\r\nUNION ALL SELECT dbo.Metaphone ('drab'),'drab','TRB'\r\nUNION ALL SELECT dbo.Metaphone ('appraised'),'appraised','APRST'\r\nUNION ALL SELECT dbo.Metaphone ('commotions'),'commotions','KMXNS'\r\nUNION ALL SELECT dbo.Metaphone ('defeatists'),'defeatists','TFTSTS'\r\nUNION ALL SELECT dbo.Metaphone ('dispensations'),'dispensations','TSPNSXNS'\r\nUNION ALL SELECT dbo.Metaphone ('downfall'),'downfall','TNFL'\r\nUNION ALL SELECT dbo.Metaphone ('naturalising'),'naturalising','NTRLSNK')k\r\n\r\nWHERE attempt &lt;&gt; canonical\r\nIF @@RowCount&gt;0 RAISERROR( 'As you can see, there was a problem somewhere',16,1)\r\n\r\n<\/pre>\n<p>The source to this function is <a href=\"https:\/\/github.com\/Phil-Factor\/SQLMetaPhone\">here on Github<\/a>.<\/p>\n<p>See also :-<\/p>\n<ul>\n<li><a href=\"https:\/\/www.simple-talk.com\/blogs\/string-comparisons-in-sql-the-longest-common-subsequence\/\">String Comparisons in SQL: The Longest Common Subsequence<\/a><\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/blogs\/string-comparisons-in-sql-edit-distance-and-the-levenshtein-algorithm\/\">String Comparisons in SQL: Edit Distance and the Levenshtein algorithm<\/a><\/li>\n<li><a href=\"https:\/\/www.simple-talk.com\/blogs\/string-comparisons-in-sql-the-longest-common-substring\/\">String Comparisons in SQL: The Longest Common Substring.<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>When exploring the use of the Metaphone algorithm for fuzzy search, Phil couldn&#8217;t find a SQL version of the algorithm so he wrote one. The Metaphone algorithm is built in to PHP, and is widely used for string searches where you aren&#8217;t always likely to get exact matches, such as ancestral research and historical documents. It is particularly useful when comparing strings word-by-word. With a SQL version, it is easy to experiment on large quantities of data!&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,143531],"tags":[],"coauthors":[6813],"class_list":["post-69926","post","type-post","status-publish","format-standard","hentry","category-blogs","category-t-sql-programming-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/69926","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=69926"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/69926\/revisions"}],"predecessor-version":[{"id":69972,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/69926\/revisions\/69972"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=69926"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=69926"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=69926"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=69926"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}