Preamble
Sometimes, when you tackle a different problem in SQL, one can hit on techniques that come in handy in all sorts of other contexts. This is the theme of this article, which uses a bank of all common words of the English language.
Cheating at Scrabble
I was playing Scrabble the other day and faced, as usual with an impossible hand. Scrabble is, as you probably know, a game where you pick tiles from a sack, and each tile has a letter written on it. You are allowed up to seven tiles at a time, and you have to place these, made up into words, on a game board marked with a 15-by-15 grid. The words are formed across and down in crossword fashion, and must be real words in common use. Each letter is scored individually, but the score is boosted by special squares on the board that give you double, or triple, word, or letter, scores.
It occurred to me, as it must have done to many others, that one could cheat at this game with a surreptitious iPhone or iPod and a simple word bank.
If you based this application on SQL Server, using a simple HTML interface, it would be easy to find all the words that could be made up from your seven tiles. Because you will need to link in with another word, that will come to eight letters. In some denser games, even more than eight-letter words are made as more existing words are crossed.
The first exercise will be to find all the permutation of the characters in your hand. Actually, if you are being subtle, you can restrict yourself to a subset, since only a small number of the combination of vowels and consonants are actually allowed in English. We won’t be subtle here: we’ll use the brute-force attack to the problem. The simplest way is to do a series of joins to a table of letters, but I’ll try a more flexible approach that uses a variation of the card-sharper’s shuffle instead. To do this, you will firstly need a number table. This number table is used frequently in SQL and you may already have one in your test database. If you haven’t, this stored procedure will deal with the task of creating and populating the number table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
IF EXISTS ( SELECT 'found' FROM sys.objects WHERE name='MaybeBuildNumberTable' ) DROP PROCEDURE MaybeBuildNumberTable go CREATE PROCEDURE [dbo].[MaybeBuildNumberTable] AS BEGIN SET NOCOUNT ON IF NOT EXISTS ( SELECT * FROM dbo.sysobjects WHERE id=OBJECT_ID(N'[dbo].[Numbers]') AND OBJECTPROPERTY(id, N'IsUserTable')=1 ) BEGIN CREATE TABLE [dbo].[Numbers] ( [number] [int], CONSTRAINT [Index_Numbers] PRIMARY KEY CLUSTERED ([number] ASC) ON [PRIMARY] ) ON [PRIMARY] END IF NOT EXISTS(SELECT 1 FROM numbers WHERE number=99999) BEGIN TRUNCATE TABLE numbers ;WITH Digits(i) AS (SELECT i FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0)) AS X(i)) INSERT INTO numbers(number) SELECT (D5.i*100000 + D4.i*10000 + D3.i * 1000 + D2.i * 100 + D1.i * 10 + D0.i + 1) AS seq FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3, Digits AS D4, Digits AS D5 END END go Execute MaybeBuildNumberTable |
With these basics out of the way, we can now create the Table-Valued Function that returns the permutations in sequence of the characters you give it. This will work with any ASCII character and can be altered to deal with unicode, of course. All we are doing here is creating an empty table of the right size and then filling it with all the permutations of the characters you supply to the function. Normally, permutations will be done with a series of self-joins to a table with all the characters, one per row, but here we want something that will be useful when you do not know the length of the string in advance. (Permutations are great for doing many of the ‘graph’ problems, such as finding the shortest route, network routing, or time-tabling)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 |
if exists (select 'found' from sys.objects where name ='PermutationsOf') drop function PermutationsOf go CREATE FUNCTION PermutationsOf (@String VARCHAR(10)) /** summary: > returns a simple table of all the permutations of a string of up to eight characters. (one can do more if you are prepared to wait a bit) The algorithm is based on the idea of a cunjurors shuffle which guarantees to return the pack to its original order. In this case, it shuffles the string to every combination. It uses the 'quirky update' in order to do the shuffle. example: - code: --test for correctness Select count(*),string from dbo.PermutationsOf ('12345678') group by string having count(*)>1 - code: Select * from dbo.PermutationsOf ('physical') - code: Select * from dbo.PermutationsOf (null) returns: a table listing all the alternative permutations of the letters. If you gibe a string like '111' then all permutations will be the same **/ RETURNS @alternatives TABLE ( number INT primary key, String VARCHAR(10) ) AS BEGIN DECLARE @LenString INT DECLARE @Iterations INT SELECT @LenString=LEN(@String), @iterations=CASE @Lenstring WHEN 1 THEN 1 WHEN 2 THEN 2 WHEN 3 THEN 6 WHEN 4 THEN 24 WHEN 5 THEN 120 WHEN 6 THEN 720 WHEN 7 THEN 5040 WHEN 8 THEN 40320 ELSE NULL END IF @iterations IS NULL BEGIN INSERT INTO @alternatives ( number,string) SELECT 1,'Sorry' UNION SELECT 2,'String' UNION SELECT 3,'wrong' UNION SELECT 4,'length' RETURN END INSERT INTO @alternatives (Number, string) SELECT number, '' FROM numbers WHERE number<=@iterations UPDATE @alternatives --progressively shuffle the string SET @string=string =CASE WHEN number=0 THEN @string WHEN number%5040=0 THEN SUBSTRING(@string, 8, 1)+SUBSTRING(@string, 7, 1) +SUBSTRING(@string, 6, 1)+SUBSTRING(@string, 5, 1) +SUBSTRING(@string, 4, 1)+SUBSTRING(@string, 2, 1) +LEFT(@string, 1)+SUBSTRING(@string, 3, 1) +SUBSTRING(@string, 9, @lenstring-4) WHEN number%720=0 THEN SUBSTRING(@string, 7, 1)+SUBSTRING(@string, 6, 1) +SUBSTRING(@string, 5, 1)+SUBSTRING(@string, 4, 1) +SUBSTRING(@string, 2, 1)+LEFT(@string, 1) +SUBSTRING(@string, 3, 1)+SUBSTRING(@string, 8, @lenstring-4) WHEN number%120=0 THEN SUBSTRING(@string, 6, 1)+SUBSTRING(@string, 5, 1) +SUBSTRING(@string, 4, 1)+SUBSTRING(@string, 2, 1) +LEFT(@string, 1)+SUBSTRING(@string, 3, 1) +SUBSTRING(@string, 7, @lenstring-4) WHEN number%24=0 THEN SUBSTRING(@string, 5, 1)+SUBSTRING(@string, 4, 1) +SUBSTRING(@string, 2, 1)+LEFT(@string, 1) +SUBSTRING(@string, 3, 1)+SUBSTRING(@string, 6, @lenstring-4) WHEN number%6=0 THEN SUBSTRING(@string, 4, 1)+SUBSTRING(@string, 2, 1) +LEFT(@string, 1)+SUBSTRING(@string, 3, 1) +SUBSTRING(@string, 5, @lenstring-4) WHEN number%3 IN (1, 2) THEN SUBSTRING(@string, 2, 2)+LEFT(@String, 1) +RIGHT(@String, @lenstring-3) ELSE SUBSTRING(@string, 2, 1)+LEFT(@string, 1) +RIGHT(@String, @lenstring-2) END RETURN END GO |
Now we have this working, we’ll need all the common words of English. We first create the table of words and stock it from our word bank that I provide in the downloads at the bottom of the article.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
IF EXISTS ( SELECT 'found' FROM information_Schema.tables WHERE Table_Name='AllWords' ) DROP TABLE AllWords go CREATE TABLE AllWords ( String VARCHAR(25) NOT NULL, CONSTRAINT [PK_CommonWords] PRIMARY KEY CLUSTERED ([String] ASC) ON [PRIMARY] ) ON [PRIMARY] go /* now we'll insert all the words into our dictionary table */ DECLARE @commonwords TABLE (Word VARCHAR(255)) INSERT INTO @CommonWords EXECUTE master..xp_cmdshell 'type wordlist.txt' INSERT INTO AllWords (String) SELECT left(word,25) FROM @commonwords WHERE word IS NOT NULL go |
You would have to alter the path to the word list before using this, of course. I usually find that my seven tiles consist mostly of vowels so I couldn’t resist extracting lists of handy words with at least four consecutive vowels, just to make sure that everything was loaded properly
1 2 3 4 5 6 |
Declare @list Varchar(Max) SELECT @List=coalesce(@list+', ','')+string FROM allwords WHERE PATINDEX('%[aeiou][aeiou][aeiou][aeiou]%',string) >0 Select @List /*aqueous, gooier, gooiest, obsequious, obsequiously, obsequiousness, onomatopoeia, pharmacopoeia, pharmacopoeias, plateaued, plateauing, queue, queued, queueing, queues, queuing, sequoia, sequoias */ |
Though my favourite has five consecutive vowels
1 2 3 4 |
SELECT string FROM allwords WHERE PATINDEX('%[aeiou][aeiou][aeiou][aeiou][aeiou]%',string) >0 --queueing |
Good grief, if you are short of vowels, there is plenty you can do, as there are words with seven or more consonants in a row (if you include Y as a consonant) …
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
Declare @list Varchar(Max) SELECT @List=coalesce(@list+', ','')+string FROM allwords WHERE PATINDEX('%[^aeiou][^aeiou][^aeiou][^aeiou][^aeiou][^aeiou][^aeiou]%',string) >0 Select @List /* biorhythms, encrypts, rhythms, strychnine */ But most of the time, it is a shortage of vowels /* you can quickly find the words with the most vowels (10 seems to be the highest) but these are in words like internationalization(10) and interdenominational (9) */ SELECT string,LEN(string)-LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(String,'a',''), 'e',''), 'i',''), 'o',''), 'u','')) FROM allwords ORDER BY LEN(string)-LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(String,'a',''), 'e',''), 'i',''), 'o',''), 'u',''))DESC /* but it is more useful to have the ones with the highest proportion of vowels to consonants! */ DECLARE @list VARCHAR(MAX) SELECT @List=COALESCE(@list+', ','')+string FROM ( SELECT TOP 60 string FROM allwords ORDER BY ((LEN(string)-LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(String,'a',''), 'e',''), 'i',''), 'o',''), 'u','')))*100.00) /LEN(string) DESC )f SELECT @List /* oi, adieu, aerie, audio, eerie, queue, aeon, aide, aloe, aqua, area, aria, aura, auto, beau, ciao, ease, epee, euro, idea, iota, luau, oboe, ooze, ouzo, amoebae, anaemia, aquaria, aqueous, aureole, evacuee, sequoia, aah, acacia, ace, adagio, adieus, adieux, aerate, aerial, aeries, age, ago, aha, aid, ail, aim, air, airier, ale, amebae, amoeba, anemia, ape, apiece, apogee, are, arouse, ate, audacious*/ |
Now we have a function that will provide a table with all the permutation of between two and eight letters, we can use it to find those common words that can be built with your list of letters. Here we have an effective way of cheating at Scrabble. You’d have to use something like iWebKit to knock together a little application.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
if exists (select 'found' from sys.objects where name ='ValidWordsInLetters') drop function ValidWordsInLetters go CREATE FUNCTION ValidWordsInLetters (@String VARCHAR(10)) /** summary: > example: - code: Select String from dbo.ValidWordsInLetters('CoFartua') returns: a table listing up to the first hundred words that can be made with the letters you supply. Good for Scrabble where you have SQL Server handy. **/ RETURNS @words TABLE ( number INT identity(1,1) primary key, String VARCHAR(10) ) AS BEGIN DECLARE @rowcount INT, @ii INT, @LenString int Declare @Strings Table ( number INT identity(1,1) primary key, String VARCHAR(10) ) insert into @strings(string) Select distinct string from dbo.PermutationsOf (@String) SELECT @Rowcount=0, @LenString=LEN(@String), @ii=@LenString WHILE @ii>2 AND @rowcount<100 BEGIN INSERT INTO @words SELECT distinct TOP 100 allwords.string FROM Allwords INNER JOIN @Strings a ON LEFT(a.string, @ii)=allwords.string SELECT @Rowcount=@rowcount+@@Rowcount, @ii=@ii-1 END return end /* Select String from dbo.ValidWordsInLetters('CoFartua') */ |
Rapping and doggerel.
Rhyming dictionaries aren’t new. they are simply dictionaries that are ordered by the word written backwards. It starts with Baa and ends in Fuzz. The most famous one is probably Walkers Rhyming Dictionary. Every poet has one. Whilst they are useful, they are a bit hit and miss to use. We’ll
be slightly more ambitious and try to give to a better rhyme. We’ll extract up to two of the final syllables of the word and match them to all other words with the same two syllables. We are actually not getting syllables as such but the sonorant/coda combinations. (syllables usually have an initial consonant) This seems to get a better set of rhymes
So, for Phil Factor, we have the rhymes …
actor,benefactor,chiropractor,contractor,detractor,extractor,malefactor,protractor,subcontractor,tractor
… We could soon be rapping with this lot
So as to get a quick response, and keep the code manageable, we’ll create a special table for our rhyming dictionary.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
drop TABLE dbo.RhymingWords CREATE TABLE dbo.RhymingWords( String varchar(30) NOT NULL, PenultimateSyllable varchar(30) not null default '', WithoutFinalSyllable varchar(30) not null default '', FinalSyllable varchar(30) not null default '', CONSTRAINT [PK_RhymingWords] PRIMARY KEY CLUSTERED ( [String] ASC ) ON [PRIMARY] ) ON [PRIMARY] GO /*We start by inserting all the words from our dictionary. */ insert into dbo.RhymingWords (String) select String from allwords /* and now we extract the final syllable into its own column */ UPDATE RhymingWords --add in the final syllable SET WithoutFinalSyllable=LEFT(r.String, LEN(r.String)-endingSyllable), FinalSyllable=RIGHT(r.String, endingsyllable) FROM RhymingWords r INNER JOIN -- could be a monophthong, diphthong, or triphthong, (SELECT String, syllable+case when PATINDEX('[aeiou]%',REVERSE(LEFT(String, LEN(String)-syllable)))=0 then 0 else PATINDEX('%[aeiou][^aeiou]%', REVERSE(LEFT(String, LEN(String)-syllable))+'x') end AS Endingsyllable FROM (SELECT String, --pick out the last vowel-consonant transition PATINDEX('%[^aeiou][aeiou]%',--to get the coda REVERSE(String))+1 AS syllable FROM RhymingWords WHERE PATINDEX('%[^aeiou][aeiou]%', REVERSE(String))>0 ---where it is possible! ) f ) g ON g.String=r.String UPDATE RhymingWords --add in the penultimate syllable SET PenultimateSyllable=RIGHT(r.WithoutFinalSyllable, syllableIndex) FROM RhymingWords r INNER JOIN -- could be a monophthong, diphthong, or triphthong, (SELECT String, syllable+case when PATINDEX('[aeiou]%',REVERSE(LEFT(WithoutFinalSyllable, LEN(WithoutFinalSyllable)-syllable)))=0 then 0 else PATINDEX('%[aeiou][^aeiou]%', REVERSE(LEFT(WithoutFinalSyllable, LEN(WithoutFinalSyllable)-syllable))+'x')end AS syllableIndex FROM (SELECT String, WithoutFinalSyllable,--pick out the last vowel-consonant transition PATINDEX('%[^aeiou][aeiou]%',--to get the coda REVERSE(WithoutFinalSyllable))+1 AS syllable FROM RhymingWords WHERE PATINDEX('%[^aeiou][aeiou]%', REVERSE(WithoutFinalSyllable))>0 ---where it is possible! ) f ) g ON g.String=r.String --for short words ending in vowels UPDATE RhymingWords SET FinalSyllable= right(String,patindex('%[^aeiouy]%',reverse(String)+'x')-1) from Rhymingwords where FinalSyllable ='' and reverse(String)like '%[aeiouy]%' --for words with only Ys UPDATE RhymingWords SET FinalSyllable= right(String,patindex('%[aeiouy]%',reverse(String)+'x')) from Rhymingwords where FinalSyllable ='' --no vowels UPDATE RhymingWords SET FinalSyllable= String where FinalSyllable ='' |
Now, we are in the position to create a function that takes any word and gives you the rhymes to it. You need to beware, because I have not yet programmed in automatic translation of homophones. English spelling is so inconsistent that rhyming dictionaries will never tell you all the rhymes. You have to use some artistry to get the best out of a Rhyming dictionary. ‘Rhyme’, rhymes with ‘chime’, even though the word endings are spelt differently. You need to search all the alternative spellings of the final syllable to get all the rhymes.
No subtlety could, for a moment, attract her,
Until she succumbed to the charms of Phil Factor
You see? No easy programming would have given you that rhyme.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 |
IF EXISTS ( SELECT 'found' FROM sys.objects WHERE name='RhymesWith' ) DROP FUNCTION RhymesWith go CREATE FUNCTION RhymesWith (@String VARCHAR(30)) /** summary: > returns a table with the best rhymes it can with whatever valid word you pass to it. Because some syllables are written differently but sound alike (Homophonic syllables) you may need to feed in words that you know rhyme but have their last syllable(s) spelt differently. e.g. eye and spy, example: - code: select * from dbo.RhymesWith('sand') - code: select * from dbo.RhymesWith('Phill') - code: select * from dbo.RhymesWith('technology') returns: a table of one varchar column listing all the Rhymes. There once was an IT contractor who went by the name of 'Phil Factor' He makes no apology for his technology for he's a consummate actor **/ RETURNS @alternatives TABLE ( number INT IDENTITY(1, 1) PRIMARY KEY, String VARCHAR(30) ) AS BEGIN DECLARE @Coda INT DECLARE @FinalSyllable VARCHAR(30) DECLARE @PenultimateSyllable VARCHAR(30) DECLARE @RestofWord VARCHAR(30) SELECT @PenultimateSyllable=PenultimateSyllable, @FinalSyllable=FinalSyllable FROM RhymingWords WHERE String=@String if @@Rowcount=0 BEGIN SELECT @Coda=PATINDEX('%[^aeiou][aeiou]%',--to get the coda REVERSE(@String)) SELECT @Coda=@Coda+PATINDEX('%[aeiou][^aeiou]%', REVERSE(LEFT(@String, LEN(@String)-@Coda))+'x') SELECT @RestOfWord=LEFT(@String, LEN(@String)-@Coda), @FinalSyllable=RIGHT(@String, @Coda) SELECT @Coda=PATINDEX('%[^aeiou][aeiou]%',--to get the coda REVERSE(@RestOfWord))+1 SELECT @Coda=CASE WHEN @Coda=1 THEN 0 ELSE @Coda END IF PATINDEX('[aeiou]%', REVERSE(LEFT(@RestOfWord, LEN(@RestOfWord)-@Coda)))>0 SELECT @Coda=@Coda+PATINDEX('%[aeiou][^aeiou]%', REVERSE(LEFT(@RestOfWord, LEN(@RestOfWord) -@Coda))+'x') SELECT @PenultimateSyllable=RIGHT(@RestOfWord, @Coda) IF @FinalSyllable='' AND REVERSE(@String) LIKE '%[aeiouy]%' SET @FinalSyllable=RIGHT(@String, PATINDEX('%[^aeiouy]%', REVERSE(@String)+'x')-1) --for words with only Ys IF @FinalSyllable='' SET @FinalSyllable=RIGHT(@String, PATINDEX('%[aeiouy]%', REVERSE(@String)+'x')) --no vowels IF @FinalSyllable='' SET @FinalSyllable=@String END INSERT INTO @alternatives (String) SELECT String FROM RhymingWords WHERE PenultimateSyllable+FinalSyllable=@PenultimateSyllable +@FinalSyllable AND String<>@String IF @@Rowcount<10 INSERT INTO @alternatives (String) SELECT String FROM RhymingWords WHERE FinalSyllable=@FinalSyllable AND String<>@String RETURN END |
Rap up
So here we are with a word-bank that allows you to cheat at Scrabble and rap, or make up doggerel. More to the point, it has illustrated, in the ‘permutation’ function, how to use a numbers table to create a table, and the ‘quirky update’ method of filling a table with permutation, or any other data you need. We’ve also illustrated some techniques of using the built-in, and rather primitive, character pattern-matching techniques of SQL Server.
A Parting Competition
To end up with: here is a simple competition, that I will award a Christmas prize of a $50 Amazon voucher for.
Given that Scrabble is scored to the following table:
- How fast can you score all common words according to their Scrabble scores, so as to list them in order.?
- Find all the words that can’t ever be used in Scrabble? (Blank tiles can mean any letter). Scrabble contains…
- 2 blank tiles (scoring 0 points) that can represent any letter
- 1 point: E ×12, A ×9, I ×9, O ×8, N ×6, R ×6, T ×6, L ×4, S ×4, U ×4
- 2 points: D ×4, G ×3
- 3 points: B ×2, C ×2, M ×2, P ×2
- 4 points: F ×2, H ×2, V ×2, W ×2, Y ×2
- 5 points: K ×1
- 8 points: J ×1, X ×1
- 10 points: Q ×1, Z ×1
Load comments