The SQL of Scrabble and Rapping

In which Phil decides to use a table consisting of all the common words in English to explore ways of cheating at Scrabble and writing doggerel using SQL Server. He then issues a SQL challenge.

Preamble

Sometimes, when you tackle a different problem in SQL, one can hit on techniques that come in handy in all sorts of other contexts.  This is the theme of this article, which uses a bank of all common words of the English language. 

Cheating at Scrabble

 I was playing Scrabble the other day and faced, as usual with an impossible hand. Scrabble is, as you probably know, a game where you pick tiles from a sack, and each tile has a letter written on it. You are allowed up to seven tiles at a time, and you have to place these, made up into words, on a game board marked with a 15-by-15 grid. The words are formed across and down in crossword fashion, and must be real words in common use. Each letter is scored individually, but the score is boosted by special squares on the board that give you double, or triple, word, or letter, scores.

It occurred to me, as it must have done to many others, that one could cheat at this game with a surreptitious iPhone or iPod and a simple word bank.

If you based this application on SQL Server, using a simple HTML interface, it would be easy to find all the words that could be made up from your seven tiles. Because you will need to link in with another word, that will come to eight letters. In some denser games, even more than eight-letter words are made as more existing words are crossed.

The first exercise will be to find all the permutation of the characters in your hand. Actually, if you are being subtle, you can restrict yourself to a subset, since only a small number of the combination of vowels and consonants are actually allowed in English. We won’t be subtle here: we’ll use the brute-force attack to the problem. The simplest way is to do a series of joins to a table of letters, but I’ll try a more flexible approach that uses a variation of the card-sharper’s shuffle instead. To do this, you will firstly need a number table. This number table is used frequently in SQL and you may already have one in your test database.  If you haven’t, this stored procedure will deal with the task of creating and populating the number table.

 

With these basics out of the way, we can now create the Table-Valued Function that returns the permutations in sequence of the characters you give it. This will work with any ASCII character and can be altered to deal with unicode, of course. All we are doing here is creating an empty table of the right size and then filling it with all the permutations of the characters you supply to the function. Normally, permutations will be done with a series of self-joins to a table with all the characters, one per row, but here we want something that will be useful when you do not know the length of the string in advance. (Permutations are great for doing many of the ‘graph’ problems, such as finding the shortest route, network routing, or  time-tabling)

Now we have this working, we’ll need all the common words of English. We first create the table of words and stock it from our word bank that I provide in the downloads at the bottom of the article.

You would have to alter the path to the word list before using this, of course.  I usually find that my seven tiles consist mostly of vowels so I couldn’t resist extracting lists of handy words with at least four consecutive vowels, just to make sure that everything was loaded properly

Though my favourite has five consecutive vowels

Good grief, if you are short of vowels, there is plenty you can do, as there are words with seven or more consonants in a row (if you include Y as a consonant) …

Now we have a function that will provide a table with all the permutation of between two and eight letters, we can use it to find those common words that can be built with your list of letters. Here we have an effective way of cheating at Scrabble. You’d have to use something like iWebKit to knock together a little application.

Rapping and doggerel.

Rhyming dictionaries aren’t new. they are simply dictionaries that are ordered by the word written backwards. It starts with Baa and ends in Fuzz. The most famous one is probably Walkers Rhyming Dictionary. Every poet has one. Whilst they are useful, they are a bit hit and miss to use. We’ll

 be slightly more ambitious and try to give to a better rhyme. We’ll extract up to two of the final syllables of the word and match them to all  other words with the same two syllables. We are actually not getting syllables as such but the sonorant/coda combinations. (syllables usually have an initial consonant) This seems to get a better set of rhymes

So, for Phil Factor, we have the rhymes …

actor,benefactor,chiropractor,contractor,detractor,extractor,malefactor,protractor,subcontractor,tractor

… We could soon be rapping with this lot

So as to get a quick response, and keep the code manageable, we’ll create a special table for our rhyming dictionary.

Now, we are in the position to create a function that takes any word and gives you the rhymes to it. You need to beware, because I have not yet programmed in automatic translation of homophones. English spelling is so inconsistent that rhyming dictionaries will never tell you all the rhymes. You have to use some artistry to get the best out of a Rhyming dictionary. ‘Rhyme’, rhymes with ‘chime’, even though the word endings are spelt differently. You need to search all the alternative spellings of the final syllable to get all the rhymes.

No subtlety could, for a moment, attract her,
Until she succumbed to the charms of Phil Factor

You see? No easy programming would have given you that rhyme.

Rap up

So here we are with a word-bank that allows you to cheat at Scrabble and rap, or make up doggerel. More to the point, it has illustrated, in the ‘permutation’ function, how to use a numbers table to create a table, and the ‘quirky update’ method of filling a table with permutation, or any other data you need. We’ve also illustrated some techniques of using the built-in, and rather primitive, character pattern-matching techniques of SQL Server.

A Parting Competition

To end up with: here is a simple competition, that I will award a Christmas prize of a $50 Amazon voucher for.

Given that Scrabble is scored to the following table:

  • How fast can you score all common words according to their Scrabble scores, so as to list them in order.?
  • Find all the words that can’t ever be used in Scrabble? (Blank tiles can mean any letter).   Scrabble contains…
    •     2 blank tiles (scoring 0 points) that can represent any letter
    •     1 point: E ×12, A ×9, I ×9, O ×8, N ×6, R ×6, T ×6, L ×4, S ×4, U ×4
    •     2 points: D ×4, G ×3
    •     3 points: B ×2, C ×2, M ×2, P ×2
    •     4 points: F ×2, H ×2, V ×2, W ×2, Y ×2
    •     5 points: K ×1
    •     8 points: J ×1, X ×1
    •     10 points: Q ×1, Z ×1