Monte and I were sessioning over a deck of cards when Kate came into the living room. She had her thinking expression on. “What’s the best hangman word?” she asked. Without missing a beat, Monte replied, “Denim.” It took both of us by surprise; denim?
Monte smiled wryly. “People like to guess Es and Ns and As and Is,” he said, “and they stop thinking clearly when _ENI_ appears.” I laughed, and even the Kate-with-thinking-cap-on chuckled. “I’m serious,” she said. “What is the best hangman word?” Monte shrugged. “Hangman and Scrabble have never been my strong suit.”
“I think this is not about knowing words, but knowing how people play,” Kate said. “And how do people play? I think they have some idea of which letters are frequent – the Es and Ns and As and Is – and mix that with often-repeated patterns, like -ing and -er and con-…”
“The best word wouldn’t be concerning, that’s for sure. Apart from that, it’s hard to know what patterns people use. But maybe we can find the words with the unlikely letters computationally,” I said. “Can you discover the word with the least common letters?” Kate asked innocently. “That sounds like something you, of all people, would know how to do.”
Damn. How did I get sucked into this again?
“Give me a sec.” I pulled out my laptop and fired up Python. “You wanted me to do that all along, didn’t you?” Kate made a face. Monte smirked and went back to passing the cards. He obsesses, and that makes him good. The three of us… we all obsess.
I downloaded the ENABLE word list and found a table of letter frequencies on Wikipedia. My code follows a simple strategy: it guesses letters in order of decreasing frequency until every letter of the word has been hit, counting the misses along the way. I had it repeat the strategy for each word in the word list and report back the number of misses. Kate waited anxiously.
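A minimal sketch of that frequency-follower strategy, for the curious. It assumes the usual English frequency order from the Wikipedia letter-frequency table (`etaoinshrdlcumwfgypbvkjxqz`) and a local copy of the ENABLE list saved as `enable1.txt`, one lowercase word per line; the file name and the helper names `misses` and `load_words` are hypothetical, not the original code.

```python
from pathlib import Path

# Letters in decreasing English frequency (assumed order, after the
# Wikipedia letter-frequency table).
ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def misses(word: str) -> int:
    """Count the wrong guesses a player makes by guessing letters strictly
    in ORDER until every distinct letter of `word` has been revealed."""
    needed = set(word.lower())
    missed = 0
    for letter in ORDER:
        if not needed:               # every letter revealed: game won
            break
        if letter in needed:
            needed.discard(letter)   # a hit reveals every copy of the letter
        else:
            missed += 1
    return missed


def load_words(path: str = "enable1.txt") -> list[str]:
    """Read a whitespace-separated word list (hypothetical file name)."""
    return Path(path).read_text().split()


if __name__ == "__main__":
    sample = ["zip", "zap", "zoom", "astonishes", "attention", "antiheroes"]
    for w in sorted(sample, key=misses, reverse=True):
        print(w, misses(w))
```

On this sample, zip, zap and zoom each cost 23 misses, while astonishes, attention and antiheroes cost none, since all of their letters sit among the first nine in the order.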
“This is not very informative, Kate. The ‘best’ words are ones like zip, zap, zoom… the single-minded frequency follower will always arrive at z last.” Kate was more enthusiastic about this than I was. “But are they all little words?” “Yeah, so?” “What’s on the other side? Like, what are the worst words?” “Hmm… astonishes, attention, antiheroes…” “Big words,” she said.
Kate paced the room. She does that when she thinks; it’s like Monte’s ring roll. I plotted a histogram of the misses for every word, and I was surprised at how symmetric it was. I had expected it to bunch up heavily at the low end; after all, the more frequent letters are used more. The average word would cost the “frequency player” 11 misses; bad odds, considering most hangman games allow only 6 or 7.
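The histogram and the average can be reproduced with something along these lines. This sketch recomputes the misses with the same assumed frequency order (after Wikipedia’s letter-frequency table) and prints a crude text histogram instead of a plot; the word list in it is a toy sample, so its numbers say nothing about the 11-miss average, which would need the full ENABLE list.

```python
from collections import Counter
from statistics import mean

# Letters in decreasing English frequency (assumed order).
ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def misses(word: str) -> int:
    """Wrong guesses made by guessing letters strictly in ORDER."""
    needed = set(word)
    missed = 0
    for letter in ORDER:
        if not needed:
            break
        if letter in needed:
            needed.discard(letter)
        else:
            missed += 1
    return missed


def miss_histogram(words: list[str]) -> Counter:
    """Map each miss count to the number of words costing that many misses."""
    return Counter(misses(w) for w in words)


def print_histogram(hist: Counter) -> None:
    """Crude text histogram: one row per miss count, one '#' per word."""
    for k in sorted(hist):
        print(f"{k:2d} {'#' * hist[k]}")


if __name__ == "__main__":
    words = ["zip", "zap", "denim", "gym", "cup", "ate", "attention"]  # toy sample
    print_histogram(miss_histogram(words))
    print("mean misses:", mean(misses(w) for w in words))
```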
“Listen. Instead of counting misses, let’s give them a score…” “Still according to letter frequency?” “Yeah.” I changed the code around. Now each word gets a score that depends on how many distinct letters it has and how frequently those letters appear in English. I took the letter frequency as a raw score and summed it over the word’s distinct letters (a repeated letter counts once, since one guess reveals every copy); the higher the score, the more common the letters a word uses. The word “ate” has a score of 29.925: the sum of a (8.17), t (9.06), and e (12.7). This lets us rank the words: “ate” is a worse word than “fuzz” (5.06 = 2.23 + 2.76 + 0.07, with z counted once). The scores themselves don’t have a physical meaning.
“Little words are still leading?” Kate was fixated on little words. “Yes – the fewer letters there are, the less there is to score,” I replied. She looked over the list. “But which of the little words are best?”
Monte glanced over at the screen. “That depends on what kind of cheap-shot you are. Are you low enough to use xu?” (I later found out it’s a monetary unit of Vietnam.) I think Kate is too much of a goody-two-shoes to use xu, or, for that matter, my. Both of them seemed to be deep in thought. I pulled up a histogram of the scores; it was eerily normally distributed.
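The scoring rule is short enough to sketch. The frequency values below are assumed, approximately as tabulated on Wikipedia’s letter-frequency page (they agree with the rounded figures quoted above), and `score` is a hypothetical name rather than the original code.

```python
# Approximate relative frequencies (%) of letters in English text, after the
# Wikipedia letter-frequency table (assumed values).
FREQ = {
    "e": 12.702, "t": 9.056, "a": 8.167, "o": 7.507, "i": 6.966,
    "n": 6.749, "s": 6.327, "h": 6.094, "r": 5.987, "d": 4.253,
    "l": 4.025, "c": 2.782, "u": 2.758, "m": 2.406, "w": 2.360,
    "f": 2.228, "g": 2.015, "y": 1.974, "p": 1.929, "b": 1.492,
    "v": 0.978, "k": 0.772, "j": 0.153, "x": 0.150, "q": 0.095,
    "z": 0.074,
}


def score(word: str) -> float:
    """Sum the frequencies of the word's distinct letters; a repeated letter
    counts once because a single guess reveals every copy of it."""
    return sum(FREQ[c] for c in set(word.lower()) if c in FREQ)


if __name__ == "__main__":
    words = ["ate", "fuzz", "gym", "cup", "denim", "xu"]
    for w in sorted(words, key=score):  # lowest score (rarest letters) first
        print(f"{w:6s} {score(w):7.3f}")
```

With these values “ate” comes out at 29.925 and “fuzz” at 5.06, and gym (about 6.40) edges out cup (about 7.47).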
I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.
– Francis Galton, Natural Inheritance (1889)
Monte spoke up, breaking the silence. “Remember denim? The same holds. Once you have decided how low you’re willing to sink in order to win, look for little words with the least frequent letters; and amongst those, look for the ones with the greatest ambiguity.
“Jon’s list says that gym is better than cup by a hair, but in fact cup is the better word: once _y_ is known, few alternatives exist, but _u_ leaves a wide-open realm…” The thought came to me: Monte talks entropy, and Kate talks enthalpy. How fitting.
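Monte’s ambiguity argument can be checked by counting how many words are still consistent with a revealed pattern. This is a minimal sketch under stated assumptions: blanks are written `_`, a blank cannot hide a letter that has already been guessed (revealed or wrong), and the tiny word list is a hypothetical stand-in for the real ENABLE list, so its counts only illustrate the idea.

```python
def matches(pattern: str, word: str, guessed: set[str]) -> bool:
    """True if `word` could be the hidden word behind `pattern`."""
    if len(word) != len(pattern):
        return False
    for p, c in zip(pattern, word):
        if p == "_":
            if c in guessed:  # a guessed letter would already be showing
                return False
        elif p != c:
            return False
    return True


def candidates(pattern: str, words: list[str], wrong: str = "") -> list[str]:
    """Words consistent with the pattern, given any wrong guesses so far."""
    guessed = set(pattern.replace("_", "")) | set(wrong)
    return [w for w in words if matches(pattern, w, guessed)]


# Hypothetical toy word list, standing in for the full ENABLE list.
SAMPLE_WORDS = ["gym", "cup", "cut", "hut", "nut", "bug", "sun", "pup", "fly"]

if __name__ == "__main__":
    for pattern in ("_y_", "_u_"):
        found = candidates(pattern, SAMPLE_WORDS)
        print(pattern, len(found), found)
```

On this toy list `_y_` leaves only gym, while `_u_` leaves seven candidates; passing wrong guesses through `wrong` (say, `wrong="t"`) narrows the field further.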
The evening ended with Monte passing cards to tango music, and Kate writing out a list with the aid of MoreWords. I think she likes buff, buzz, fuzz, and also gag, qat, and jazz. Of course, that works only if she is not playing against us…
The list of words with the lowest score: list
The list of words with the lowest score, vowels excluded: list
The list of words with the highest score: list
The list of words with the highest score, vowels excluded: list
The master list of words arranged by score: list (big download!)