most words with fewest different letters

smallfarm
Posts: 4
Joined: Fri Nov 06, 2015 12:41 am UTC

most words with fewest different letters

Postby smallfarm » Fri Nov 06, 2015 12:52 am UTC

This forum seems to be comprised of reasonably intelligent members (until I was granted access). What I am searching for is the doppelgänger of several threads concerning lists of words based on various criteria for minimal communication.
I am seeking a progressive list of the minimal letters that can be used to generate the maximum number of words. My initial gut thought assumed that following the list of letter usage frequency would be an approximation of an acceptable answer, but letter use frequency lists are vowel heavy on the front end and few words are comprised of multiple vowels and one consonant. Additionally the frequency of letter use list is prejudiced by the frequent use of particular words rather than the number of possible words and the multiple use of a particular letter (or letters) in a word. My second thought was to arrange the letters according to their “weight” in Morse code. This yields a slightly different letter order than the letter frequency lists, but suffers the same core inaccuracies.
What I am looking for may be pictured this way…
Think of Mario Teaches Typing where the letters passing the screen can only form words—not random letters.
Assume the “game” starts with 2 or 3 letters and each may be used multiple times in a word. Which letters would generate the maximum number of “real” (generic spell checker) words?
The choice of each additional letter would be based on maximizing the growth of word list.
In the early stages there would be few words and minimal increase in the list with each new letter. At some point the vocabulary growth rate would be phenomenal so after a few letters (dozen or so?) the list growth optimization would be somewhat academic.
For example (by no means comprehensive or optimal, and perhaps on the fringe of “real” words):
T and O yield TO, TOO, and TOT.
Add a Q and there is no increase in “real” words.
Add a P and there is TO, TOO, TOT, TOP, POT, POO, OP, TOPO, and OPT.
With a fourth letter the list would obviously grow considerably.
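The rule in the example above (a word counts if every letter in it comes from the chosen set, with repeats allowed) could be sketched roughly as follows. This is only an illustration: the function names are made up, and the sample list is just the words from the example plus two distractors, standing in for a real spell-checker-style word list.

```python
def spellable(word, letters):
    """True if every letter of `word` comes from `letters` (repeats allowed)."""
    return set(word.lower()) <= set(letters.lower())

def words_from(letters, word_list):
    """All words in `word_list` that use only the given letters."""
    return [w for w in word_list if spellable(w, letters)]

# Words from the T/O/P example above, plus two distractors (assumed sample data).
SAMPLE_WORDS = ["to", "too", "tot", "top", "pot", "poo", "op", "topo", "opt",
                "quit", "stop"]

if __name__ == "__main__":
    for letters in ("to", "toq", "top"):
        found = words_from(letters, SAMPLE_WORDS)
        print(letters, len(found), found)
```

Adding Q changes nothing here because no word in the sample can be spelled with it, while adding P brings in six more words.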

Any ideas??
Thank you.

mathmannix
Posts: 1410
Joined: Fri Jul 06, 2012 2:12 pm UTC
Location: Washington, DC

Re: most words with fewest different letters

Postby mathmannix » Mon Nov 09, 2015 4:35 pm UTC

I interpret your statement to mean,
A group of one letter, which contains the most possible words [1], to which another letter can be added to produce a group of two letters which contains the maximum number of words for a group of two letters produced in such a manner [this turns out to be 3], to which another letter can be added to produce a group of three letters which contains the maximum number of words for a group of three letters produced in such a manner... [and so on]


In other words, if the group of four letters with the maximum arrangement happened to be "DUNE" (it's really not), it still wouldn't be part of your progressive list, because it couldn't be derived from some original group of one letter (as none of "D", "U", "N", or "E" are one-letter words.)

There are two or three commonly accepted one-letter words: A, I, and sometimes O. (I will include O.) We have to begin with one of these.

wordfind.com gives me the following reversible two-letter words:
containing A: AB, AH, AL, AM, AN, AT, AY
containing I: IS, IT
containing O: DO, HO, MO, NO, OS, OW, OY
(other: DE, EF, EH, EM, EN, ER, MU, NU)

Obviously, all of these containing A, I, or O contain the maximum possible number of words for two letters: three (two 2-letter, one 1-letter). (There are no reversible two-vowel words shown on wordfind.com, which could otherwise give you four words from their letters.) There are, however, also AI (which contains AI, A, and I) and OI (which contains OI, O, and I).

Presumably, you would want to start with a set of three letters that contain one (or more!) of these eighteen two-letter groups
(AB,AH,AI,AL,AM,AN,AT,AY,IO,IS,IT,DO,HO,MO,NO,OS,OW,OY)

From playing word games and checking a few common combinations (ERA, OPT, ART, etc.), I believe I have found the most commonly anagrammed set of three letters to be A-E-T. This gives five accepted three-letter words (ATE, EAT, ETA, TAE (which is Scottish), and TEA), as well as four two-letter words (AE, AT, ET, and TA) and the one single-letter word A. This set (A,T,E) thus seems to be the single most successful path so far for the progression. Of course, this is disregarding repetition of letters, but I think it still holds up.

I would guess adding 'S' next to the list would be best.

tl;dr: TOP gives 6 words without repetition of letters (OPT,POT,TOP,OP,TO,O) and 11 words with repetition (also OOT,POO,POP,TOO,TOT).
AET gives 10 words without repetition of letters (ATE,EAT,ETA,TAE,TEA,AE,AT,ET,TA,A) and 15 words with repetition (also ATT,TAT,TEE,TET,AA).
Also, TOP is not derivable from any combination of two letters with more than two words therein, whereas AET is (AT/TA/A).
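The two kinds of count in the summary above (letters used at most once versus letters reused freely) can be told apart with a small check. The sketch below is only an illustration with made-up function names; the word lists are the ones quoted in the summary, not a real dictionary.

```python
from collections import Counter

def uses_only(word, letters):
    """Repetition allowed: every letter of `word` is somewhere in `letters`."""
    return set(word) <= set(letters)

def uses_each_once(word, letters):
    """No repetition: `word` needs no more copies of a letter than `letters` has."""
    have = Counter(letters)
    return all(have[ch] >= n for ch, n in Counter(word).items())

# Word lists copied from the summary above.
TOP_WORDS = "opt pot top op to o oot poo pop too tot".split()
AET_WORDS = "ate eat eta tae tea ae at et ta a att tat tee tet aa".split()

if __name__ == "__main__":
    for letters, words in (("top", TOP_WORDS), ("aet", AET_WORDS)):
        once = [w for w in words if uses_each_once(w, letters)]
        any_count = [w for w in words if uses_only(w, letters)]
        print(letters, len(once), "without repetition,", len(any_count), "with")
```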
I hear velociraptor tastes like chicken.

smallfarm
Posts: 4
Joined: Fri Nov 06, 2015 12:41 am UTC

Re: most words with fewest different letters

Postby smallfarm » Wed Nov 11, 2015 3:13 am UTC

Thank you for your input, mathmmaix.

Your assumption is correct. Another example of my quest would be to teach someone to read "real" stories with the fewest different letters involved. The elementary educators I know feel it would be an unproductive way to teach (as learning the alphabet is firmly entrenched as the first step in learning to read) and are unaware of any research or curricular program set up this way.

Perhaps the most salient point is the difficulty presented in obtaining an easy and definitive answer by permitting multiple use of the same letter(s). I tried anagram generators, and as you mentioned they lack this option.

I deliberately left the famous dog TOTO out (as a proper noun), but missed TOOT in my example. Common abbreviations and proper nouns were something that I initially dismissed; I may have to rethink that, however.

Additionally, a fair percentage of the words in your lists (in my very modest and humble opinion) fail to pass the "real word, generic spell check" test. The results are intended to be common words with which most, with a middle or high school education, would use or at least be familiar with.

Again, thank you for your input, and I am willing to entertain other opinions, research, or ideas that may help or point me in the right direction.

smallfarm
Posts: 4
Joined: Fri Nov 06, 2015 12:41 am UTC

Re: most words with fewest different letters

Postby smallfarm » Wed Nov 11, 2015 3:19 am UTC

There are few social sins greater than "getting" someone's name wrong.
:oops:
MATHMANNIX, I beg your forgiveness.

Sandor
Posts: 177
Joined: Sat Feb 13, 2010 8:25 am UTC

Re: most words with fewest different letters

Postby Sandor » Mon Nov 16, 2015 3:35 pm UTC

Given a word list, this is not too difficult to program, at least for the first few letter lists. Here's what I found:

est - 15 words
Spoiler:
est see sees set sets settee settees setts stet tee tees test testes tests tsetse
aest - 50 words
Spoiler:
as ass asses assess assesses asset assets at ate attest attests ease eases east eat eats est estate estates eta sat sea seas seat seats see sees set sets settee settees setts state states stet taste tastes tat tea teas tease teases teat teats tee tees test testes tests tsetse
aerst - 164 words
Spoiler:
aerate aerates ararat are area areas arrases arrears arrest arrester arrests art arts as ass assert asserts asses assess assesses asset assets aster asters at ate attest attests ear ears ease eases east easter eat eater eaters eats era eras erase eraser erasers erases ere err errata errs erst est estate estates ester esters eta rare rarer rarest raster rasters rat rate rater rates rats re rear rearer rears reassert reasserts reassess resea reset resets rest restart restarts restate restates rests retest retests retreat retreats sat sea sear sears seas seat seats see seer seers sees serrate set sets settee settees setter setters setts star stare starer stares stars start starter starters starts state states steer steers stet strata street streets stress stresses tar tares tars tart tartar tartrate tarts taste taster tasters tastes tat tatters tea tear tears teas tease teaser teasers teases teat teats tee tees teeter terse terser test tester testers testes tests tetra treat treats tree trees tress tresses tsetse
aeprst - 345 words
Spoiler:
aerate aerates apart ape apes appear appears appease appeaser appeasers appeases apse apses apt aptest ararat are area areas arrases arrears arrest arrester arrests art arts as asp asps ass assert asserts asses assess assesses asset assets aster asters at ate attest attests ear ears ease eases east easter eat eater eaters eats era eras erase eraser erasers erases ere err errata errs erst est estate estates ester esters eta pa pap papa papas paper papers par parapet parapets pare pares parse parser parsers parses part parts pass passe passer passers passes past pasta pastas paste pastes pasts pat pate pater pates pats patter patters pea pear pears peartrees peas peat peep peeper peepers peeps peer peers pep pepper peppers peps per perpetrate perpetrates pert peseta pesetas pest pester pests pet peter peters pets prat pre prep prepare preparer preparers prepares preps preset presets press presses rap rape rapes raps rapt rare rarer rarest rasp rasper rasps raster rasters rat rate rater rates rats re reap reaper reapers reappear reappears reaps rear rearer rears reassert reasserts reassess rep repaper repartee repast repasts repeat repeater repeaters repeats repress represses reps resea reset resets rest restart restarts restate restates rests retest retests retreat retreats sap sapper sappers saps sat satrap satraps sea sear sears seas seat seats see seep seeps seer seers sees separate separates septet septets serrate set sets settee settees setter setters setts spa spar spare spares spars sparse sparser sparsest sparta spas spat spate spats spatter spatters spear spears sprat sprats spree stapes star stare starer stares stars start starter starters starts state states steep steeper steepest steeps steer steers step steppe steppes steps stet strap strapper straps strata street streets stress stresses tap tapas tape taper taperer tapers tapes tappers taps tar tares tars tart tartar tartrate tarts taste taster tasters tastes tat tatters tea tear tears teas tease teaser 
teasers teases teat teats tee teepee teepees tees teeter tepee terse terser test tester testers testes tests tetra trap trapper trappers traps treat treats tree trees trespass trespasser trespassers trespasses tress tresses tsetse
aelprst - 622 words
Spoiler:
aerate aerates ala alas ale alert alerts ales all allele alleles alp alps alt altar altars alter alters alts apart ape apes appal appals apparel appeal appeals appear appears appease appeaser appeasers appeases appellate apple apples applet apse apses apt aptest ararat are area areal areas arrases arrears arrest arrester arrests art artless arts as asleep asp asps ass assert asserts asses assess assesses asset assets aster asters astral at ate atlas atlases attest attests ear earl earls ears ease easel easels eases east easter eat eater eaters eats eel eels elal elapse elapses elate elates ell ells els else era eras erase eraser erasers erases ere err errata errs erst est estate estates ester esters eta etal lap lapel lapels lapp laps lapse lapses las lase laser lasers lass lasses last lasts late later lateral laterals latest latter lea leap leaper leaps leapt lease leases least leat lee leer leers lees leper lepers less lessee lessees lesser lest let lets letter letterpress letters lls pa paella pal palatal palate palates pale paler pales palest palette palettes pall pallet pallets palls palp palpate palpates pals pap papa papal papas paper paperless papers par parallel parallels parapet parapets pare pares parse parser parsers parses part parts pass passe passer passers passes past pasta pastas paste pastel pastels pastes pasts pat pate patella pater pates pats patter patters pea peal peals pear pearl pearls pears peartrees peas peat peel peeler peelers peels peep peeper peepers peeps peer peerless peers pele pellet pellets pelt pelts pep pepper peppers peps per perpetrate perpetrates pert peseta pesetas pest pester pestle pests pet petal petals peter peters petrel petrels pets plaster plasterer plasterers plasters plate platelet platelets plates platter platters plea pleas please pleases pleat pleats prat prattle prattler pre prelate prelates prep prepare preparer preparers prepares preps preset presets press presses psalter psalters rap rape rapes raps rapt 
rare rarer rarest rasp rasper rasps raster rasters rat rate rater rates rats rattle rattler rattles re real reals reap reaper reapers reappear reappears reaps rear rearer rears reassert reasserts reassess reel reels relapse relapses relate relates release releases rep repaper repartee repast repasts repeal repeals repeat repeater repeaters repeats repel repels replete repress represses reps resale resea resell reseller resellers reset resets resettle rest restart restarts restate restates restless rests retell retest retests retral retreat retreats sale sales salsa salt saltpetre salts sap sapper sappers saps sat satrap satraps sea seal sealer sealers seals sear sears seas seat seats seattle see seep seeps seer seers sees sell seller sellers sells separate separates septet septets serrate set sets settee settees setter setters settle settler settlers settles setts slap slapper slaps slat slate slater slaters slates slats sleep sleeper sleepers sleepless sleeps sleet sleets slept spa spar spare spares spars sparse sparser sparsest sparta spas spat spate spats spatter spatters spear spears spell speller spellers spells spelt splat splatter sprat sprats spree stale stall stalls stapes staple stapler staplers staples star stare starer stares starless starlet starlets stars start starter starters startle startles starts state stateless states steal stealer stealers steals steel steels steep steeper steepest steeple steeples steeps steer steers stellar step steppe steppes steps stet strap strapless strapper straps strata street streets stress stresses taal tale tales tall taller tallest tap tapas tape taper taperer tapers tapes tappers taps tar tares tars tarsal tart tartar tartrate tarts tassel tassels taste tasteless taster tasters tastes tat tatters tattle tea teal tear tearless tears teas tease teaser teasers teases teat teats tee teepee teepees tees teeter telesales tell teller tellers tells telltale tepee terse terser tesseral test tester testers testes tests tetra 
trap trapper trappers traps treat treats tree treeless trees trespass trespasser trespassers trespasses tress tresses trestle trestles tsetse
I was surprised how quickly "p" made it onto the list.

smallfarm wrote:Additionally, a fair percentage of the words in your lists (in my very modest and humble opinion) fail to pass the "real word, generic spell check" test. The results are intended to be common words with which most, with a middle or high school education, would use or at least be familiar with.

There's an awful lot of word lists floating around the internet, but I found most of them are polluted with non-words or fail to pass your "real word" test. Those that do are often missing words I think should be included. I used the list from here. If you provide what you think is a good list, maybe someone (perhaps me:) would at least take it up to 9 or 10 letters.
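For anyone who wants to try this with their own word list, one plausible way to grow a group a letter at a time is a greedy search: at each step, add whichever letter raises the word count the most. This is a guess at the general idea, not the actual program behind the lists above, and the names `mask`, `count_words` and `greedy_grow` are invented for the sketch.

```python
import string

def mask(letters):
    """Bitmask of the distinct letters in a lowercase a-z string (bit 0 = 'a')."""
    m = 0
    for ch in letters:
        m |= 1 << (ord(ch) - ord("a"))
    return m

def count_words(letter_mask, word_masks):
    """Number of words whose letters all lie inside `letter_mask`."""
    return sum(1 for wm in word_masks if wm & ~letter_mask == 0)

def greedy_grow(start, word_list, steps):
    """Add one letter per step, always the one that yields the most words."""
    # Keep only plain a-z words; anything else would break the bitmask.
    clean = [w.lower() for w in word_list if w.isascii() and w.isalpha()]
    word_masks = [mask(w) for w in clean]
    current = "".join(sorted(start))
    history = [(current, count_words(mask(current), word_masks))]
    for _ in range(steps):
        candidates = [c for c in string.ascii_lowercase if c not in current]
        if not candidates:
            break
        # Ties go to the alphabetically first letter.
        best = max(candidates,
                   key=lambda c: count_words(mask(current + c), word_masks))
        current = "".join(sorted(current + best))
        history.append((current, count_words(mask(current), word_masks)))
    return history

if __name__ == "__main__":
    sample = ["set", "see", "tee", "test", "sea", "seat", "eat", "tea", "at", "rat"]
    for letters, total in greedy_grow("est", sample, 2):
        print(letters, total)
```

A greedy search only ever extends the current group, so it can miss a better group of the same size that does not contain the starting letters.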

Bloopy
Posts: 211
Joined: Wed May 04, 2011 9:16 am UTC
Location: New Zealand

Re: most words with fewest different letters

Postby Bloopy » Wed Nov 18, 2015 12:24 am UTC

smallfarm wrote:Perhaps the most salient point is the difficulty presented in obtaining an easy and definitive answer by permitting multiple use of the same letter(s). I tried anagram generators, and as you mentioned they lack this option.


More Words allows you to exclude a list of letters. So if you search for *^abcdfghijklmnopqruvwxyz for example, you get a list of 28 words that use only e,s,t

smallfarm
Posts: 4
Joined: Fri Nov 06, 2015 12:41 am UTC

Re: most words with fewest different letters

Postby smallfarm » Fri Nov 20, 2015 5:07 am UTC

Bloopy
Thank you for your input. Regrettably, seeking words through a dictionary oriented toward word puzzles yields results that invariably fail my “generic spell checker/student familiarity” criteria.

Sandor
You are “spot on” for what I am seeking. The word list you are using is very good. I tested it by copying a few portions of it to clipboard and dropping it into MSWord. My version of Word’s spell checker found a few items it didn’t like, but not many. I reviewed other sections and found a few “clinkers,” but not in sufficient number to cause alarm. Perhaps these “clinkers” are more a reflection of my personal vocabulary limitations than the obscurity of the words.
I was not at all surprised by the high placement of the letter “P.” In my “shoot from the hip” example “P” was the third letter. I consider it an “under the radar” letter in the sense that our opinion of its use frequency is prejudiced by its short section in dictionaries and encyclopedias. Further, overall frequency of letter use has little relation to the number of (often seldom used) words using the letter.
I was, however, surprised by the somewhat linear early growth of the list. I had assumed it would be far sharper after the first 3 letters, sort of a short flat end on a bell curve.

How profuse must my display of gratitude be to convince you to carry this to “9 or 10” letters?
Consider the gratitude displayed!

Sandor
Posts: 177
Joined: Sat Feb 13, 2010 8:25 am UTC

Re: most words with fewest different letters

Postby Sandor » Mon Nov 23, 2015 9:19 pm UTC

smallfarm wrote:How profuse must my display of gratitude be to convince you to carry this to “9 or 10” letters?

I think this can be done for all 26 letters. I'll have a go. Watch this space...

There are also word lists available that include a frequency count, based on a given corpus. For example, what's the minimum number of letters required to reproduce half of the words in the complete works of Shakespeare?
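As a rough illustration of the frequency-weighted version of the question, the sketch below measures what fraction of the word occurrences in a text can be spelled from a letter group, then looks for the smallest group that reaches a target fraction. It is an assumption-laden toy: the function names are invented, the sample text is one line rather than a full corpus, and the exhaustive search is only practical for small alphabets (a full 26-letter run would need something faster, such as bitmasks or a greedy search).

```python
import re
from collections import Counter
from itertools import combinations

def token_counts(text):
    """Count each lowercase a-z word in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def coverage(letters, counts):
    """Fraction of word occurrences spellable using only `letters`."""
    total = sum(counts.values())
    covered = sum(n for word, n in counts.items() if set(word) <= set(letters))
    return covered / total if total else 0.0

def smallest_group(target, counts):
    """Smallest letter group whose coverage reaches `target` (brute force)."""
    alphabet = sorted(set("".join(counts)))
    for k in range(1, len(alphabet) + 1):
        best = max(combinations(alphabet, k), key=lambda g: coverage(g, counts))
        if coverage(best, counts) >= target:
            return "".join(best), coverage(best, counts)
    return "".join(alphabet), coverage(alphabet, counts)

if __name__ == "__main__":
    counts = token_counts("To be, or not to be, that is the question")
    letters, reached = smallest_group(0.5, counts)
    print(len(letters), letters, reached)
```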

Sandor
Posts: 177
Joined: Sat Feb 13, 2010 8:25 am UTC

Re: most words with fewest different letters

Postby Sandor » Tue Nov 24, 2015 7:16 pm UTC

Sorry if a double post is against the rules, but this is more a follow-on than an edit. I took some short cuts when calculating these, so there is a (very small) chance some of this is incorrect.

Based on the word list I linked to earlier, I found these groups of letters can form the most words:

Code: Select all

ah 5
els 16
aest 51
aerst 165
aeprst 346
aelprst 623
adeinrst 1219
adeginrst 2114
acdeinorst 3430
acdeilnorst 5500
acdeilnoprst 8485
acdeilnoprstu 11852
acdeilmnoprstu 16420
acdegilmnoprstu 22062
acdeghilmnoprstu 27081
abcdeghilmnoprstu 32206
abcdeghilmnoprstuy 37685
abcdefghilmnoprstuy 42878
abcdefghilmnoprstuvy 47410
abcdefghilmnoprstuvwy 50858
abcdefghiklmnoprstuvwy 54505
abcdefghiklmnoprstuvwxy 55873
abcdefghiklmnopqrstuvwxy 56728
abcdefghijklmnopqrstuvwxy 57525
abcdefghijklmnopqrstuvwxyz 58112

You'll notice not all of them can be joined together. For example, the best group of 7 letters (i.e. the group of 7 letters that can form the most words) is "aelprst", but you can't add a letter to that to get to the best group of 8 letters, "adeinrst".

This happens 4 times (between 2 and 3, 3 and 4, 7 and 8, and 9 and 10 letters), producing 5 different sequences. Each sequence starts with a group that a) is the best group of its size and b) does not appear in any other sequence.
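A brute-force way to find the best group of a given size, and to check whether one best group can be grown into the next, might look like the sketch below. It is not the program used for the table (which took unspecified short cuts); the helper names are hypothetical, and trying every subset is only feasible for small sizes or small alphabets.

```python
from collections import Counter
from itertools import combinations

def mask(letters):
    """Bitmask of the distinct letters in a lowercase a-z string (bit 0 = 'a')."""
    m = 0
    for ch in letters:
        m |= 1 << (ord(ch) - ord("a"))
    return m

def best_group(word_list, k, alphabet):
    """Try every k-letter subset of `alphabet`; return (letters, word_count)."""
    # Words with the same set of letters share a mask, so count each mask once.
    mask_counts = Counter(mask(w) for w in word_list)
    best_letters, best_total = "", -1
    for combo in combinations(sorted(alphabet), k):
        group = mask(combo)
        total = sum(n for wm, n in mask_counts.items() if wm & ~group == 0)
        if total > best_total:
            best_letters, best_total = "".join(combo), total
    return best_letters, best_total

def can_extend(smaller, larger):
    """True if `larger` is `smaller` plus exactly one new letter."""
    return set(smaller) < set(larger) and len(set(larger)) == len(set(smaller)) + 1

# Best groups for 2 to 12 letters, copied from the table above.
BEST = ["ah", "els", "aest", "aerst", "aeprst", "aelprst", "adeinrst",
        "adeginrst", "acdeinorst", "acdeilnorst", "acdeilnoprst"]

if __name__ == "__main__":
    for small, large in zip(BEST, BEST[1:]):
        if not can_extend(small, large):
            print(f"new sequence starts at {large!r} (cannot be grown from {small!r})")
```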

The sequences are:

Code: Select all

Sequence 1
==========
ah 5 [Best 2 letters]
ahs 13 (+s)
ahms 28 (+m)
aehms 64 (+e)
aehmst 172 (+t)
aehmrst 443 (+r)
aehimrst 759 (+i)
aehimnrst 1387 (+n)
adehimnrst 2480 (+d)
adehimnorst 3950 (+o)
acdehimnorst 6648 (+c)
acdehilmnorst 10054 (+l)
acdehilmnoprst 14761 (+p)
acdehilmnoprstu 20107 (+u)
acdeghilmnoprstu 27081 (+g) [Best 16 letters] [Sequence 2 merges here]
abcdeghilmnoprstu 32206 (+b) [Best 17 letters]
abcdeghilmnoprstuy 37685 (+y) [Best 18 letters]
abcdefghilmnoprstuy 42878 (+f) [Best 19 letters]
abcdefghilmnoprstuvy 47410 (+v) [Best 20 letters]
abcdefghilmnoprstuvwy 50858 (+w) [Best 21 letters]
abcdefghiklmnoprstuvwy 54505 (+k) [Best 22 letters]
abcdefghiklmnoprstuvwxy 55873 (+x) [Best 23 letters]
abcdefghiklmnopqrstuvwxy 56728 (+q) [Best 24 letters]
abcdefghijklmnopqrstuvwxy 57525 (+j) [Best 25 letters]
abcdefghijklmnopqrstuvwxyz 58112 (+z) [Best 26 letters]

Sequence 2
==========
els 16 [Best 3 letters]
aels 48 (+a)
aelst 136 (+t)
aelrst 316 (+r)
aelprst 623 (+p) [Best 7 letters] [Sequence 3 merges here]
aeilprst 1124 (+i)
aeilnprst 1928 (+n)
aeilnoprst 3238 (+o)
aceilnoprst 5337 (+c)
acdeilnoprst 8485 (+d) [Best 12 letters] [Sequence 5 merges here]
acdeilnoprstu 11852 (+u) [Best 13 letters]
acdeilmnoprstu 16420 (+m) [Best 14 letters]
acdegilmnoprstu 22062 (+g) [Best 15 letters] [Sequence 4 merges here]
acdeghilmnoprstu 27081 (+h) [Best 16 letters] [Merges with sequence 1]

Sequence 3
==========
aest 51 [Best 4 letters]
aerst 165 (+r) [Best 5 letters]
aeprst 346 (+p) [Best 6 letters]
aelprst 623 (+l) [Best 7 letters] [Merges with sequence 2]

Sequence 4
==========
adeinrst 1219 [Best 8 letters]
adeginrst 2114 (+g) [Best 9 letters]
adegilnrst 3400 (+l)
adegilnorst 5178 (+o)
acdegilnorst 8094 (+c)
acdegilnoprst 11768 (+p)
acdegilnoprstu 16286 (+u)
acdegilmnoprstu 22062 (+m) [Best 15 letters] [Merges with sequence 2]

Sequence 5
==========
acdeinorst 3430 [Best 10 letters]
acdeilnorst 5500 (+l) [Best 11 letters]
acdeilnoprst 8485 (+p) [Best 12 letters] [Merges with sequence 2]

smallfarm wrote:I was, however, surprised by the somewhat linear early growth of the list. I had assumed it would be far sharper after the first 3 letters, sort of a short flat end on a bell curve.

Here are the totals in graph form. It's a classic "S" shape, so the differential would be a bell curve, but it only picks up after about 11 letters.

[Image: pwl3.png]
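Without the graph, the same shape can be seen by taking the difference between consecutive totals. The sketch below does only that; the numbers are the best-group totals for 2 to 26 letters from the table earlier in this post, and the function names are made up for illustration.

```python
# Word totals for the best group of each size, 2 to 26 letters,
# copied from the table earlier in this post.
TOTALS = [5, 16, 51, 165, 346, 623, 1219, 2114, 3430, 5500, 8485, 11852,
          16420, 22062, 27081, 32206, 37685, 42878, 47410, 50858, 54505,
          55873, 56728, 57525, 58112]
FIRST_SIZE = 2

def increments(totals):
    """Words gained at each step from one group size to the next."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]

def peak(totals, first_size=FIRST_SIZE):
    """(group size, gain) for the step that adds the most words."""
    gains = increments(totals)
    best = max(range(len(gains)), key=gains.__getitem__)
    # gains[0] is the step that arrives at size first_size + 1.
    return first_size + 1 + best, gains[best]

if __name__ == "__main__":
    for size, gain in enumerate(increments(TOTALS), start=FIRST_SIZE + 1):
        print(f"{size:2d} {gain:5d} " + "#" * (gain // 200))
    print("largest gain:", peak(TOTALS))
```

The rows of `#` give a crude text version of the bell-shaped curve of gains.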

You may spot some of the numbers are one or two off from those I posted earlier. I found the word list didn't include single letter words, so I added "a" and "I".

Now to see how many letters you need to make half the words in the works of Shakespeare...

KarenRei
Posts: 273
Joined: Sat Jun 16, 2012 10:48 pm UTC

Re: most words with fewest different letters

Postby KarenRei » Mon Jan 25, 2016 10:55 am UTC

Why do people use online services to solve problems like this? Don't you all have egrep/grep -E and /usr/share/dict/words?

Hmm, speaking of that, does anyone know where to get good wordlists like /usr/share/dict/words but broken down differently, say by parts of speech, or ordered by frequency of usage? I could probably use Teh Google but I'm too lazy ;)

PM 2Ring
Posts: 3638
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Mid north coast, NSW, Australia

Re: most words with fewest different letters

Postby PM 2Ring » Tue Jan 26, 2016 6:42 am UTC

There are links to various free word lists at The National Puzzlers League, but I don't know if any of them have part of speech data.

gmalivuk
GNU Terry Pratchett
Posts: 26412
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There

Re: most words with fewest different letters

Postby gmalivuk » Wed Jan 27, 2016 5:47 pm UTC

KarenRei wrote:Why do people use online services to solve problems like this? Don't you all have egrep/grep -E and /usr/share/dict/words?
If everyone had those and knew how to use them, they presumably wouldn't come to forums with questions like this, so the fact that there's a thread about it makes the answer to your question kind of obvious, I should think.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

KarenRei
Posts: 273
Joined: Sat Jun 16, 2012 10:48 pm UTC

Re: most words with fewest different letters

Postby KarenRei » Thu Jan 28, 2016 10:14 am UTC

gmalivuk wrote:
KarenRei wrote:Why do people use online services to solve problems like this? Don't you all have egrep/grep -E and /usr/share/dict/words?
If everyone had those and knew how to use them, they presumably wouldn't come to forums with questions like this, so the fact that there's a thread about it makes the answer to your question kind of obvious, I should think.


Do not the vast majority of people here use Linux or similar? If so then you almost certainly have them.

gmalivuk
GNU Terry Pratchett
Posts: 26412
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There

Re: most words with fewest different letters

Postby gmalivuk » Thu Jan 28, 2016 12:42 pm UTC

I strongly doubt it's any kind of majority, let alone a vast one, and having something doesn't mean knowing how to use it.

Sandor
Posts: 177
Joined: Sat Feb 13, 2010 8:25 am UTC

Re: most words with fewest different letters

Postby Sandor » Fri Jan 29, 2016 12:24 pm UTC

KarenRei wrote:Why do people use online services to solve problems like this? Don't you all have egrep/grep -E and /usr/share/dict/words?

I'm curious, how would you use egrep and a dictionary to solve this? I just can't see it.

KarenRei
Posts: 273
Joined: Sat Jun 16, 2012 10:48 pm UTC

Re: most words with fewest different letters

Postby KarenRei » Fri Jan 29, 2016 4:57 pm UTC

Sandor wrote:
KarenRei wrote:Why do people use online services to solve problems like this? Don't you all have egrep/grep -E and /usr/share/dict/words?

I'm curious, how would you use egrep and a dictionary to solve this? I just can't see it.


Well, there's not a "single" problem presented above. But since the main point is trying to maximize the number of words a child could read while minimizing a set of letters needed, then for example to try "est":

egrep -c "^[est]*$" /usr/share/dict/words # Optionally include -i to ignore case

You can of course run it in a for loop using whatever rules you want for growing it to try different letters... it takes a fraction of a second per run, so it's not like it'll take months to complete.
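For those without a shell handy, roughly the same count can be done in Python. The sketch below mirrors the egrep pattern with the `re` module and then tries each possible extra letter in turn; the function names are invented, the base letters are assumed to be non-empty, and the fallback word list is only there so the example runs on machines without /usr/share/dict/words.

```python
import re
import string

def count_words(letters, lines, ignore_case=False):
    """Rough equivalent of: egrep -c "^[letters]*$" (add -i for ignore_case)."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(f"[{re.escape(letters)}]*", flags)
    return sum(1 for line in lines if pattern.fullmatch(line.rstrip("\n")))

def try_each_extra_letter(base, lines):
    """Word count for `base` plus each letter not already in it."""
    lines = list(lines)
    return {c: count_words(base + c, lines)
            for c in string.ascii_lowercase if c not in base}

if __name__ == "__main__":
    try:
        with open("/usr/share/dict/words", encoding="utf-8", errors="replace") as fh:
            words = fh.readlines()
    except OSError:
        # Assumed fallback sample for systems without the dictionary file.
        words = ["set\n", "test\n", "seat\n", "step\n"]
    counts = try_each_extra_letter("est", words)
    best = max(counts, key=counts.get)
    print("best letter to add to 'est':", best, counts[best])
```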

Part of the problem you'll get with any approach is that not all words are equally important in a child learning to read, some are really rare. /usr/share/dict/words has particularly rare words included. Do you plan to write a book for children about a tsetse who lives ese of Tete who stets testes during Tet? Not super useful ;) You need to take word frequency into account, and grammatic necessity.

If I was working on this project, I'd get a word list with usage frequency and grammatical parts of speech. I'd then weight the result by something like:

Sum over parts of speech: (weighting for the part of speech) * (sum of word frequencies of this part of speech) ^ (some fractional exponent for the part of speech)

I recommend a relatively low exponent... you want to make sure that you get at least some of each category (some articles, some conjunctions, etc), rather than a ton of whatever category is easiest. Beyond that I'd weight nouns highly in the beginning, then boost verbs, then adjectives, then adverbs.
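The weighting described above might be sketched like this. Everything concrete in it is an assumption for illustration: the `score` function name, the tiny lexicon with its frequencies, and the particular weights and exponents are all invented, and a real run would need a proper frequency list tagged with parts of speech.

```python
def score(letters, lexicon, weights, exponents):
    """Sum over parts of speech of weight * (total frequency) ** exponent,
    counting only words spellable from `letters`."""
    totals = {}
    for word, pos, freq in lexicon:
        if set(word) <= set(letters):
            totals[pos] = totals.get(pos, 0.0) + freq
    return sum(weights[pos] * total ** exponents[pos]
               for pos, total in totals.items())

# Hypothetical (word, part of speech, frequency) entries.
LEXICON = [
    ("a", "article", 100.0),
    ("at", "preposition", 16.0),
    ("tea", "noun", 9.0),
    ("eat", "verb", 4.0),
    ("seat", "noun", 16.0),
]
# Hypothetical weights (nouns boosted) and a low exponent for every category.
WEIGHTS = {"article": 1.0, "preposition": 1.0, "noun": 2.0, "verb": 1.0}
EXPONENTS = {"article": 0.5, "preposition": 0.5, "noun": 0.5, "verb": 0.5}

if __name__ == "__main__":
    for letters in ("aet", "aest"):
        print(letters, score(letters, LEXICON, WEIGHTS, EXPONENTS))
```

Because of the fractional exponent, a category's contribution grows more slowly than its total frequency, so the first few words in a category count for more than piling additional frequency onto one that is already well covered.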

KarenRei
Posts: 273
Joined: Sat Jun 16, 2012 10:48 pm UTC

Re: most words with fewest different letters

Postby KarenRei » Fri Jan 29, 2016 5:05 pm UTC

There's also another problem that raises its head in that a child can't just read something simply because they know the letter. There's all sorts of rules and exceptions. If you want to work around this, you'll also have to add to your project a pronunciation list (IPA, for example), ideally including syllable breaks. Then you'd need to pair the mapping up between the pronunciation and the spelling. For example:

chicken : ˈʧɪkən
chi*cken : ˈʧɪ*kən
(ch,ʧ) (i,ɪ) (ck, k) (e, ə) (n, n)

That list of pairs would then be treated as the spelling of the word, and you could apply the above algorithm. It'd be trickier (you'd have to use a variety of rules to try to figure out how the spelling maps to the IPA), but still eminently doable.
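To make the idea concrete, here is a small sketch in which each word is stored as a list of (spelling, sound) pairs and a word counts as readable only if every pair is already known. The "chicken" entry is the one from the example above; the other two entries and all the names are assumptions added for illustration, and building such a lexicon automatically is the hard part this sketch skips.

```python
# Each word is a list of (spelling, sound) pairs.
LEXICON = {
    "chicken": [("ch", "ʧ"), ("i", "ɪ"), ("ck", "k"), ("e", "ə"), ("n", "n")],
    "chin": [("ch", "ʧ"), ("i", "ɪ"), ("n", "n")],   # illustrative entry
    "kin": [("k", "k"), ("i", "ɪ"), ("n", "n")],     # illustrative entry
}

def readable(pairs, known):
    """True if every (spelling, sound) pair of the word has been taught."""
    return set(pairs) <= set(known)

def readable_words(known, lexicon):
    """Words whose pairs are all in `known`, in alphabetical order."""
    return sorted(w for w, pairs in lexicon.items() if readable(pairs, known))

if __name__ == "__main__":
    known = {("ch", "ʧ"), ("i", "ɪ"), ("n", "n")}
    print(readable_words(known, LEXICON))
    known |= {("ck", "k"), ("e", "ə")}
    print(readable_words(known, LEXICON))
```

Note that knowing (ck, k) does not make "kin" readable, since (k, k) is a different pair; the pairs take the place of letters in the earlier counting.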

