Whitaker's WORDS

scrabulista

Consul
Staff member
Yeah, I'm back to parsing Whitaker's WORDS. It's quite useful for questions like how many words end in -on, and potentially useful for determining which declension/conjugation is the most frequent, though I still need to figure out the proper weights for the frequency codes.

The word list may be found at: http://users.erols.com/whitaker/dictpage.htm
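
A rough sketch of the "how many words end in -on" kind of query, assuming the headwords have already been pulled out of the word list into a plain Python list (the function name and the sample list are made up for illustration; the real file would need its own parsing step first):

```python
from collections import Counter

def count_by_ending(headwords, endings):
    """Count how many headwords end in each of the given endings."""
    counts = Counter()
    for word in headwords:
        for ending in endings:
            if word.lower().endswith(ending):
                counts[ending] += 1
    return counts

# Sample headwords taken from this thread; a real run would read them
# from the word list file instead.
sample = ["cacozelos", "acosmos", "agios", "halcyon", "Venus", "tellus"]
print(count_by_ending(sample, ["os", "on", "us"]))
```

A word is counted once per ending it matches, so overlapping endings such as -s and -os would both be credited.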

Bitmap pointed out a mistake in it some time ago.

A few things have caught my eye:

Greek-derived adjectives that end in -os, -os, and -on:
Examples:
cacozelos, -os, -on: "stylistically in bad taste."
acanthicos, -os, -on: "from pine-thistle."
acosmos, -os, -on: "unadorned, careless."
adiaphoros, -os, -on: "indifferent."

However, there are also words such as:
agios, -a, -on: "holy"
That's how it appears in the WORDLIST, but in the program it's extant in the masculine and neuter and nonexistent in the feminine.
arcticos,-e,-on: "initial."
I assume it would be -es in the feminine genitive singular. But in the program, it's extant only in the masculine.
arteriacos, -e, -on: "affecting the air passages/windpipe"
It's this way in the WORDLIST but in the program it's extant in the feminine and defective in the masculine.
auleticos, -e, -on: "used for making reed pipes/flutes." In the program, it's three separate words all meaning the same thing.

So is the -os a legitimate ending for the feminine nominative singular? Would it be better to think of it as an appositive noun? Is it a mistake?
 

Bitmap

Civis Illustris
Some adjectives have -os for both the masculine and the feminine singular and -on for the neuter. That's a question of Ancient Greek grammar, though.

I think that program is pretty random.
I wouldn't start using them just because they are in that word list. Most of them are hapax legomena, which makes the question of how they should be declined unnecessary.
 

scrabulista

Consul
Staff member
In the WORDLIST, they've got quite a few NUM (number) words that didn't make it to the program.

Things like:
centensim
centensum


I think what is meant is centensimus and centensumus (A number of other long words were wrongly cut short).
The program has centesimus as the preferred word for "one hundredth."
 

scrabulista

Consul
Staff member
I'm sure that's right now.

If you type centensimus or centensumus into the program you get "one hundredth."

A few more things:

dimetr: "of two measures or two/four metric feet." I think this is a typo for dimeter.
ausum, -, -, -: intend, be prepared; dare (to go/do), act boldly, risk; (SUB for audeo kludge);
Surely this should be "-, -, -, ausum," or even "audeo, audere, -, ausus (sum)?" (which Whitaker lists separately)
 

Bitmap

Civis Illustris
what's the point of this thread?
 

scrabulista

Consul
Staff member
Oh, sorry, I was trying to go over Whitaker's wordlist to see if there are mistakes in it or misunderstandings on my part.

As we have said about automated translators before, a lot of programmers think that all you have to do is write lookups to translate from one language to another. This is not the case. Regardless of the project, you need to know your data.
 

scrabulista

Consul
Staff member
There was some mention of an S-O-X rule on another thread.

A 3rd declension noun ending in -o is a good bet (89% among native* words) to be feminine. Those ending in -s or -x not so much (about 65% compared to 56% overall feminine).
A 3rd declension noun ending in -z is a 100% bet to be feminine but it's rare (The two words are dioryz, diorygis and dioryz, diorigis - there's also a dioryx, diorygis).

A 3rd declension noun ending in -r is a good bet (91%) to be masculine. -d is 100% but David, Davidis is rare.

A 3rd declension noun ending in -a, -c, or -m is a good bet (100%) to be neuter. -c and -m are rare though.
lac, lactis; alec, alecis; allec, allecis; hallec, hallecis - the Greek neuter is coec, coecis/coecos.
emblem, emblematis; didrachm, didrachmatis; holocautom, holocautomatis; glaucom, glaucomis; the Greek masculine is dem, demos/demis.

A 3rd declension noun ending in -e is a good bet (99%) to be neuter.

*If the genitive takes -is or -os then it's a Greek 3rd. Otherwise it's "native" - that's probably not correct linguistically.

I probably should have corrected for word frequency.

Code:
Ending   M    F   N   C  Gk-M  Gk-F   Gk-N  Gk-C
a        0    0 155   0     0     0     5     0
c        0    0   4   0     0     0     1     0
d        1    0   0   0     0     0     0     0
e        0    1 224   0     0     0     0     0
g        1    0   1   0     0     0     0     0
i        2    0   6   0     0     0     0     0
l       19    1  47   4     0     0     0     0
m        0    0   4   0     1     0     0     0
n       73   16 172   2    17     4     0     1
o      270 2320   0   9     1     0     0     0
r     1115    7  76  24     6     0     0     0
s      456 1194  96  92    19    85     5     3
t        0    2   6   0     0     0     0     0
x      114  266   3  29     2     5     0     1
y        0    0   0   0     0     0     2     0
z        0    2   0   0     0     0     0     0
Total 2051 3809 794 160    46    94    13     5
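
The percentages quoted above can be re-derived from the table. A minimal sketch, assuming the share is taken over the four native columns only (M, F, N, C), which is what seems to reproduce the quoted figures; the dictionary below copies just the rows needed:

```python
# Native (non-Greek) counts copied from the table: (M, F, N, C)
NATIVE = {
    "e": (0, 1, 224, 0),
    "o": (270, 2320, 0, 9),
    "r": (1115, 7, 76, 24),
    "s": (456, 1194, 96, 92),
    "x": (114, 266, 3, 29),
}
TOTAL = (2051, 3809, 794, 160)
GENDERS = ("M", "F", "N", "C")

def share(counts, gender):
    """Percentage of one gender among the four native columns."""
    return 100 * counts[GENDERS.index(gender)] / sum(counts)

for ending, gender in [("o", "F"), ("s", "F"), ("x", "F"), ("r", "M"), ("e", "N")]:
    print(f"-{ending}: {share(NATIVE[ending], gender):.1f}% {gender}")
print(f"overall: {share(TOTAL, 'F'):.1f}% F")
```

These are raw headword counts, so they say nothing about how often each word actually turns up in texts.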
 

cinefactus

Censor
Staff member
I don't think that I will bother learning the SOX rule then...
 

scrabulista

Consul
Staff member
Henle's exceptions to SOX:

But masculini generis
are words in -os, -nis, -guis, and -cis,
in -es (-itis) and -ex (-icis);
as neuter mark the -us (with -ris).

-os has 6 common, 5 feminine, 24 masculine, 4 neuter, 2 Greek masculine (62% vs. 30% overall masculine)
-nis has 9 common, 12 feminine, 18 masculine (46%)
-guis has 1 common, 1 feminine, 2 masculine (50%)
-cis has 1 common, 13 feminine, 8 masculine, 2 Greek feminine (36%)
-es (with -itis) has 17 (5) common, 81 (1) feminine, 94 (32) masculine, 11 (0) neuter, 2 (0) Greek common, 3 (0) Greek feminine, 1 (0) Greek masculine.
- 46% with the -es ending overall, 84% if you confine it to the -es, -itis ending
-ex (with -icis) has 11 (10) common, 21 (13) feminine, 53 (38) masculine, 3 (1) neuter, 1 (0) Greek feminine.
- 60% with the -ex ending overall, 61% with -ex, -icis.

-us (with -ris) has 4 (2) common, 18 (1) feminine, 12 (5) masculine, 66 (66) neuter, and 5 (0) Greek masculine.
- 66% vs. 12% overall neuter; 89% for the -us, -ris ending.
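
For anyone checking the arithmetic on the last three endings, here is a small sketch. It assumes each pair is (count for the ending overall, count restricted to the genitive pattern), and it leaves the Greek-tagged entries out, since that appears to be how the quoted percentages were reached:

```python
# (overall, restricted-to-genitive) native counts copied from the post above
HENLE = {
    "-es/-itis": {"C": (17, 5), "F": (81, 1), "M": (94, 32), "N": (11, 0)},
    "-ex/-icis": {"C": (11, 10), "F": (21, 13), "M": (53, 38), "N": (3, 1)},
    "-us/-ris": {"C": (4, 2), "F": (18, 1), "M": (12, 5), "N": (66, 66)},
}

def shares(counts, gender):
    """Return (overall %, restricted %) for one gender."""
    overall = sum(pair[0] for pair in counts.values())
    restricted = sum(pair[1] for pair in counts.values())
    return (100 * counts[gender][0] / overall,
            100 * counts[gender][1] / restricted)

for pattern, gender in [("-es/-itis", "M"), ("-ex/-icis", "M"), ("-us/-ris", "N")]:
    overall_pct, restricted_pct = shares(HENLE[pattern], gender)
    print(f"{pattern}: {overall_pct:.1f}% overall, {restricted_pct:.1f}% restricted ({gender})")
```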
 

Imber Ranae

Ranunculus Iracundus
It would be helpful if you actually listed these words, or at least the non-obvious ones. There aren't that many, so it shouldn't be a particularly onerous task. The problem is that Words has been shown to be untrustworthy in the past concerning designations of noun classes.

Henle excludes common nouns from consideration in a previous rule as well, so it's hardly fair to count them as violations of the SOX rule.

scrabulista dixit:
-os has 6 common, 5 feminine, 24 masculine, 4 neuter, 2 Greek masculine (62% vs. 30% overall masculine)
I imagine the masculines are all words like honos, labos, etc., frequent early variants of honor, labor, etc.

I'd be interested in the neuter and feminine exceptions. I can't think of any such nouns myself.

-us (with -ris) has 4 (2) common, 18 (1) feminine, 12 (5) masculine, 66 (66) neuter, and 5 (0) Greek masculine.
- 66% vs. 12% overall neuter; 89% for the -us, -ris ending.
I can't think of any 3rd declension nouns in -us,(-ris) that are masculine or feminine, either. Could you list these? No need to bother with the neuters, of course.

Also, I don't know what the numbers in parentheses mean.
 

scrabulista

Consul
Staff member
Feminine in os: arbos, cos, cocos, cheneros, dos
Neuter in os: os, oris; os, ossis; os, ossuis also epos (no genitive)
os, ossuis: bones (pl.) (dead people)....source is X = "General or unknown or too common to say." I'd have to go with unknown on this particular entry.

Feminine in -us: Hiericus, apus, grus, laus, fraus, pecus, incus, subscus, palus, tellus, salus, senectus, servitus, juventus, virtus, Venus

Venus is listed twice - (1) Venus, goddess of sexual love, planet Venus, charm/grace;
(2) sexual activity/appetite/intercourse
virtus is listed twice - (1) strength/power; courage/bravery; worth/manliness/virtue/character/excellence
(2) army; host; mighty works (pl.)
It should have been 18 (3) rather than 18 (1)....Venus and tellus take -ris in the genitive.

Masculine in -us: stercus, Achilleus, complus, conplus, lepus, tripus, chytropus, coronopus, cytropus, dasypus, vetus, trapetus. The bolded ones take -ris in the genitive.
 

Imprecator

Civis Illustris
-Stercus is neuter
-'Cocos' is not a Latin word
-I can't find any evidence for ossuis
-Co(n/m)plus is the same word, obviously
 

scrabulista

Consul
Staff member
stercus is indeed neuter. That's a mistake in WORDS.
ossuis -

Lewis and Short have:
ŏs , ossis (collat. form ossum , i, Varr. ap. Charis. p. 112 P.; Att. ap. Prisc. p. 750 ib.; Tert. Carm. adv. Marc. 2, 196: ossu , u, Charis. p. 12 P.—In plur.:
I. “OSSVA for OSSA, freq. in inscrr.,” Inscr. Orell. 2906; 4361; 4806; Inscr. Osann. Syll. p. 497, 1; Cardin. Dipl. Imp. 2, 11: ossuum for ossium, Prud. στεφ. 5, 111), n. prop. ossis for ostis, kindred with Sanscr. asthi, os; Gr. ὀστέον; Slav. kostj, a bone (class.).

In "Remains of Old Latin" at archive.org (it's a Loeb Classical Library book), there's a note on p. 195: Still some old writers used to inflect os from a nominative ossu and from a nominative ossum.
==============================
ossu would produce such forms as ossua and ossuum, and ossūs. Is ossuis ~= ossūs? If not, then ossuis should definitely be dropped. If so, then it should be changed to 4th declension.

cocos, cocois: coconut tree SOURCE_TYPE = K = Calepinus Novus, modern Latin, by Guy Licoppe.
There are such scientific names as Cocos nucifera but maybe they shouldn't count.
 

scrabulista

Consul
Staff member
I started trying to sift through the wordlist to tag alternate spellings (my vision was to have a pointer to the preferred spelling).
But that is *quite* a challenge.
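
The pointer idea itself is simple enough to sketch; the hard part is deciding which spelling is preferred. The table and function names below are hypothetical, and the two entries come from the centesimus example earlier in the thread:

```python
# Hypothetical variant -> preferred-spelling pointers.
PREFERRED = {
    "centensimus": "centesimus",
    "centensumus": "centesimus",
}

def preferred_spelling(word, table=PREFERRED):
    """Follow pointers until reaching a word with no preferred form."""
    seen = set()
    # The 'seen' set stops the loop if two entries point at each other.
    while word in table and word not in seen:
        seen.add(word)
        word = table[word]
    return word

print(preferred_spelling("centensumus"))
```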

An easier task is sifting through the genitives looking for unusual ones like ossuis that might not be right.

What I found:
apalocrocodes, apalocrodis: It seems unusual for a word to drop a syllable in the genitive. SOURCE_TYPE = O (Oxford Latin Dictionary)
ascyroides, ascyrodis: This is a typo - it should be ascyroidis in the genitive. SOURCE_TYPE = Lewis & Short
commaterr, commatris - I'm guessing this is a typo - it should be commater in the nominative. SOURCE_TYPE = L.F. Stelten, Dictionary of Ecclesiastical Latin.
halcycon, halcyonis - It should be halcyon in the nominative, no? SOURCE_TYPE = "General or unknown or too common to say."
communiceps, communnicipis - Lewis and Short do not have the n doubled in the genitive, although Whitaker's SOURCE_TYPE = Oxford Latin Dictionary.
Traex, Treacis - it seems strange for the vowels to flip like that. I think it should be Traecis in the genitive.
Thraex, Threacis - same thing. SOURCE_TYPE is Oxford Latin Dictionary.
virens, virentiis - SOURCE_TYPE = Lewis and Short; the online version does not have a double i in the genitive.
os, ossuis - already discussed
euro, euphonis - referring to the currency - SOURCE_TYPE = Calepinus Novus, modern Latin, by Guy Licoppe.
Cocos at least has botanical taxonomy going for it. So far as euro goes, all I can say is - yuck!
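
One crude way to automate part of this sifting is to measure how much of the nominative is left over after the longest prefix it shares with the genitive. This is only a sketch: the function names and the threshold of 3 are my own assumptions, and it catches only some of the entries above (apalocrocodes, ascyroides, communiceps) while missing others such as commaterr and halcycon.

```python
import os.path

def unshared_tail(nominative, genitive):
    """Length of the nominative left after its common prefix with the genitive."""
    prefix = os.path.commonprefix([nominative, genitive])
    return len(nominative) - len(prefix)

def suspicious(pairs, max_tail=3):
    """Return nominatives whose unshared tail is longer than max_tail."""
    return [nom for nom, gen in pairs if unshared_tail(nom, gen) > max_tail]

pairs = [
    ("apalocrocodes", "apalocrodis"),
    ("ascyroides", "ascyrodis"),
    ("commaterr", "commatris"),
    ("halcycon", "halcyonis"),
    ("communiceps", "communnicipis"),
    ("princeps", "principis"),  # ordinary entry, for comparison
]
print(suspicious(pairs))
```

Lowering the threshold would catch commaterr and halcycon too, but it would also start flagging ordinary words like princeps, principis, so anything it reports still needs checking by eye.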
 

scrabulista

Consul
Staff member
Suspicious patterns in comparative adverbs:

crebo - with comparative crebrius and superlative creberrime.
crebro is also listed with the same comparative and superlative.
I don't think crebo exists. Crebre is listed separately with no comparative/superlative.

tolerabiter - with comparative tolerabilius and superlative tolerabilissime.
It should be tolerabiliter.

valde with comparative valdius and superlative valdissime. There is no corresponding adjective.
I think this should be valide, with comparative validius and superlative validissime.
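
A quick mechanical check for this kind of mismatch, as a sketch: strip -ius from the comparative and see whether the positive adverb starts with what is left. The function name is made up, and the check assumes regular comparatives in -ius, so a pair like bene/melius would be flagged even though nothing is wrong with it.

```python
def stem_mismatch(positive, comparative):
    """True if the comparative's stem (minus -ius) does not begin the positive."""
    if not comparative.endswith("ius"):
        return False  # not a regular -ius comparative; the check does not apply
    return not positive.startswith(comparative[:-3])

entries = [
    ("crebo", "crebrius"),
    ("crebro", "crebrius"),
    ("tolerabiter", "tolerabilius"),
    ("tolerabiliter", "tolerabilius"),
    ("valde", "valdius"),
]
for positive, comparative in entries:
    if stem_mismatch(positive, comparative):
        print(f"check: {positive} / {comparative}")
```

It flags crebo and tolerabiter but not valde, since valde and valdius are consistent with each other.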
 

cinefactus

Censor
Staff member
valde is definitely a word. The author of the Gesta Francorum uses it in just about every second sentence...
 

Imprecator

Civis Illustris
It is ualide with the middle syllable sync'pated, and by far the more common form of the word. The other two are genuine errors.
 

cinefactus

Censor
Staff member
Thanks Imprecator. I had never realised that, although it is obvious now that you say it!
 

scrabulista

Consul
Staff member
A few more thoughts on the comparative adverbs:

There are 387 such adverbs, with 237 being in the 1st/2nd declension.
One is denixe. No denixus is listed, but it seems obvious from de and the past participle of nitor.
Another is indigniter, with indignus listed but no indignis or any other 3rd declension adjective.
Also we have fraudulenter, with a fraudulentus on the adjective side but no fraudulens.
6 1st/2nd's have positive adverbs in -o. Of course these are ablatives as adverbs: arcano, certo, cito, festinato, sero, and tuto.
Then we have two in -m: multum and parum.

134 3rd declension comparative adverbs.

Then 16 additional ones:
crebo already discussed as wrong, but there is a crebre and a crebriter among noncompared adverbs.
tolerabiter already discussed as wrong

intra, ultra, prope, diu (listed twice): no positive adjective
saepe, temperi, tempori, pedetemptim - no adjective at all
deterius - no positive adverb or adjective
fiducialiter with no fiducialis
ditius/ditissime related to the adjective dis/ditior/ditissimus with no positive adverb.
uberius/uberrime with no positive form but there is an uberte and an ubertim.
nuper, nuperrime with no comparative form.
 

Imber Ranae

Ranunculus Iracundus
scrabulista dixit:
Feminine in os: arbos, cos, cocos, cheneros, dos
Neuter in os: os, oris; os, ossis; os, ossuis also epos (no genitive)
os, ossuis: bones (pl.) (dead people)....source is X = "General or unknown or too common to say." I'd have to go with unknown on this particular entry.
Ah, I wasn't thinking of the obvious monosyllabic words for some reason. Save for cos I was aware of those exceptions. Cheneros apparently only occurs in the plural as chenerotes, and is of Greek origin anyway. Epos is also just a Greek word, only occurring in the nominative and accusative in Latin. Cocos seems to be neo-Latin, so I wouldn't count that either. The archaic alternative spelling of arbor is a legitimate exception, though.

(Even if the genitive ossuis is an attested variant of ossis, which is far from clear, it wouldn't count.)

Feminine in -us: Hiericus, apus, grus, laus, fraus, pecus, incus, subscus, palus, tellus, salus, senectus, servitus, juventus, virtus, Venus
Of these, only tellus is an unquestionable exception. Venus of course refers to a person (the goddess), so it shouldn't count per se. It's true that the name is sometimes used metonymically for "charm, sexual love, etc.", but that's not the primary meaning. And while pecus,-udis is indeed feminine, pecus,-oris is always neuter.

Masculine in -us: stercus, Achilleus, complus, conplus, lepus, tripus, chytropus, coronopus, cytropus, dasypus, vetus, trapetus. The bolded ones take -ris in the genitive.
Of these, I would say only lepus counts, though it might actually be of common gender, in which case it wouldn't. Stercus, as Imprecator said, is always neuter, whereas vetus is really an adjective. Perhaps it is occasionally used as a substantive, though I have never seen such myself. I'm guessing complus/conplus is the imagined singular form of the adjective complures "several", which doesn't count for obvious reasons.
 