I continued doing some stats, and ended up looking at clues for the WSJ crosswords in 2017.
Questions are welcome, especially any that I didn’t think to ask myself…
Anyhow, I’ll restate a few facts that are relevant to any analysis of this data:
- Let’s start with the commonly known rule that a word is almost never allowed to appear more than once in a single grid. Applying this rule makes looking at this data a lot easier.
- I processed 304 PUZ files. This small number is to be expected, since the WSJ does not run a puzzle on Sundays or on holidays.
- The WSJ crossword puzzles used 14,684 unique words in 2017.
- Of those words, 10,079 were used exactly once. A large share of single-use words is to be expected, since most theme entries will be unique. Even so, I was surprised that there were this many that occurred only once.
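For anyone who wants to reproduce counts like these, here is a minimal sketch in Python. It assumes the answers and clues have already been pulled out of the PUZ files into a list of (word, clue) pairs. The `entries` list below is made-up toy data, not the real 2017 set, and all of the names are purely illustrative.

```python
from collections import Counter

# Toy stand-in (an assumption) for the (word, clue) pairs
# extracted from a year of puzzles.
entries = [
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ATON", "Heaps"),
    ("UNO", "One, for Juan"),
    ("UNO", "Start of a Cuban count"),
    ("OVAL", "Cameo shape"),
]

# How many times each word appeared, regardless of clue.
word_counts = Counter(word for word, _clue in entries)

unique_words = len(word_counts)
used_once = sum(1 for n in word_counts.values() if n == 1)

print(f"{unique_words} unique words, {used_once} used exactly once")
```

On the real data, `unique_words` and `used_once` would correspond to the 14,684 and 10,079 figures above.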
Question 1: Repetition
The first question about clues that I thought of involves repetition of cluing: the same word appearing multiple times with the same clue. For example, ALOT appeared with the clue [Heaps] 7 times. Naturally, if a word appears only once, its clue appears with it only once, but a single word can be clued in multiple ways.
Some random facts out of this analysis:
- Words in the WSJ crossword puzzles were clued 24,822 separate ways in 2017. In other words, there were 24,822 distinct word/clue pairs.
- Of those pairs, 23,631 were used only once. If we subtract the 10,079 words that were used only once, that leaves 13,552 pairs that were used once even though the word itself appeared multiple times, meaning the word was clued in different ways. This suggests a degree of creativity in how the clues were written.
- Of the rest, 1,031 were used twice and 126 were used three times. This eliminates all but 33 of the word/clue pairs.
Here is that list of 33 word/clue pairs used 4 or more times in the 2017 WSJ crosswords (that is, in more than 1.3% of the puzzles):
- 7 times: ALOT [Heaps]
- 6 times: ALA [In the style of]; ONO [Lennon's love]
- 5 times: ALA [Copying]; ANI [Singer DiFranco]; AREST ["Give it ___!"]; ATON [Heaps]; EXERT [Bring to bear]; ONSET [Beginning]
- 4 times: AER [___ Lingus]; AGO [In the past]; APT [Fitting]; AREA [Vicinity]; ARIA [Diva's delivery]; ASTO [About]; ATE [Put away]; CLAD [Not nude]; DES [___ Moines]; EAT ["Dig in!"]; EMT [CPR pro]; EON [Interminable wait]; ERAS [Eon divisions]; ESPY [Spot]; EXPO [Convention center event]; IDEA [Notion]; INRE [About]; MIEN [Bearing]; NADA [Zilch]; OPEN [Ready for business]; OVAL [Cameo shape]; PAL [Buddy]; REB [Yank's foe]; SEEN [Spotted]
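The repetition counts can be sketched the same way: count each (word, clue) pair, then count how many pairs were used once, twice, and so on. This is only an illustration on made-up toy data. The `entries` list and the threshold of 2 are assumptions chosen to keep the example small (the list above uses a threshold of 4).

```python
from collections import Counter

# Toy stand-in (an assumption) for the real (word, clue) pairs.
entries = [
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ATON", "Heaps"),
    ("UNO", "One, for Juan"),
    ("UNO", "One, for Juan"),
    ("UNO", "Start of a Cuban count"),
    ("OVAL", "Cameo shape"),
]

# How many times each exact word/clue pair was used.
pair_counts = Counter(entries)
distinct_pairs = len(pair_counts)

# Distribution: usage count -> number of pairs used that many times.
times_used = Counter(pair_counts.values())

# Pairs used at least `threshold` times, most frequent first.
threshold = 2
frequent = [
    (word, clue, n)
    for (word, clue), n in pair_counts.most_common()
    if n >= threshold
]

# Share of the 304 puzzles that a pair used 4 times shows up in.
share_percent = 100 * 4 / 304

for word, clue, n in frequent:
    print(f"{word} [{clue}] {n}")
```

The `share_percent` line is where the "more than 1.3%" figure comes from: 4 appearances across 304 puzzles.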
Question 2: Creativity
The last question I had involved the creativity of cluing: how many different clues was each word given? For instance, UNO appeared 8 times with 7 different clues:
- [One, for Juan]
- [Card game with a four-color deck]
- [Game akin to Crazy Eights]
- [Start of a Cuban count]
- [Game with red, green, blue and yellow suits]
- [56-Down, to Fernando]
- [One of the Medicis]
Some facts out of this analysis:
- The total number of words in this list should match the number of unique words found earlier (14,684), which it does.
- Of those, 10,294 words were clued exactly one way. The discrepancy of 215 with the number of words that appeared only once (10,079) can be explained by words that appeared more than once but always with the same clue.
- 2,259 have 2 separate clues, 924 have 3 separate clues, 456 have 4, 248 have 5, 192 have 6, 90 have 7, 64 have 8, 37 have 9, 37 have 10. This eliminates all but 83 of the words.
- The top of this list bears a striking resemblance to the original list. This suggests that, even with the repetition, the constructors and editor are making an attempt to vary the clues.
The list of 83 words with the most different clues:
- 30 clues: ORE
- 28 clues: ERA
- 26 clues: OLE
- 23 clues: ALI
- 22 clues: ERIE
- 21 clues: ALOE, AREA, ASH
- 18 clues: ALE, ELI, ETA, RIO, SET
- 17 clues: ARIA, ERR, YES
- 16 clues: ANTE, EDEN, LEE, ONE, TEN
- 15 clues: ALTO, AMI, EWE, OREO, SEE, TEE
- 14 clues: ALA, AMEN, ASIA, ELS, END, ICE, NET, SPA
- 13 clues: ANTI, ASS, EASE, EMU, IDO, ISLE, SEA, USE
- 12 clues: ABBA, ABEL, ACE, AGE, AIR, ALAS, ARE, ARENA, ARI, AWE, EDIT, EGO, EROS, EVE, IRE, LAB, NEE, ORAL, SHE
- 11 clues: ACRE, ADA, ANN, ARC, ATE, ATM, CIA, EBB, ENDS, ERAS, ESP, OAR, OBOE, OTTO, RED, RIOT, SCOT, SPAS, STY, TIN, URSA
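Counting distinct clues per word can be sketched by collecting each word's clues into a set. The `entries` list is again made-up toy data and the names are illustrative. The sketch also picks out words that repeat but always carry the same clue, since those count as "clued one way" without being "used once". The last two lines simply re-add the figures reported above to confirm that 83 words remain.

```python
from collections import Counter, defaultdict

# Toy stand-in (an assumption) for the real (word, clue) pairs.
entries = [
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ATON", "Heaps"),
    ("UNO", "One, for Juan"),
    ("UNO", "One, for Juan"),
    ("UNO", "Start of a Cuban count"),
    ("OVAL", "Cameo shape"),
]

# Collect the set of distinct clues seen for each word.
clues_by_word = defaultdict(set)
for word, clue in entries:
    clues_by_word[word].add(clue)

distinct_clues = {word: len(clues) for word, clues in clues_by_word.items()}

# Distribution: number of distinct clues -> number of words.
histogram = Counter(distinct_clues.values())

# Words used more than once but always with the same clue.
word_counts = Counter(word for word, _clue in entries)
repeated_single_clue = [
    word
    for word, n in word_counts.items()
    if n > 1 and distinct_clues[word] == 1
]

# Sanity check on the figures reported in this post.
reported = {1: 10294, 2: 2259, 3: 924, 4: 456, 5: 248,
            6: 192, 7: 90, 8: 64, 9: 37, 10: 37}
remaining = 14684 - sum(reported.values())

print(histogram, repeated_single_clue, remaining)
```

In the toy data, three words are clued one way but only two were used once. The difference is ALOT, which repeats with a single clue.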
Thanks for reading, and as stated, if anyone has any other good questions to ask of the data, be sure to ask!