I continued doing some stats, and ended up looking at clues for the WSJ crosswords in 2017.

If you have a question I didn’t think to ask of the data, it is welcome.

Anyhow, I’ll restate a few facts that are relevant to any analysis of this data:

  • Let’s start with the commonly known rule that a word is almost never allowed to appear more than once in a single grid. Applying this rule makes the data a lot easier to look at.
  • I processed 304 PUZ files. This low number is to be expected, since the WSJ does not run a puzzle on Sundays or on holidays.
  • The WSJ crossword puzzles used 14,684 unique words in 2017.
  • Of those words, 10,079 were used exactly once. A large share of single-use words is to be expected, since most theme entries will be unique. Still, I was surprised that this many occurred only once.
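
For anyone who wants to reproduce counts like these, here is a minimal sketch in Python. It assumes the puzzle entries have already been extracted into a list of (word, clue) pairs; that layout, the function name word_usage, and the toy sample are illustrative assumptions, not the script actually used for the numbers above.

```python
from collections import Counter

# Toy sample of (word, clue) entries. The real input would hold every
# entry from every puzzle; this layout is an assumption for illustration.
entries = [
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ATON", "Heaps"),
    ("UNO", "One, for Juan"),
    ("UNO", "Game akin to Crazy Eights"),
    ("ONO", "Lennon's love"),
]

def word_usage(entries):
    """Return (number of unique words, number of words used exactly once)."""
    uses = Counter(word for word, _clue in entries)
    used_once = sum(1 for n in uses.values() if n == 1)
    return len(uses), used_once

unique_words, used_once = word_usage(entries)
print(unique_words, used_once)
```

Counting words first, before looking at clues at all, keeps these two totals independent of how any word was clued.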

Question 1: Repetition
The first question I thought of involves repetition in cluing: the same word appearing multiple times with the same clue. For example, ALOT appeared with the clue [Heaps] 7 times. Naturally, if a word appears once, the clue used with it appears only once, but a single word can be clued multiple ways.

Some random facts out of this analysis:

  • Words in the WSJ crossword puzzles were clued 24,822 separate ways in 2017. In other words, there were 24,822 distinct word/clue pairs.
  • Of those, 23,631 word/clue pairs were used only once. Subtracting the 10,079 words that appeared only once leaves 13,552 single-use pairs that belong to words used multiple times. This suggests a degree of creativity in how the clues were written.
  • Of the rest, 1,031 were used twice and 126 were used three times. This eliminates all but 33 of the word/clue pairs.
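
The breakdown above is a histogram over word/clue pairs, and it can be sketched as follows. As before, the (word, clue) list layout, the function names, and the toy sample are assumptions made for illustration rather than the original analysis code.

```python
from collections import Counter

# Toy sample, not the real data; the (word, clue) layout is an assumption.
entries = [
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ALA", "Copying"),
    ("ALA", "Copying"),
    ("ALA", "In the style of"),
    ("UNO", "One, for Juan"),
]

def pair_histogram(entries):
    """Map 'times a word/clue pair was used' to 'number of such pairs'."""
    pair_uses = Counter(entries)  # keys are (word, clue) tuples
    return Counter(pair_uses.values())

def single_use_pairs_of_repeated_words(entries):
    """Count pairs used once whose word appeared more than once overall."""
    pair_uses = Counter(entries)
    word_uses = Counter(word for word, _clue in entries)
    return sum(
        1
        for (word, _clue), n in pair_uses.items()
        if n == 1 and word_uses[word] > 1
    )

hist = pair_histogram(entries)
for times_used in sorted(hist):
    print(times_used, hist[times_used])
print(single_use_pairs_of_repeated_words(entries))
```

The second function computes directly what the subtraction in the second bullet computes indirectly: single-use pairs minus single-use words.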

That list of 33 word/clue pairs used 4 or more times in 2017 WSJ crosswords (or in more than 1.3% of the 304 puzzles):

ALOT	[Heaps]	7
ALA	[In the style of]	6
ONO	[Lennon's love]	6
ALA	[Copying]	5
ANI	[Singer DiFranco]	5
AREST	["Give it ___!"]	5
ATON	[Heaps]	5
EXERT	[Bring to bear]	5
ONSET	[Beginning]	5
AER	[___ Lingus]	4
AGO	[In the past]	4
APT	[Fitting]	4
AREA	[Vicinity]	4
ARIA	[Diva's delivery]	4
ASTO	[About]	4
ATE	[Put away]	4
CLAD	[Not nude]	4
DES	[___ Moines]	4
EAT	["Dig in!"]	4
EMT	[CPR pro]	4
EON	[Interminable wait]	4
ERAS	[Eon divisions]	4
ESPY	[Spot]	4
EXPO	[Convention center event]	4
IDEA	[Notion]	4
INRE	[About]	4
MIEN	[Bearing]	4
NADA	[Zilch]	4
OPEN	[Ready for business]	4
OVAL	[Cameo shape]	4
PAL	[Buddy]	4
REB	[Yank's foe]	4
SEEN	[Spotted]	4

Question 2: Creativity
The last question I had involved the creativity of cluing: how many different clues a word appeared with. For instance, UNO appeared 8 times with 7 different clues:

  • [One, for Juan]
  • [Card game with a four-color deck]
  • [Game akin to Crazy Eights]
  • [Start of a Cuban count]
  • [Game with red, green, blue and yellow suits]
  • [56-Down, to Fernando]
  • [One of the Medicis]

Some facts out of this analysis:

  • The number of words in this list should match the number of unique words (14,684), which it does.
  • Of those, 10,294 words were clued exactly one way. The difference of 215 from the number of words that appeared only once (10,079) is explained by words that appeared more than once but always with the same clue.
  • 2,259 have 2 separate clues, 924 have 3 separate clues, 456 have 4, 248 have 5, 192 have 6, 90 have 7, 64 have 8, 37 have 9, 37 have 10. This eliminates all but 83 of the words.
  • The top of this list bears a striking resemblance to the original list. This suggests that, even with the repetition, the constructors and editor are making an attempt to vary the clues.
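
Counting distinct clues per word can be sketched like this. The (word, clue) list layout, the function names, and the toy sample are again illustrative assumptions, not the code behind the figures above.

```python
from collections import Counter, defaultdict

# Toy sample, not the real data; the (word, clue) layout is an assumption.
entries = [
    ("UNO", "One, for Juan"),
    ("UNO", "One, for Juan"),
    ("UNO", "Game akin to Crazy Eights"),
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ONO", "Lennon's love"),
]

def distinct_clue_counts(entries):
    """Map each word to the number of different clues it appeared with."""
    clues_by_word = defaultdict(set)
    for word, clue in entries:
        clues_by_word[word].add(clue)  # a set ignores repeated clues
    return {word: len(clues) for word, clues in clues_by_word.items()}

def repeated_with_one_clue(entries):
    """Words used more than once but always with the same clue."""
    variety = distinct_clue_counts(entries)
    uses = Counter(word for word, _clue in entries)
    return sorted(w for w, n in variety.items() if n == 1 and uses[w] > 1)

variety = distinct_clue_counts(entries)
# Rank by number of distinct clues (descending), then alphabetically.
ranked = sorted(variety.items(), key=lambda item: (-item[1], item[0]))
print(ranked)
print(repeated_with_one_clue(entries))
```

In this sample, ALOT is the kind of word that accounts for the gap between words clued one way and words used once: it appears twice but only ever with [Heaps].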

    The list of 83 words with the most different clues:

    ORE	30
    ERA	28
    OLE	26
    ALI	23
    ERIE	22
    ALOE	21
    AREA	21
    ASH	21
    ALE	18
    ELI	18
    ETA	18
    RIO	18
    SET	18
    ARIA	17
    ERR	17
    YES	17
    ANTE	16
    EDEN	16
    LEE	16
    ONE	16
    TEN	16
    ALTO	15
    AMI	15
    EWE	15
    OREO	15
    SEE	15
    TEE	15
    ALA	14
    AMEN	14
    ASIA	14
    ELS	14
    END	14
    ICE	14
    NET	14
    SPA	14
    ANTI	13
    ASS	13
    EASE	13
    EMU	13
    IDO	13
    ISLE	13
    SEA	13
    USE	13
    ABBA	12
    ABEL	12
    ACE	12
    AGE	12
    AIR	12
    ALAS	12
    ARE	12
    ARENA	12
    ARI	12
    AWE	12
    EDIT	12
    EGO	12
    EROS	12
    EVE	12
    IRE	12
    LAB	12
    NEE	12
    ORAL	12
    SHE	12
    ACRE	11
    ADA	11
    ANN	11
    ARC	11
    ATE	11
    ATM	11
    CIA	11
    EBB	11
    ENDS	11
    ERAS	11
    ESP	11
    OAR	11
    OBOE	11
    OTTO	11
    RED	11
    RIOT	11
    SCOT	11
    SPAS	11
    STY	11
    TIN	11
    URSA	11
    

    Thanks for reading, and as stated, if anyone has any other good questions to ask of the data, be sure to ask!
