An Analysis of Wall Street Journal Crossword Puzzle Authorship

Back in 2016, Jim Peredo editorialized about the number of times Mike Shenk constructs Wall Street Journal crossword puzzles. Prompted by this observation, I wrote a program that lists the authors of these puzzles and counts them to see exactly how many are attributable to Mike Shenk, the crossword puzzle editor there. Peredo repeated his observations today, which has given me the occasion to repeat the processing and post some analysis.

Shenk uses a large number of aliases when running his own puzzles as editor. This is a common crossword editing practice for several reasons that I won’t go into here. The question is not whether Shenk should be constructing puzzles as an editor, as he is a superb constructor who deserves to practice his craft. The question that Peredo has asked is whether the sheer volume of Shenk’s puzzles that appear in the Wall Street Journal reflects an issue:

Well, at the Journal, when 66% of the puzzles come from one individual, everyone else is an under-represented group.

To that total, we must also add the number of puzzles that we know are contracted out by Shenk, most notably the Friday meta puzzles, as he does not accept submissions for those.

Setting aside the other possible editorial questions, the only one I am interested in looking into is exactly how many puzzles are constructed by Mike Shenk.

Analysis
To that end, I reused the program from back then to produce a list of authors and counts as tab-delimited CSV files. I ran it against the PUZ files available from the site that sources them (today’s PUZ file), as posted here. The date range covers the entire period, up to today’s date, in which the Wall Street Journal has produced a daily puzzle (09-14-2015 to 05-17-2018). I produced a listing for the entire period, and also a listing for each year.
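As a rough illustration of the counting step (this is not the actual program, which is not shown here), a sketch along these lines would tally authors and write a tab-delimited listing. It assumes the author strings have already been read out of the PUZ files; the function names, file names, and author names are all made up for the example.

```python
import csv
import io
from collections import Counter


def count_authors(puzzles):
    """Tally how often each author string appears.

    `puzzles` is an iterable of (filename, author) pairs; extracting the
    author from each PUZ file is assumed to have happened already.
    """
    return Counter(author.strip() for _, author in puzzles)


def write_counts(counts, out):
    """Write author/count rows, most frequent first, as tab-delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for author, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([author, n])


# Hypothetical sample rows, not taken from the real data set.
sample = [
    ("wsj-a.puz", "Author One"),
    ("wsj-b.puz", "Author Two"),
    ("wsj-c.puz", "Author One "),  # stray whitespace is stripped before counting
]
counts = count_authors(sample)
buffer = io.StringIO()
write_counts(counts, buffer)
print(buffer.getvalue())
```

Running the same functions over one year's files at a time would give the per-year listings.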

To begin, I will list the number of puzzles processed:

2015 - 91 puzzles
2016 - 306 puzzles
2017 - 303 puzzles
2018 - 114 puzzles
------------------
All  - 814 puzzles

Upon first observation, the five data sets are incredibly similar in the names that show up. Other than this, I will keep observations confined to the entire data set of puzzles.

There are 165 constructors/groups of constructors listed. This data set is included below, along with a raw listing of the puzzles processed and their authors:

Count of times that authors have appeared in the WSJ Crossword Puzzle

List of puzzles I processed along with each author

The next problem at hand is identifying Mike Shenk’s aliases and his contractors. I hesitate to do this since I’m not especially well versed in all of his likely aliases and could be wrong either in listing one or in not listing one. If I am, I am glad to correct things.

In looking over the list, a number of Mike Shenk’s aliases appear. Of the ones listed, the following have been identified as, or suspected to be, Mike Shenk on crosswordfiend.com: Marie Kelly (his meta puzzle alias), Alice Long, Dan Fisher, Harold Jones, Daniel Hamm, Gabriel Stone, Damian Peterson, Melina Merchant, Nancy Cole Stuart, Julian Thorne, Ethan Erickson, Colin Gale, Martin Leechman, Maxine Cantor, Charlie Oldham, Heidi Moretta, Mae Woodard, Theresa Schmidt, Celia Smith, Natalia Shore, Becky Melius, Judith Seretto, and Maryanne Lemot.

If we add those together (assuming everything is correct above), Mike Shenk constructed 323/814 or around 40% of the total puzzles published since 09-14-2015 in the Wall Street Journal. We are left with 141 constructors.
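The adding-up step can be sketched as follows. This is only a minimal illustration: `combined_share` is a hypothetical helper, and the per-byline counts in the example are invented (the real ones are in the data files above). Only the 323 and 814 totals come from this post.

```python
def combined_share(counts, names, total):
    """Return (puzzles, fraction) credited to any byline in `names`."""
    puzzles = sum(counts.get(name, 0) for name in names)
    return puzzles, puzzles / total


# Made-up counts for illustration only.
counts = {"Editor": 5, "Alias A": 3, "Alias B": 2, "Someone Else": 10}
puzzles, share = combined_share(counts, {"Editor", "Alias A", "Alias B"}, 20)
print(puzzles, f"{share:.1%}")

# The totals quoted in this post: 323 of 814 puzzles.
print(f"{323 / 814:.1%}")
```

With the post's totals this prints 39.7%, which is the "around 40%" figure.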

The next question is one of contracted grids, if we are looking into the window of open submission space at the WSJ. It is known that submissions are not accepted for the Friday WSJ meta puzzle. Shenk does about half of these, but there are others who pick up the slack. I happen to save meta solutions and have 138 on hand from this period. Subtract the number published under Shenk’s meta alias and you get 85 other puzzles. Add this to Shenk’s total and you have 408/814 that were either constructed by the editor or contracted by him. This amounts to 50% of the total number of puzzles printed.
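For anyone who wants to check the arithmetic, here it is spelled out using only the numbers quoted in this post (the variable names are mine):

```python
total_puzzles = 814      # all puzzles processed
shenk_puzzles = 323      # bylines attributed to Shenk above
other_meta_puzzles = 85  # Friday metas not under Shenk's meta alias

editor_or_contracted = shenk_puzzles + other_meta_puzzles
share = editor_or_contracted / total_puzzles
print(editor_or_contracted, f"{share:.1%}")
```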

Conclusion
While this is not as high as Peredo’s original guess, it is still a pretty high ratio and lends some weight to his original concern. I can’t say I have a horse in this race, since I haven’t actually submitted grids yet. But given the concerns raised, looking into this has been an interesting pursuit. I probably got something wrong somewhere along the way, so I welcome any criticism or thoughts that may stem from this.

Edit: I discovered in rechecking the data that I double-counted 11 puzzles (duplicate files from restoring all my WSJ puzzle backups). This post and its associated data files have been modified to reflect that correction.


Wall Street Journal Crossword: Clue Stats For 2017

I continued doing some stats, and ended up looking at clues for the WSJ crosswords in 2017.

Any questions are welcome, if I didn’t think of them in the questions I wanted answered…

Anyhow, I’ll restate a few random facts that are relevant to any analysis of data:

  • Let’s start with the commonly known rule that a single word is almost never allowed to appear more than once in a specific grid. Applying this rule will make looking at this data a lot easier.
  • I processed 304 PUZ files. That this number falls short of 365 is not unexpected, since the WSJ does not run a puzzle on Sunday or on holidays.
  • The WSJ crossword puzzles used 14,684 unique words in 2017.
  • Of those words, 10,079 were used exactly once. A large share of one-time words is to be expected since most theme entries will be unique. But for some reason, I was surprised that there are this many that only occurred once.

Question 1: Repetition
The first question I thought of concerns repetition of cluing: the same word appearing multiple times with the same clue. For example, ALOT appeared with the clue [Heaps] 7 times. Naturally, if a word appears once, the clue used with it only appears once, but a single word can be clued multiple ways.

Some random facts out of this analysis:

  • Words in the WSJ crossword puzzles were clued 24,822 separate ways in 2017.
  • Of those, 23,631 were used only once. If we subtract the 10,079 words that appeared only once, that leaves 13,552 one-time word/clue pairs belonging to words that were used multiple times, that is, words clued in different ways. This suggests a degree of creativity in how the clues were written.
  • Of the rest, 1,031 were used twice, 126 used three times. This eliminates all but 33 of the word/clue pairs.
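The tallies above come down to counting word/clue pairs and then counting how many pairs share each tally. A minimal sketch of that idea follows; it is not the code I actually ran, the function name is hypothetical, and the sample entries (including the clue "Loads") are made up.

```python
from collections import Counter


def pair_histogram(entries):
    """Count (word, clue) pairs, then count how many pairs share each tally.

    `entries` is an iterable of (word, clue) tuples, one per grid entry.
    Returns (pair_counts, histogram), where histogram[n] is the number of
    distinct pairs that appeared exactly n times.
    """
    pair_counts = Counter(entries)
    histogram = Counter(pair_counts.values())
    return pair_counts, histogram


# Tiny made-up sample; the real input would be every entry from the year.
entries = [
    ("ALOT", "Heaps"),
    ("ALOT", "Heaps"),
    ("ALOT", "Loads"),
    ("ATON", "Heaps"),
]
pair_counts, histogram = pair_histogram(entries)
print(pair_counts[("ALOT", "Heaps")])
print(histogram[1], histogram[2])
```

Here `len(pair_counts)` plays the role of the "separate ways" total, and `histogram[1]` the number of pairs used only once.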

That list of 33 word/clue pairs used 4 or more times in 2017 WSJ crosswords (or more than 1.3% of the time):

ALOT	[Heaps]	7
ALA	[In the style of]	6
ONO	[Lennon's love]	6
ALA	[Copying]	5
ANI	[Singer DiFranco]	5
AREST	["Give it ___!"]	5
ATON	[Heaps]	5
EXERT	[Bring to bear]	5
ONSET	[Beginning]	5
AER	[___ Lingus]	4
AGO	[In the past]	4
APT	[Fitting]	4
AREA	[Vicinity]	4
ARIA	[Diva's delivery]	4
ASTO	[About]	4
ATE	[Put away]	4
CLAD	[Not nude]	4
DES	[___ Moines]	4
EAT	["Dig in!"]	4
EMT	[CPR pro]	4
EON	[Interminable wait]	4
ERAS	[Eon divisions]	4
ESPY	[Spot]	4
EXPO	[Convention center event]	4
IDEA	[Notion]	4
INRE	[About]	4
MIEN	[Bearing]	4
NADA	[Zilch]	4
OPEN	[Ready for business]	4
OVAL	[Cameo shape]	4
PAL	[Buddy]	4
REB	[Yank's foe]	4
SEEN	[Spotted]	4

Question 2: Creativity
The last question I had involved the creativity of cluing: in other words, how many different clues a word has appeared with. For instance, UNO appeared 8 times using 7 different clues:

  • [One, for Juan]
  • [Card game with a four-color deck]
  • [Game akin to Crazy Eights]
  • [Start of a Cuban count]
  • [Game with red, green, blue and yellow suits]
  • [56-Down, to Fernando]
  • [One of the Medicis]
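Counting distinct clues per word can be sketched by collecting each word's clues into a set, so repeated clues collapse. This is an illustration under my own naming (`clues_per_word` is hypothetical), and the ORE clue in the sample is invented.

```python
from collections import defaultdict


def clues_per_word(entries):
    """Map each word to the set of distinct clues it appeared with."""
    clues = defaultdict(set)
    for word, clue in entries:
        clues[word].add(clue)
    return clues


# Illustrative entries: UNO appears three times with two different clues.
entries = [
    ("UNO", "One, for Juan"),
    ("UNO", "Game akin to Crazy Eights"),
    ("UNO", "One, for Juan"),
    ("ORE", "Mine find"),
]
clues = clues_per_word(entries)
# Rank words by number of distinct clues, breaking ties alphabetically.
ranked = sorted(clues, key=lambda w: (-len(clues[w]), w))
print([(w, len(clues[w])) for w in ranked])
```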

Some facts out of this analysis:

  • The number of entries in this list should match the number in the original word list (14,684), which it does.
  • Of those, 10,294 words were clued exactly one way. The discrepancy (215) with the number of words that appeared only once (10,079) can be explained by words that appeared more than once but with the same clue each time.
  • 2,259 have 2 separate clues, 924 have 3 separate clues, 456 have 4, 248 have 5, 192 have 6, 90 have 7, 64 have 8, 37 have 9, 37 have 10. This eliminates all but 83 of the words.
  • The top of this list bears a striking resemblance to the original list. This says that even with the repetition, the constructors/editor are making an attempt to vary the clues.

    The list of 83 words with the most different clues:

    ORE	30
    ERA	28
    OLE	26
    ALI	23
    ERIE	22
    ALOE	21
    AREA	21
    ASH	21
    ALE	18
    ELI	18
    ETA	18
    RIO	18
    SET	18
    ARIA	17
    ERR	17
    YES	17
    ANTE	16
    EDEN	16
    LEE	16
    ONE	16
    TEN	16
    ALTO	15
    AMI	15
    EWE	15
    OREO	15
    SEE	15
    TEE	15
    ALA	14
    AMEN	14
    ASIA	14
    ELS	14
    END	14
    ICE	14
    NET	14
    SPA	14
    ANTI	13
    ASS	13
    EASE	13
    EMU	13
    IDO	13
    ISLE	13
    SEA	13
    USE	13
    ABBA	12
    ABEL	12
    ACE	12
    AGE	12
    AIR	12
    ALAS	12
    ARE	12
    ARENA	12
    ARI	12
    AWE	12
    EDIT	12
    EGO	12
    EROS	12
    EVE	12
    IRE	12
    LAB	12
    NEE	12
    ORAL	12
    SHE	12
    ACRE	11
    ADA	11
    ANN	11
    ARC	11
    ATE	11
    ATM	11
    CIA	11
    EBB	11
    ENDS	11
    ERAS	11
    ESP	11
    OAR	11
    OBOE	11
    OTTO	11
    RED	11
    RIOT	11
    SCOT	11
    SPAS	11
    STY	11
    TIN	11
    URSA	11
    

Thanks for reading, and as stated, if anyone has any other good questions to ask out of the data, be sure to ask!

Wall Street Journal Crossword: Most Used Words For 2017

As part of my interest in getting some interesting data (and a word list), I ran a refined version of the code I used here against all the WSJ PUZ puzzles released in 2017 and generated a CSV word list, along with counts of the words.

Any questions are welcome, if I didn’t think of them in the questions I wanted answered…

Note: Some of the words will be invalid/nonsensical because some gimmicks involve taking a universal part of a theme set and moving it up or down from the across entry (the WSJ has run at least one puzzle like this in this time frame). So the whole list won’t be that exact or accurate.

Anyhow, here are some readily identifiable random facts I found:

  • Let’s start with the commonly known rule that a single word is almost never allowed to appear more than once in a specific grid. Applying this rule will make looking at this data a lot easier.
  • I processed 304 PUZ files. That this number falls short of 365 is not unexpected, since the WSJ does not run a puzzle on Sunday or on holidays.
  • The WSJ crossword puzzles used 14,684 unique words in 2017.
  • Of those words, 10,079 were used exactly once. A large share of one-time words is to be expected since most theme entries will be unique. But for some reason, I was surprised that there are this many that only occurred once.
  • 2,230 occurred twice, 980 occurred three times, 503 occurred 4 times, 269 occurred 5 times, 207 occurred 6 times, 114 occurred 7 times, 84 occurred 8 times.
  • This eliminates all but 218 of the words in the list. Every word that remains occurred in about 3% or more of the puzzles.
  • ERA and ORE occurred 34 times, making them the most used words in WSJ crosswords for 2017. This constitutes 11% of the total number of grids that were produced.
  • A super-majority of the 218 are three- or four-letter words, with a few five-letter words sprinkled in.
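The facts above all fall out of two counts: how often each word was used, and how many words share each usage count. A minimal sketch, assuming each puzzle has already been reduced to a list of its answer words (the function name and the three tiny grids are made up):

```python
from collections import Counter


def word_stats(grids):
    """Summarize word usage across grids.

    `grids` is a list of word lists, one list per puzzle. Because a word is
    assumed to appear at most once per grid, a word's count is also the
    number of puzzles it appeared in.
    """
    word_counts = Counter(word for grid in grids for word in grid)
    times_used = Counter(word_counts.values())  # usage count -> number of words
    return word_counts, times_used


# Made-up miniature data set of three grids.
grids = [["ERA", "ORE", "ZEBRA"], ["ERA", "OLE"], ["ERA", "ORE"]]
word_counts, times_used = word_stats(grids)
print(len(word_counts))            # unique words
print(times_used[1])               # words used exactly once
print(word_counts.most_common(1))  # most used word and its count

# Share of puzzles for the top words in this post: 34 appearances in 304 grids.
print(f"{34 / 304:.0%}")
```

`word_counts.most_common(100)` would produce a top-100 listing like the one that follows.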

Now here’s what I’m sure people were waiting for: the top 100 words in the WSJ according to usage:

ERA 	34
ORE 	34
AREA 	27
OLE 	27
ALA 	25
ALOE 	25
ERIE 	25
ALI 	24
ARIA 	22
ASH 	22
ELI 	20
ERR 	20
RIO 	20
SET 	20
ALE 	19
IRE 	19
ONE 	19
SEE 	19
YES 	19
AMI 	18
ANTE 	18
EDEN 	18
END 	18
ETA 	18
ALTO 	17
ANTI 	17
ISLE 	17
LEE 	17
OREO 	17
ALOT 	16
ELS 	16
EMU 	16
EWE 	16
TEE 	16
TEN 	16
USE 	16
AMEN 	15
ARI 	15
ASIA 	15
ATE 	15
ENDS 	15
SPA 	15
ABBA 	14
ABEL 	14
ACE 	14
ASS 	14
AWE 	14
EASE 	14
EGO 	14
ERAS 	14
EROS 	14
ICE 	14
NET 	14
ORAL 	14
SEA 	14
ALAS 	13
ARE 	13
ARENA 	13
EAT 	13
IDO 	13
IKE 	13
LAB 	13
NEE 	13
OAR 	13
RIOT 	13
ADO 	12
AGE 	12
AIR 	12
ALEE 	12
ANN 	12
ARC 	12
ASSET 	12
EBB 	12
EDIT 	12
ELK 	12
ELSE 	12
ESP 	12
EVE 	12
OBOE 	12
ODE 	12
PSI 	12
RED 	12
SETS 	12
SHE 	12
TIN 	12
ULNA 	12
ACRE 	11
ADA 	11
AGO 	11
ALEC 	11
AMMO 	11
ANT 	11
ASK 	11
ATM 	11
ATOM 	11
BRA 	11
CIA 	11
ETON 	11
EURO 	11
EYE 	11

Counting Across and Down Clues

Another interesting question came up while I was following a discussion. On Bill Butler’s NYT blog, Dale Stewart writes:

Why are there always more Across entries than Down entries?

I had thought originally that that must mean that the Acrosses are shorter words than the Downs.

Am I missing something that is obvious? I really do not know. Can you possibly explain this to me?

Despite mistaking the clue numbers for the actual counts of across and down entries, Stewart comes upon something that’s interesting to look into.

Are There Any Construction Constraints Upon Across/Down Answers?
A further question is put forth by Dave Kennison that adds to the discussion:

… I might insist on flipping some of my completed puzzles (all of them? half of them, chosen at random?) about the diagonal running from upper left to lower right (so that every across clue becomes a down clue and vice versa), necessarily swapping the numbers of clues in the two directions and invalidating your observation for my puzzles …

As it turns out in this case, unless there’s a cogent reason to have answers run in the down direction (e.g. “something up” embedded theme answers), the preference is for long or theme answers to run in the across direction, as that is how people read them most naturally. According to most of the style guides I’ve read on constructing, most editors will reject important answers in the Down direction. Typical crossword construction software can perform this “grid flip” easily for a setter or editor, so it becomes a moot point. But it brings out a constraint that we can note in our further discussion.

The Analysis
To look into the original question of the ratio of across to down answers in a grid, I collected a number of Wall Street Journal, BEQ, and Matt Jones grids. I removed all but the 15×15 grids (the most common), and then wrote software to analyze them and output a CSV with the number of across/down clues and a ratio. I ended up with 294 puzzles in the final output, which I then loaded into my spreadsheet and sorted by the ratio. A small sample is below:

(PUZ NAME),(ACROSS),(DOWN),(AcrClues),(DownClues),(Ratio)
938NoBigPunIntended.puz,15,15,27,45,0.6
wsj170731.puz,15,15,37,39,0.948717948717949
993ThemelessMonday.puz,15,15,34,34,1
928ThemelessMonday.puz,15,15,41,27,1.51851851851852
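For the curious, the across/down counts can be derived from the grid alone. The sketch below is not my actual software; it assumes the grid is available as a list of row strings with "." marking black squares, and it treats any run of two or more open cells as an entry. The function names and the 5×5 grid are made up.

```python
def run_lengths(line, block="."):
    """Lengths of maximal runs of open cells in one row or column string."""
    return [len(run) for run in line.split(block) if run]


def entry_lengths(rows, min_len=2, block="."):
    """Return (across_lengths, down_lengths) for a grid given as row strings.

    Any character other than `block` counts as an open cell. Runs shorter
    than `min_len` are ignored, since a lone cell is not an entry.
    """
    columns = ["".join(col) for col in zip(*rows)]
    across = [n for row in rows for n in run_lengths(row, block) if n >= min_len]
    down = [n for col in columns for n in run_lengths(col, block) if n >= min_len]
    return across, down


# A made-up 5x5 grid, not a real puzzle.
grid = [
    "ABCDE",
    "FGHIJ",
    "KLMNO",
    "..P..",
    "QRSTU",
]
across, down = entry_lengths(grid)
print(len(across), len(down), len(across) / len(down))
```

The ratio is the across count divided by the down count, as in the sample rows above. Feeding either list to `collections.Counter` gives a per-length tally of the entries.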

I couldn’t get an attractive looking chart out of this data (too many data points), but one fact came out in observing the mode of the data (the value that occurs most often):

The single most common ratio is 1 (a perfectly balanced grid).

Furthermore, in performing a similar culling of the data as in this study (Mean: 0.926713490721385, Std Dev: 0.094482691813345, 61 outliers total), we can make a few observations:

The majority of grids are balanced slightly towards fewer across answers than down ones.

1009ThemelessMonday.puz,15,15,31,33,0.939393939393939

This is further shown in the first outlier entries on either side:

jz171012.puz,15,15,35,43,0.813953488372093
990PartyLine.puz,15,15,41,39,1.05128205128205

It would seem that the editing constraint I mentioned above comes into play here, pulling the data slightly towards fewer across answers.

So What Determines This Ratio?
In light of Dale Stewart’s original comment, the question above is worth answering. A side effect of this data analysis is that we can identify extreme cases, where whatever property causes the imbalance should be very evident, and investigate further. These are the most extreme cases:

938NoBigPunIntended.puz,15,15,27,45,0.6
928ThemelessMonday.puz,15,15,41,27,1.51851851851852

These both happen to be BEQ grids. Posting to his own web site, he gets to experiment a bit more than when he is subject to another editor. We will start with the first:

938-AcrDown-01

The large number of long across answers should jump out at you immediately. But we’ll delve a bit deeper. In counting the words:

Across:
3-letter words: 4
4-letter words: 4
6-letter words: 8
7-letter words: 2
8-letter words: 4
9-letter words: 1
10-letter words: 2
15-letter words: 2

Down:
3-letter words: 26
4-letter words: 4
5-letter words: 8
6-letter words: 2
7-letter words: 5

We can definitely note that the longer across words limit the length of the words in the down fill. This makes the grid easier to fill, as the longer answers are often preferred for theme entries or the like. This is especially evident in the puzzle after this one, which contains two triple stacks of 15-letter entries.

I won’t go into as much depth with the other entry, but I’ll provide a screen shot of it:
928-AcrDown-02

Note that a similar situation to the first example occurs in the Down direction.

Conclusion
In performing this analysis, it seems that the ratio of across/down answers is determined by the relative number of long entries in each direction. Furthermore, given editor constraints that important/long theme entries be in the across direction, most puzzles will tend to have fewer across answers than down answers. I also observed that the kind of puzzle (themeless, 21×21) doesn’t make a difference, as these kinds of grids appeared similarly in the analysis. Beyond this, I would observe that most constructors aren’t going to particularly care about how many Across and Down clues exist in any particular puzzle. Whatever results will tend to follow from the requirements of the puzzle.

I don’t know how interesting this will turn out to be, but hopefully it was interesting to someone. If you have any comments or questions, please feel free to leave them below or send them to my e-mail.

Comparing New York Times and LA Times Crosswords

If you recall from last time, I mentioned the possibility of comparing Bill Butler’s New York Times crossword blog and his Los Angeles Times crossword blog with respect to solving time. Since then, I have had the opportunity to perform the previous analysis on Mr. Butler’s LA Times blog myself (09-18-2012 to 06-30-2017).

The Result
While I won’t reiterate the process from last time, I’ll simply relay the final result. As before, this is the range of values that are plus or minus one standard deviation of the mean.

Day, Mean - 1 SD (s), Mean (s), Mean + 1 SD (s), (Mean as min/sec)
Monday 293.17 342.49 391.8 (5m 42s)
Tuesday 320.41 377.93 435.46 (6m 18s)
Wednesday 375.24 452.69 530.14 (7m 32s)
Thursday 437.88 557.82 677.76 (9m 18s)
Friday 550.55 739.25 927.96 (12m 19s)
Saturday 690.12 936.89 1183.67 (15m 37s)
Sunday 971.27 1282.24 1593.22 (21m 22s)

Here is the chart from that data:

LA Times Chart

Comparison
Now that I have data from a solver for two puzzle providers, comparisons are possible. I’ll provide charts to that effect below:

Comparison03

Comparison02

The standard deviation reflects the variability of the data, as seen in both of the separate line charts. As presented here, conclusions similar to what I surmised previously can be drawn.

Again, hopefully this has provided some further interest to the whole topic.

An Analysis of New York Times Solving Times

My main interest was in answering some other questions.  The thought of duplicating David Kaleko’s study as I heard of it on Bill Butler’s LAT Crossword blog proved interesting.

Data
As he did, I located time data on Bill’s NYT blog. I scraped through Bill’s blog from 03-01-2011 to 06-27-2017. I could have gone earlier than 03-2011, but it became hard to locate posts where he records times, as his first post to that blog illustrates. As well, this provides a greater number of data points than was available to Mr. Kaleko.

Given the tools on my computer that I am fluent with (Delphi & spreadsheet), along with the other questions I wished to answer, I produced a CSV file that could be manipulated to answer most of the questions I (or others) might have. These can be things like suggestions of grids to play in the archive if one has a NYT subscription, hardest grids, easiest grids, time periods for hard/easy publishing (I’ve wondered if it varies around certain events like the ACPT, since the NYT Crossword Editor, Will Shortz, manages this event), and the like. If there is any demand for the CSV files, please let me know in the comments.

Sort Code, NYT Code, Day Published, Bill’s Time, Bill’s Time in Seconds
17-0624,0624-17,Saturday,Did Not Finish,-1
17-0625,0625-17,Sunday,23m 25s,1405
17-0626,0626-17,Monday,4m 56s,296
17-0627,0627-17,Tuesday,5m 31s,331

The second field is the NYT key code attached to all their puzzles. I reordered it into the first field so the file could be sorted chronologically. The third is the day the puzzle appeared, and the fourth is Bill’s time as it appears on the page.

After manually cleaning up the time data, I processed this file to add the time-in-seconds field. A value of -1 marks a puzzle that Bill either did not finish or did not record a time for.
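The two derived fields can be sketched as below. My own processing was done in Delphi, so this Python version is only an illustration; the function names are hypothetical, and it assumes times always look like "23m 25s" and that the NYT code reads as MMDD-YY, which is what the sample rows suggest.

```python
import re

TIME_PATTERN = re.compile(r"^\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*$")


def time_to_seconds(text):
    """Convert a string like '23m 25s' to seconds, or -1 if it is not a time."""
    match = TIME_PATTERN.match(text)
    if not match or not any(match.groups()):
        return -1
    minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return minutes * 60 + seconds


def sort_code(nyt_code):
    """Turn an NYT code like '0625-17' into '17-0625' so it sorts by date."""
    month_day, year = nyt_code.split("-")
    return f"{year}-{month_day}"


print(time_to_seconds("23m 25s"))
print(time_to_seconds("Did Not Finish"))
print(sort_code("0625-17"))
```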

I split the data into separate files based on day of the week. Then I sorted each file based on time to aid more manual investigations.

Analysis
After doing that, I sought to duplicate David Kaleko’s methods. I obtained the mean and standard deviation for each day within the spreadsheet. Then I stripped the outliers (values more than 2 standard deviations away from the mean) from each file. Stats below for that:

Monday: 1.52% records removed as outliers.
Tuesday: 3.95% records removed as outliers.
Wednesday: 4.27% records removed as outliers.
Thursday: 5.18% records removed as outliers.
Friday: 7.58% records removed as outliers.
Saturday: 8.79% records removed as outliers.
Sunday: 5.49% records removed as outliers.
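The culling step can be sketched like this. I did it in a spreadsheet, so this is only an equivalent illustration: the function name is hypothetical, the sample times are made up, and it uses the population standard deviation, whereas a spreadsheet's default function may be the sample version.

```python
from statistics import mean, pstdev


def strip_outliers(times, k=2.0):
    """Drop values more than k standard deviations from the mean.

    Unfinished puzzles (recorded as -1) are discarded first. Returns the
    kept values and the percentage of valid records that were removed.
    """
    valid = [t for t in times if t >= 0]
    mu, sigma = mean(valid), pstdev(valid)
    kept = [t for t in valid if abs(t - mu) <= k * sigma]
    removed_pct = 100 * (len(valid) - len(kept)) / len(valid)
    return kept, removed_pct


# Made-up times in seconds, with one unfinished puzzle and one very slow solve.
times = [300, 310, 305, 295, 290, 315, 300, 305, 310, 900, -1]
kept, removed_pct = strip_outliers(times)
print(len(kept), f"{removed_pct:.2f}%")
```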

Commentary
In terms of the breadth of times, the more interesting part to note is how much the times can vary from puzzle to puzzle. The New York Times puzzles are generally consistent, and Bill illustrates that in the Monday to Wednesday section. However, the difficulty increases dramatically from Thursday to Sunday at times (but not others), increasingly causing Bill’s times to vary and producing a high standard deviation even when outliers are removed.

For instance, Bill’s fastest Friday solve is 8m27s, while his slowest is 68m06s. Thursday seems to be highly variable in difficulty. It’s often known as “Tricky Thursday”, where departures from the typical crossword format are often published, such as rebuses (multiple letters in one square) or other things where some kind of “trick” must usually be found in order to complete the puzzle. In this puzzle, the time is often determined by how long it takes to discover the trick. Friday and Saturday are themeless challenge days. The Sunday is a 21×21 grid, as opposed to the 15×15 grids typically offered throughout the week, where the claim is that it is at Thursday difficulty. There are occasional “tricks” in the Sunday NYT grid, but it is typically a straightforward grid.

The Result
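The next step is to duplicate the chart that Kaleko presents at the end. I do that by recalculating the mean and standard deviation and then plotting the times, plus or minus one standard deviation. Raw data appears below (mean minus one standard deviation, mean, and mean plus one standard deviation, all in seconds, followed by the mean as minutes and seconds):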

Monday 304.09 352.26 400.43 (5m 52s)
Tuesday 394.19 494.47 594.76 (8m 14s)
Wednesday 473.64 637.48 801.32 (10m 37s)
Thursday 723.31 1102.24 1481.17 (18m 22s)
Friday 822.46 1329.27 1836.07 (22m 09s)
Saturday 1176.17 1868.95 2561.74 (31m 08s)
Sunday 1266.15 1671.14 2076.12 (27m 51s)
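Each row above is a mean with one standard deviation on either side, plus the mean restated in minutes and seconds. A small sketch of both calculations follows; the helper names are hypothetical, the three sample times are made up, and population standard deviation is again an assumption on my part.

```python
from statistics import mean, pstdev


def band(times):
    """Return (mean - sd, mean, mean + sd) for a list of times in seconds."""
    mu, sigma = mean(times), pstdev(times)
    return mu - sigma, mu, mu + sigma


def as_minutes_seconds(seconds):
    """Format a duration in seconds as e.g. '5m 52s', rounding to the second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


print(as_minutes_seconds(352.26))  # the Monday mean from the table above
low, mid, high = band([300, 360, 420])
print(round(low, 2), mid, round(high, 2))
```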

So the traditional progression of difficulty seems to hold, though it is notable that Saturday on average took longer for Bill to solve than Sunday did, even though Sunday features a larger grid. I produced a chart which appears below, reflecting the New York Times solving times. As well, I reproduce the chart that David Kaleko produced for his LAT study.

How Hard Chart
Kaleko-LAT

While I don’t have the data points of Kaleko’s LAT chart to plot onto this one to compare visually (or otherwise), some additional observations are interesting in comparing the relative difficulty of both puzzles. It seems the Monday NYT and LAT are roughly equivalent. The Tuesday NYT seems roughly equivalent to the Wednesday LAT. The Wednesday NYT seems equivalent to the Thursday LAT. Then the Thursday NYT seems equivalent to the Saturday LAT in terms of difficulty. Personally, this is what I found when I started doing NYT grids regularly. In comparison to the LAT, the NYT grid seems to win out in its claim to be “the challenge” of crosswords.

Hopefully this was enjoyable, and the chance to make some comparative observations proved interesting.