Another interesting question came up while I was following a discussion on Bill Butler’s NYT blog, where Dale Stewart wrote:
Why are there always more Across entries than Down entries?
I had thought originally that that must mean that the Acrosses are shorter words than the Downs.
Am I missing something that is obvious? I really do not know. Can you possibly explain this to me?
Although Stewart mistakes the clue numbers for the actual counts of Across and Down entries, he stumbles onto something worth investigating.
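To see why the clue numbers overstate the Across count, here is a minimal sketch of the standard American numbering scheme: each white square that begins an Across or Down entry gets the next number, and a square that begins both shares a single number. The '#'-for-black-square encoding and the function name are hypothetical. In the small sample grid, both directions have five entries, yet the last Across clue is 9 while the last Down clue is 6, because the bottom row can only start Across entries.

```python
def number_grid(rows):
    """Apply standard crossword numbering to a grid.

    rows: equal-length strings, '#' for a black square (hypothetical encoding).
    Returns (across, down) as lists of (clue_number, entry_length) pairs.
    """
    height, width = len(rows), len(rows[0])

    def white(r, c):
        # Squares off the edge of the grid behave like black squares.
        return 0 <= r < height and 0 <= c < width and rows[r][c] != "#"

    def run_length(r, c, dr, dc):
        # Count consecutive white squares starting at (r, c) in direction (dr, dc).
        length = 0
        while white(r + dr * length, c + dc * length):
            length += 1
        return length

    across, down = [], []
    number = 0
    for r in range(height):
        for c in range(width):
            if not white(r, c):
                continue
            starts_across = not white(r, c - 1) and white(r, c + 1)
            starts_down = not white(r - 1, c) and white(r + 1, c)
            if starts_across or starts_down:
                number += 1  # one number per square, shared by both directions
                if starts_across:
                    across.append((number, run_length(r, c, 0, 1)))
                if starts_down:
                    down.append((number, run_length(r, c, 1, 0)))
    return across, down


SAMPLE_GRID = [
    "...##",
    ".....",
    ".....",
    ".....",
    "##...",
]

across, down = number_grid(SAMPLE_GRID)
print("Across:", across)  # 5 entries, last clue number 9
print("Down:  ", down)    # 5 entries, last clue number 6
```

The highest clue number therefore says more about where entries start than about how many entries run in each direction.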
Are There Any Construction Constraints Upon Across/Down Answers?
Dave Kennison raised a further point that adds to the discussion:
… I might insist on flipping some of my completed puzzles (all of them? half of them, chosen at random?) about the diagonal running from upper left to lower right (so that every across clue becomes a down clue and vice versa), necessarily swapping the numbers of clues in the two directions and invalidating your observation for my puzzles …
As it turns out, unless there’s a cogent reason to run answers Down (e.g. “something up” embedded theme answers), long and theme answers are preferred in the Across direction, since that is how people most naturally read them. According to most of the construction style guides I’ve read, most editors will reject grids that put important answers in the Down direction. Typical crossword construction software lets a setter or editor perform this “grid flip” easily, so the point is moot. But it does bring out a constraint worth keeping in mind for the rest of the discussion.
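Kennison’s flip about the upper-left to lower-right diagonal is a matrix transpose: row i becomes column i. The sketch below (grid encoding and helper names are hypothetical) counts entries as runs of two or more white squares and shows that transposing a grid simply swaps the Across and Down counts.

```python
def flip(rows):
    """Reflect a grid about its upper-left to lower-right diagonal (a transpose)."""
    return ["".join(col) for col in zip(*rows)]


def count_entries(rows):
    """Return (across, down) counts; an entry is a run of 2+ white squares.

    Rows are strings with '#' for black squares (a hypothetical encoding).
    """
    def runs(lines):
        return sum(1 for line in lines for run in line.split("#") if len(run) >= 2)

    # Down entries are the Across entries of the transposed grid.
    return runs(rows), runs(flip(rows))


GRID = [
    "...##",
    ".....",
    "##...",
]

print(count_entries(GRID))        # (3, 5)
print(count_entries(flip(GRID)))  # (5, 3)
```

Because flipping twice returns the original grid, any Across/Down imbalance can be reversed at will, which is why the editorial preference, not the geometry, decides which direction gets the long entries.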
To look into the original question of the ratio of Across to Down answers in a grid, I collected a number of Wall Street Journal, BEQ, and Matt Jones grids. I removed all but the 15×15 grids (the most common size), then wrote software to analyze them and output a CSV with the number of Across and Down clues and their ratio (Across ÷ Down). I ended up with 294 puzzles in the final output, which I loaded into a spreadsheet and sorted by ratio. A small sample is below:
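The analysis software itself isn’t shown, so the following is only a sketch of a pipeline with that shape, assuming grids are available as lists of row strings with '#' for black squares (a hypothetical format). It keeps only size × size grids, counts entries in each direction, and writes a CSV sorted by the Across/Down ratio using Python’s standard csv module. The function names and sample grids are illustrative.

```python
import csv
import io


def count_entries(rows):
    """Return (across, down): runs of 2+ white squares, '#' marking black squares."""
    def runs(lines):
        return sum(1 for line in lines for run in line.split("#") if len(run) >= 2)

    columns = ["".join(col) for col in zip(*rows)]
    return runs(rows), runs(columns)


def grids_to_csv(grids, size=15):
    """Return CSV text with one row per size x size grid, sorted by Across/Down ratio.

    grids: dict mapping a puzzle name to its list of row strings (hypothetical format).
    """
    records = []
    for name, rows in grids.items():
        # Skip anything that is not exactly size x size (e.g. 21x21 Sundays).
        if len(rows) != size or any(len(row) != size for row in rows):
            continue
        across, down = count_entries(rows)
        records.append((name, across, down, across / down))

    records.sort(key=lambda record: record[3])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["puzzle", "across", "down", "ratio"])
    for name, across, down, ratio in records:
        writer.writerow([name, across, down, f"{ratio:.4f}"])
    return out.getvalue()


# Tiny 5x5 demo (size=5 only to keep the example short).
SAMPLE_GRIDS = {
    "open": ["....."] * 5,
    "barred": [".....", ".....", "##.##", ".....", "....."],
    "too-small": ["...##", ".....", "##..."],
}

print(grids_to_csv(SAMPLE_GRIDS, size=5))
```

Sorting before writing makes the extremes easy to spot at the top and bottom of the file, which is the same view a spreadsheet sort gives.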
I couldn’t get an attractive-looking chart out of this data (too many data points), but one fact stood out when I looked at the mode (the value that occurs most often):
The most common ratio is exactly 1: more grids are perfectly balanced than have any other single ratio.
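A mode of 1 does not by itself mean that most grids are balanced. As an illustration with made-up (Across, Down) counts, not taken from the 294-puzzle data set, the sketch below uses Python’s statistics.mode on exact Fraction ratios: the single most common ratio is 1, even though most of the grids fall below 1.

```python
from fractions import Fraction
from statistics import mode


def ratio_summary(counts):
    """counts: (across, down) pairs. Return (modal ratio, share of grids below 1)."""
    # Fraction keeps each ratio exact (36/36 and 38/38 both become 1).
    ratios = [Fraction(across, down) for across, down in counts]
    most_common = mode(ratios)
    share_below_one = sum(1 for r in ratios if r < 1) / len(ratios)
    return most_common, share_below_one


# Made-up (Across, Down) counts, not taken from the 294-puzzle data set.
SAMPLE_COUNTS = [(36, 36), (38, 38), (34, 40), (35, 39), (36, 38), (33, 41), (37, 39)]

most_common, share = ratio_summary(SAMPLE_COUNTS)
print(f"mode = {most_common}, share below 1 = {share:.2f}")
```

Here the balanced value wins the mode with only two grids, while five of the seven grids lean toward fewer Across entries.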
Furthermore, after culling the data in a similar way to this study (mean: 0.927, standard deviation: 0.094, 61 outliers in total), we can make a few observations:
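The culling criterion of the referenced study isn’t spelled out here, so this sketch assumes a common rule of thumb: drop any ratio more than k standard deviations from the mean (k = 2 is a hypothetical choice), then recompute the summary statistics on what remains. The sample ratios are invented for illustration.

```python
from statistics import mean, stdev


def cull_outliers(values, k=2.0):
    """Split values into (kept, outliers) using a mean +/- k * stdev cutoff.

    k = 2 is a hypothetical default, not necessarily the cited study's rule.
    """
    centre = mean(values)
    limit = k * stdev(values)  # sample standard deviation
    kept = [v for v in values if abs(v - centre) <= limit]
    outliers = [v for v in values if abs(v - centre) > limit]
    return kept, outliers


# Invented Across/Down ratios for illustration only.
SAMPLE_RATIOS = [0.9, 0.92, 0.95, 1.0, 0.88, 0.93, 0.5]

kept, outliers = cull_outliers(SAMPLE_RATIOS)
print(f"mean {mean(kept):.3f}, std dev {stdev(kept):.3f}, {len(outliers)} outlier(s)")
```

A single pass is shown; some analyses repeat the cut until no further points fall outside the limit, which would generally remove more points.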
Most grids lean slightly toward fewer Across answers than Down answers.
This is further borne out by the first outlier entries on either side:
This suggests that the editing constraint mentioned above (long and theme answers run Across) pulls the data slightly toward fewer Across answers.
So What Determines This Ratio?
Returning to Dale Stewart’s original comment, this is the interesting question to answer. A useful side effect of this data analysis is that it identifies the extreme cases, where whatever property drives the ratio should be most evident, so we can investigate them further. These are the most extreme cases:
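Picking out the extremes amounts to sorting by ratio and taking both ends. A minimal sketch, assuming records shaped as hypothetical (puzzle, across, down) tuples with invented values:

```python
def extremes(records, n=1):
    """Return the n lowest- and n highest-ratio puzzles, each list in ascending order.

    records: list of (puzzle, across, down) tuples (hypothetical format).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(records, key=lambda rec: rec[1] / rec[2])
    return ordered[:n], ordered[-n:]


# Invented records for illustration only.
SAMPLE_RECORDS = [
    ("puzzle-a", 36, 38),
    ("puzzle-b", 27, 45),
    ("puzzle-c", 40, 34),
    ("puzzle-d", 37, 37),
]

lowest, highest = extremes(SAMPLE_RECORDS)
print("lowest:", lowest)
print("highest:", highest)
```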
Both happen to be BEQ grids. Because he posts them on his own website, he can experiment more than when he is subject to another editor. We’ll start with the first:
The large number of long Across answers should jump out at you immediately, but let’s delve a bit deeper. Counting the words:
Across (27 entries):
4 3-letter words.
4 4-letter words.
8 6-letter words.
2 7-letter words.
4 8-letter words.
1 9-letter word.
2 10-letter words.
2 15-letter words.
Down (45 entries):
26 3-letter words.
4 4-letter words.
8 5-letter words.
2 6-letter words.
5 7-letter words.
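A quick arithmetic check on the tallies above (the first eight lines are Across, the last five Down) confirms that both directions cover the same 181 white squares, consistent with a 15×15 grid with 44 black squares. It also gives 27 Across against 45 Down entries, a ratio of 0.6. The dictionaries simply transcribe those tallies.

```python
# Word-length tallies for the first extreme grid, as counted above:
# {word length: number of words of that length}.
ACROSS = {3: 4, 4: 4, 6: 8, 7: 2, 8: 4, 9: 1, 10: 2, 15: 2}
DOWN = {3: 26, 4: 4, 5: 8, 6: 2, 7: 5}


def entry_count(tally):
    """Total number of entries in one direction."""
    return sum(tally.values())


def letter_count(tally):
    """Total number of white squares covered by one direction's entries."""
    return sum(length * count for length, count in tally.items())


across, down = entry_count(ACROSS), entry_count(DOWN)
print(f"Across: {across} entries, {letter_count(ACROSS)} letters")
print(f"Down:   {down} entries, {letter_count(DOWN)} letters")
print(f"Across/Down ratio: {across / down:.2f}")
```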
Clearly, the long Across words limit the length of the Down fill: 26 of the 45 Down entries have only three letters. Short crossings make it easier to complete a grid built around long answers, which are often preferred for theme entries and the like. This is especially visible in the next puzzle after this one, which contains two triple stacks of 15-letter entries.
I won’t go into as much depth on the other extreme, but here is a screenshot of it:
Note that the same situation as in the first example occurs here, but in the Down direction.
From this analysis, it appears that the Across/Down ratio is determined by the relative number of long entries in each direction. Furthermore, given the editorial preference for placing important or long theme entries Across, most puzzles tend to have fewer Across answers than Down answers. I also observed that the kind of puzzle (themeless, 21×21) doesn’t make a difference; these grids showed similar patterns in the analysis. Beyond this, I suspect most constructors don’t particularly care how many Across and Down clues a given puzzle has. Whatever ratio results tends to follow from the requirements of the puzzle.
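One way to make “determined by the relative number of long entries” precise, assuming every white square is checked (it belongs to exactly one Across and one Down entry, as in standard American-style grids): let W be the number of white squares, A and D the numbers of Across and Down entries, and \bar{a} and \bar{d} their mean lengths. Both directions cover the same squares, so

```latex
W = A\,\bar{a} = D\,\bar{d}
\quad\Longrightarrow\quad
\frac{A}{D} = \frac{\bar{d}}{\bar{a}}
```

For the first extreme grid, W = 181, \bar{a} = 181/27 ≈ 6.70 and \bar{d} = 181/45 ≈ 4.02, so A/D ≈ 4.02/6.70 = 0.6: the direction with the longer average entry always has fewer entries.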
I don’t know how interesting this will turn out to be, but hopefully someone finds it interesting. If you have any comments or questions, feel free to leave them below or send them to my e-mail.