The Hugo Award and Publication Dates, Part 3: Methodology and Data
This methodology post is unlikely to be of much interest to the casual reader, but I’m recording this information in case anyone wants to double-check the data or question the kind of data I used. It is very easy to mislead the public with statistics, and Chaos Horizon is trying to avoid that by providing maximum transparency on all studies and reports. If you have questions, ask in the comments or e-mail me at firstname.lastname@example.org.
Date Range: Why 2001-2014? I used this date range because 2001 marks a substantial shift in the Hugo Awards. Prior to 2001, the Hugo Award for Best Novel was essentially a science fiction award: every previous winner had been a science fiction novel. J.K. Rowling won for Harry Potter and the Goblet of Fire in 2001, which opened up the Hugos to all sorts of different genres and types of books; that win can be thought of as the start of the “modern” era of the award. There is also an undeniable convenience to starting studies with the new millennium. It’s also hard to believe that the book market of, say, 1994 was the same as today’s: no internet, no e-books, and a vastly different audience with different buying habits. The farther back in time we go, the more we cloud the statistics.
The study was conducted in September 2014, which sets the upper limit of the date range.
Limitations: I limited myself to US publication dates in this study, although the Hugo encompasses American, British, and international authors and voters. No novel in translation was nominated for the Hugo Award from 2001-2014, so excluding other international publication dates seems justified.
British publication dates were trickier, and I initially explored them in some detail. That data is on the third sheet of the Excel spreadsheet. British dates were not as readily accessible, and even when I could find them I had no real way of double-checking them. Furthermore, UK and US release dates did not line up consistently: some texts were published simultaneously in both countries, novels by British authors were sometimes published earlier in the UK, and novels by American authors were sometimes published later there. Those discrepancies introduced a great deal of uncertainty into the project, as it wasn’t clear which date should be used. British publication dates likely had a much larger impact in the years WorldCon was held in the UK, and less impact when WorldCon was in the US. If anyone can think of a clever way to find and handle British publication dates, I’m all ears.
Sources: To find the publication dates, I used three main sources. First, I used the Internet Speculative Fiction Database, found at www.isfdb.org, to come up with an initial publication date. The isfdb is probably the most in-depth resource for information about different SFF book editions, and I used its first available date for US print editions, excluding limited-availability special editions.
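The edition-selection rule above (earliest US print edition, skipping limited special editions) can be sketched as a simple filter-then-minimum. This is a minimal illustration, not the procedure actually run for the study: the `Edition` record and its field names are hypothetical and do not reflect the isfdb’s real data schema, and the dates in the usage lines are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edition:
    # Hypothetical record for one edition listing; field names are
    # illustrative only, not the isfdb's actual schema.
    pub_date: date
    country: str      # e.g. "US" or "UK"
    is_print: bool    # False for e-book or audio editions
    is_limited: bool  # True for limited-availability special editions


def earliest_us_print_date(editions: list[Edition]) -> Optional[date]:
    """Return the earliest date among US print editions, skipping
    limited special editions; None if no edition qualifies."""
    eligible = [
        e.pub_date
        for e in editions
        if e.country == "US" and e.is_print and not e.is_limited
    ]
    return min(eligible) if eligible else None


# Made-up listings for one hypothetical novel.
editions = [
    Edition(date(2010, 11, 1), "US", True, True),    # limited edition: skipped
    Edition(date(2010, 12, 15), "UK", True, False),  # UK edition: skipped
    Edition(date(2011, 1, 18), "US", True, False),   # first regular US print edition
    Edition(date(2011, 1, 10), "US", False, False),  # e-book: skipped
]
print(earliest_us_print_date(editions))  # 2011-01-18
```

Taking the minimum only after filtering is what keeps an earlier special edition or foreign release from setting the date.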
Second, I cross-checked that isfdb date against Amazon. While we can debate some of Amazon’s sales practices, there is no doubt about the wide variety of book-related information their site offers. Since they are a professional bookseller, they have a huge stake in providing accurate data. Again, I tried to find the earliest published print edition and, whenever possible, to match the ISBN of that edition against the isfdb.org information.
Interestingly, and frustratingly, the isfdb.org and amazon.com information often disagreed: of the 68 dates, 20 showed discrepancies. However, these were often very minor, such as isfdb.org reporting a March publication date and amazon.com reporting a late February date. In general, amazon.com reported publication dates a few weeks earlier.
Third, if the isfdb.org date and the amazon.com date disagreed, I went to the Barnes and Noble website to resolve the issue. Like amazon.com, bn.com provides a wealth of information, and I trust their database because that’s how they make their money. In almost all instances, the amazon.com date agreed with bn.com, so I went with the amazon/bn publication date. All disagreements are marked in the Excel spreadsheet.
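The three-source check above amounts to a two-out-of-three vote with Barnes and Noble as the tie-breaker. The sketch below is a minimal reconstruction under stated assumptions: two dates “agree” only if they match exactly; `resolve_date` is a hypothetical helper name; and because the post does not say what happens when bn.com matches neither other source, the sketch returns `None` so that row can be checked by hand.

```python
from datetime import date
from typing import Optional


def resolve_date(
    isfdb: date, amazon: date, bn: Optional[date]
) -> tuple[Optional[date], bool]:
    """Return (chosen_date, disagreed).

    Assumption: dates agree only on an exact match. bn.com is consulted
    only when isfdb and Amazon differ; whichever date it matches wins.
    If it matches neither (a case the post does not describe), return
    None so the row can be reviewed manually.
    """
    if isfdb == amazon:
        return isfdb, False
    if bn == amazon:
        return amazon, True
    if bn == isfdb:
        return isfdb, True
    return None, True


# Made-up rows: (isfdb, amazon, bn), with bn left as None when not needed.
rows = [
    (date(2011, 3, 1), date(2011, 3, 1), None),
    (date(2011, 3, 8), date(2011, 2, 22), date(2011, 2, 22)),
]
results = [resolve_date(*row) for row in rows]
flagged = sum(disagreed for _, disagreed in results)  # rows to mark in the sheet
print(results[1][0], flagged)  # 2011-02-22 1
```

Returning a `disagreed` flag alongside the date mirrors the practice of marking every disagreement in the spreadsheet rather than silently overwriting it.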
Any discrepancies were only a matter of weeks (pushing a book from June to July, for example) and are unlikely to cause major changes in the analysis. Still, you might want to avoid placing too much stock in any individual month; I believe the seasonal ranges are more reliable.
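Why seasons are sturdier than months can be shown with a small bucketing sketch. The season bins here are an assumption (meteorological Northern Hemisphere seasons), chosen only for illustration; they are not necessarily the bins used elsewhere in this series, and `season` is a hypothetical helper.

```python
from datetime import date

# Assumed bins: meteorological Northern Hemisphere seasons.
# The study's own season definitions may differ.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season(d: date) -> str:
    """Map a publication date to its (assumed) season bin."""
    return SEASONS[d.month]


# A shift of a few weeks, such as June to July, changes the month
# but leaves the season unchanged.
june, july = date(2010, 6, 22), date(2010, 7, 6)
print(june.month == july.month)      # False
print(season(june) == season(july))  # True
```

Seasons dampen the effect of date disagreements rather than eliminating it: under these assumed bins, a late-February versus March disagreement would still move a book from winter to spring.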
Other possible sources: I tried out several other possible sources for publication data before discarding them. Both WorldCat and the Library of Congress, two major sources for cataloging books, only provided the publication month, and I wanted information as precise as possible.
Notes: Four nominated texts were excluded from the study. Robert Jordan and Brandon Sanderson’s The Wheel of Time, nominated as a single work, is a series of 14 novels published over decades. Connie Willis won for Blackout/All Clear, two novels published in the same year. I could have used both dates, but I decided to use neither to keep the data clean. Two books, both from the 2005 Hugos held in Glasgow, did not receive American releases before their year of nomination: River of Gods by Ian McDonald and The Algebraist by Iain M. Banks.
Weakness of the Study: With only 68 data points, we’re falling far short of a substantial data set. As a result, small changes in the data, such as an individual author publishing in October rather than September, may affect the final results unduly. Since each individual novel accounts for around 1.5% of the total data, take everything with a grain of salt. While I feel the broader conclusions are likely accurate, the specific months, particularly for the winners, should probably be de-emphasized. We shouldn’t place much stock in the fact that Jo Walton published Among Others in January rather than February, for instance.
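The 1.5% figure follows directly from the sample size, and it also gives the size of the swing when a single novel moves between months. Writing $s_{\text{Sep}}$ and $s_{\text{Oct}}$ for the (hypothetically labeled) shares of novels published in September and October:

```latex
\frac{1}{68} \approx 0.0147 \approx 1.5\%

% One novel moving from September to October:
s_{\text{Sep}} \;\to\; s_{\text{Sep}} - \tfrac{1}{68},
\qquad
s_{\text{Oct}} \;\to\; s_{\text{Oct}} + \tfrac{1}{68}
```

So a single late release shifts two monthly shares by roughly 1.5 percentage points each, which is large relative to the differences between individual months.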
While I could expand the data back another decade, and likely pick up 50+ more dates, I’ve decided not to go that route. I feel that the publishing market in the 1990s was substantially different than the publishing market in the 2000s, and that this additional data would not contribute much to the study. If someone else feels otherwise, and would like to chart that data, feel free. Send me a link if you do the analysis.
Here’s a link to the Excel spreadsheet that gathers all the data: Hugo Dates Study.
I think that sums up methodology questions. Let me know if you need any other information.