Correlating Goodreads vs. Amazon vs. Bookscan Numbers, Part 2

Over the past few posts, I’ve been looking at the correlation between Amazon data, Goodreads data, and the mythical “actual books sold” data that we don’t have. It would be nice if either Amazon or Goodreads correlated with that data, because then we’d be able to get a good estimate of actual sales.

Unfortunately, it appears that both the Goodreads and the Amazon data are demographically unreliable. That makes a certain amount of sense: websites cater to very specific audiences, and specific audiences don’t reflect the general reading public. Goodreads seems to lean very young (under 40), and, according to Quantcast, has around a 70/30 female/male demographic. Amazon seems more neutral in terms of gender, but leans older (over 40) and wealthier (i.e. people who have enough money to buy lots of books online).

I’ve got one more set of data to present to you: for the past 5 months, I’ve been collecting data points that would compare Goodreads to BookScan numbers. BookScan is a point-of-sale recording service that tracks how many books are sold; instead of estimating from sampling, they actually try to count how many books are sold by different venues. They claim to cover some 80-90% of the market, although that’s probably inflated. Here’s a good article from Forbes that can serve as an introduction. I’ve heard plenty of authors say that BookScan grabs less than 25% of their sales, and I think the more you sell through untraditional means (at cons, through small bookstores, etc.), the worse the BookScan numbers are.

Most BookScan data is locked behind a huge paywall, but Publishers Weekly prints weekly Hardcover bestseller lists on their website. They try to make it difficult (they only print the number of books sold so far this year), and they don’t include e-books. Still, if we were to take that data and compare it to Goodreads data . . . we’d have something.

This is exactly what I’ve done. Since early November, I’ve been tracking (weekly) any SFF book (broadly defined, and also including horror) that has shown up on the Top Hardcover Fiction list. I’ve then been comparing that to the number of Goodreads ratings for that week to see if there’s a sensible correlation between the two numbers.

While this isn’t perfect—some authors may sell a higher proportion of e-books than others—we’ll at least have a rough look at “actual physical” sales versus Goodreads ratings.

Table 1: Goodreads vs. Bookscan Data for SFF Bestsellers
Bookscan Chart

Only six SFF novels showed up on the Hardcover Bestseller list in the last 4 months. I put down the publication date because more people buy a book right when it comes out than read it right when it comes out; a long book like Revival might take people a month to read, so it may take a while for Goodreads ratings to catch up with sales. “Last Data” is the data for the week when the book fell off the chart; Gibson fell off quickly (two weeks), while Rothfuss stuck around longer. King is still going strong. The Hardcover column is the total number of Hardcover sales as given by Publishers Weekly for that week; the Goodreads column is the number of Goodreads ratings from that same week.

Lastly we have the interesting column: the ratio of Hardcover sales to Goodreads ratings. In an ideal world, that would have been close to the same number for everyone.

It is not. Goodreads is tracking Rothfuss and Mandel in a totally different way than it is King, Rice, or Koontz. Perhaps this is because Mandel and Rothfuss are selling more to Goodreads’ specific audience (younger, female). Perhaps this is because the books are shorter. Perhaps this is because King and Rice sell more in places like Wal-Mart or Target, whose readers aren’t using Goodreads. Perhaps King and Rice are selling primarily to older readers, who are less inclined to use websites to record their reading habits.

In a complex statistical case like this, it probably comes down to a multitude of factors. With just 6 books, we don’t have enough data to hash that out. What we can say is that someone like Mandel is overperforming (compared to the average) on Goodreads in an enormous way. Even if we consider that a young author like Mandel might have a 50/50 Hardcover/e-book split in her sales (thus meaning she’s sold around 120,000 copies of Station Eleven, which seems reasonable), almost 25% of her total readers rated the book on Goodreads. That is astonishingly high. In contrast, King, who has probably tripled or quadrupled Mandel’s sales, only has about 5% of his readers on Goodreads.

That’s an enormous gap, and reinforces what we learned in the last post: Goodreads is not a reliable indicator of total readers. It’s tracking Mandel and King in totally different fashions, and to compare Mandel to King via Goodreads makes Mandel seem more popular than she is and King less popular.
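
For anyone who wants to redo the back-of-the-envelope math, here is a minimal Python sketch of the estimate above. The function name is hypothetical, and the inputs are illustrative round numbers back-calculated from the estimates in this post (roughly 60,000 hardcovers and a 50/50 Hardcover/e-book split for a Mandel-like book); they are not actual BookScan or Goodreads counts.

```python
def rating_share(hardcover_sales, goodreads_ratings, hardcover_fraction=0.5):
    """Estimate the share of all buyers who left a Goodreads rating.

    hardcover_fraction is an ASSUMED share of total sales that were
    hardcovers (0.5 means a 50/50 hardcover/e-book split).
    """
    if not 0 < hardcover_fraction <= 1:
        raise ValueError("hardcover_fraction must be in (0, 1]")
    # Scale hardcover sales up to an estimate of total copies sold.
    estimated_total_sales = hardcover_sales / hardcover_fraction
    return goodreads_ratings / estimated_total_sales


# Illustrative round numbers only, chosen to match the estimates in the post.
HARDCOVERS = 60_000
RATINGS = 30_000

# Show how sensitive the estimate is to the assumed hardcover share.
for fraction in (0.4, 0.5, 0.6):
    share = rating_share(HARDCOVERS, RATINGS, hardcover_fraction=fraction)
    print(f"hardcover share {fraction:.0%}: about {share:.0%} of buyers rated")
```

Changing the assumed split moves the answer quite a bit, which is one more reason to treat these percentages as rough.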

That doesn’t mean Goodreads is useless: it just means that it tracks a specific demographic. Whether that demographic is more in touch with the Hugo/Nebula awards is an open question.

One last chart for true stat geeks: let’s see what’s happened to the Hardcover/Goodreads ratio over time. Not a ton of data here, as only 4 SFF books had a decent run on the Bestseller chart. Here it is:

Goodreads Amazon Ratio

You can see that Rice and King have reasonably shaped curves that are converging to around 15 in King’s case and about 30 in Rice’s case. Mandel and Rothfuss have basically straight lines: they were popular on Goodreads to start, and haven’t changed at all. That reinforces my last point: Goodreads treats King and Rice fundamentally differently than Mandel or Rothfuss.

With enough time and data (which we don’t have), we might be able to get a better sense of why books are tracked in different ways. Perhaps it would be a simple demographic correction (authors over 40 have this kind of ratio, authors under 40 have this kind of ratio). However, since Publishers Weekly doesn’t share enough data, we’re stuck. So be careful when looking at Goodreads numbers; they reflect a young audience, and are misleading when making comparisons between a Mandel and a Gibson.

I won’t lie: I’m a little disappointed that the Goodreads data isn’t more reliable. Given the large sample size, I’d hoped that Goodreads would flatten out any demographic bias. It doesn’t appear to do so, so any Goodreads numbers should be approached with healthy skepticism.

Next up for Chaos Horizon: start collecting Amazon data to see if that’s a better match to Publishers Weekly. Check back in a couple months, and we’ll see if that data lines up any better!

2 responses to “Correlating Goodreads vs. Amazon vs. Bookscan Numbers, Part 2”

  1. Jo Walton says :

    When I got my last royalty statement, out of interest, I correlated actual sales to Goodreads. The relationship is not the same from book to book of mine (not just not the same, but statistically all over the place): 11% of one book, 3% of another.

    Amazon is much worse.

    From looking at either one with actual sales figures, the comparative number of ratings certainly does reflect sales, but not in any usefully quantitative way. I was hoping you’d find something different, but

    Also, _The Slow Regard of Silent Things_ IS A NOVELLA.

    • chaoshorizon says :

      Always good to get another few data points that confirm my observations. Thanks. It’s looking increasingly impossible to correlate sales to Goodreads or Amazon data. Maybe we can find some sort of limit (2-20%?), which at least can get us within an order of magnitude or so. Perhaps the unreliability of Goodreads or Amazon data can help us better realize how isolated certain segments of the market are from each other. The level of popularity in one place does not necessarily equal popularity across the board.
