# Building the Nebula Model, Part 2

This post continues my discussion on building my 2015 Nebula Best Novel prediction. See Part 1 for an introduction.

My model combines a number of factors (which I’m calling indicators) of past Nebula Best Novel success to come up with an overall percentage.

In 2014, I used 12 different indicators of Nebula success based on Nebula Data from 2001-2014. They were as follows:

NOTE: These percentages have not yet been updated with the 2014 results. Leckie’s win in 2014 will lower the % value of Indicators #1-4 and raise the % value of Indicators #5-12. That’s on my to-do list over the next few weeks.

To come up with those percentages, I looked up the various measurables about Nebula nominees (past wins, placement on lists, etc.) using things like the Science Fiction Award Database. I then looked for patterns in that data (strong correlations to winning the Nebula), and then turned those patterns into the percentage statements you see above.

Using those statements, I calculate the probability for each of the 2015 nominees for each Indicator. So, for example, take Indicator #1: Nominee has Previously Been Nominated for a Nebula. Such novels win the Nebula a robust 84.6% of the time. Of this year’s 6 nominees, 4 have previously been nominated for a Nebula (Leckie, VanderMeer, McDevitt, Gannon). If I considered no other factors, each would wind up with a (84.6% / 4) = 21.2% chance to win the Nebula. Our two first-timers (Liu and Addison) have to split the paltry remnants: (100% – 84.6%) / 2 = 7.7% each.
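The Indicator #1 arithmetic above can be sketched in a few lines of Python. This is only an illustration of the split described in the paragraph, not the actual spreadsheet or formula behind the model. The function name `split_indicator` is hypothetical, and the uniform fallback for the case where every nominee (or no nominee) has the trait is an assumption the post does not address.

```python
def split_indicator(nominees, with_trait, win_rate):
    """Split win_rate evenly among nominees that have the trait,
    and (1 - win_rate) evenly among those that do not."""
    with_trait = set(with_trait)
    inside = [n for n in nominees if n in with_trait]
    outside = [n for n in nominees if n not in with_trait]
    if not inside or not outside:
        # Assumption (not from the post): if the trait does not separate
        # the field, fall back to an even split across all nominees.
        return {n: 1.0 / len(nominees) for n in nominees}
    probs = {n: win_rate / len(inside) for n in inside}
    probs.update({n: (1.0 - win_rate) / len(outside) for n in outside})
    return probs


nominees = ["Leckie", "VanderMeer", "McDevitt", "Gannon", "Liu", "Addison"]
prior_nominees = ["Leckie", "VanderMeer", "McDevitt", "Gannon"]

# Indicator #1: prior Nebula nominees account for 84.6% of past wins.
indicator_1 = split_indicator(nominees, prior_nominees, 0.846)

for name in nominees:
    print(f"{name}: {indicator_1[name]:.3f}")
```

Each prior nominee comes out at about 21% and each first-timer at about 7.7%, and the six values sum to 100%, which is what makes the result usable as a probability distribution.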

I like it when my indicators make some logical sense: a prior Nebula nominee is more familiar to the SFWA voting audience, and thus has an easier time grabbing votes. That bias is reflected in the roughly 13-percentage-point advantage prior nominees gain in a category (21.2% versus 7.7%). That is a significant bump, but not an overwhelming one. It would be pretty unsatisfying to end there. Past Nebula noms are just one possible indicator: by doing the same kind of calculation for all 12 of my indicators, and then combining them together, we get a more robust picture. Leckie had never been nominated for a Nebula before last year, but she won anyway; she dominated many of the other indicators, and that’s what pushed her to the top of my prediction.

So, that’s the basic methodology: I find past patterns, translate those into percentage statements, and then use those percentages to come up with a probability distribution for the current year. I then combine those predictions together to come up with my final prediction.
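As a rough illustration of the "combine" step, here is a minimal Python sketch that assumes a simple weighted average of the per-indicator distributions. That choice is an assumption made for illustration: the real weighting is a topic for the next post and may work differently. The function name `combine` is hypothetical, and the second distribution is just an even 1-in-6 split standing in for an indicator that tells us nothing, not one of the 12 real indicators.

```python
def combine(distributions, weights=None):
    """Weighted average of several per-nominee probability distributions.

    Assumption: a plain weighted mean; the model's real weighting
    scheme is not described in this post.
    """
    if weights is None:
        weights = [1.0] * len(distributions)  # equal weights by default
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one indicator needs a positive weight")
    names = list(distributions[0])
    return {
        name: sum(w * d[name] for w, d in zip(weights, distributions)) / total
        for name in names
    }


nominees = ["Leckie", "VanderMeer", "McDevitt", "Gannon", "Liu", "Addison"]

# Indicator #1 from the text: 84.6% split four ways, 15.4% split two ways.
indicator_1 = {n: 0.846 / 4 for n in nominees[:4]}
indicator_1.update({n: 0.154 / 2 for n in nominees[4:]})

# Placeholder "no information" indicator: every nominee gets 1 in 6.
baseline = {n: 1.0 / len(nominees) for n in nominees}

blended = combine([indicator_1, baseline])            # equal weights
zeroed = combine([indicator_1, baseline], [1.0, 0.0])  # baseline switched off

for name in nominees:
    print(f"{name}: blended {blended[name]:.3f}, zeroed {zeroed[name]:.3f}")
```

With equal weights, each added indicator pulls the blended number toward its own view, so no single indicator dominates. Setting a weight to 0 removes that indicator from the result entirely.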

I’ve got to make a couple tweaks to my Indicators for 2015. First off, I was never able to get Indicator #10 to work properly. Finding a correlation between Amazon/Goodreads ratings or scores and Nebula/Hugo wins has so far, at least for me, proved elusive. I also think I need to add an Indicator about “Not being a sequel”; that should help clarify this year, where the Leckie, McDevitt, and Gannon novels are all later books in a series. I’m tossing around adding a “Didn’t win a Best Novel Nebula the previous year” concept, but I’ll see how things work out. EDIT: This would be there to reflect how rare back to back Nebula wins are. That has only happened 3 times (Delany, Pohl, Card), and hasn’t happened in 30 years. This’ll factor in quite a bit this year: is Leckie looking at back to back wins, or will voters want to spread the Nebula around?

I’m always looking for more indicators, particularly if they can yield high % patterns. Let me know if you think anything should be added to the list. The more Indicators we have, the more balanced the final results, as any one indicator has less of an impact on the overall prediction.

You’ll notice that my Indicators break into four main parts: Past Awards History, Genre, Current Year Critical/Reader Reception, and Current Year Awards. Those four seem the big categories that determine (in this kind of measure) whether or not you’re a viable Nebula candidate.

In the next post, we’ll talk about how this data gets weighted and combined together.

### 19 responses to “Building the Nebula Model, Part 2”

1. Joe Sherry says :

Not Being a Sequel of a Nebula Winning Novel? Because if Ancillary Sword wins, much of that is tied to Ancillary Justice. If Sword does not win, there really isn’t much hope for Mercy.

Which raises different questions: How many sequels were nominated in which the first book was nominated and how many sequels were nominated when the first book was not nominated?

How many sequels won, when the first book did not win?

• chaoshorizon says :

I probably didn’t phrase it right. I think Leckie is at a disadvantage this year because it is so hard to win back to back Nebulas. I think a lot of voters will deliberately not vote for Ancillary Sword just to vary it up. That’ll give Ancillary Mercy a great shot to win, though. I’ll change the phrasing to clarify.

• Joe Sherry says :

According to a quick search, it’s only happened twice (Card and Pohl). I’d say it’s hard to win a Nebula, and less about what it takes to go back to back. Leckie has as good a shot as anyone to pull that off, but there are some serious contenders also nominated.

• chaoshorizon says :

I’m definitely going to add “not a sequel.” We’ll see if that does enough to the prediction, and I want to do a full fledged study of sequels soon. Too many things to look at, not enough time!

• Tudor says :

The Leckie situation in 2015 is very similar to Card’s in 1986, because exactly like Speaker for the Dead, Ancillary Sword is a very different second volume that came a year after an extremely popular first volume.

2. What about something along the lines of ‘popularity buzz’? That is, the number of times a nominee is interviewed or appears as a guest on media such as blogs/podcasts, plus the number of reviews and mentions of the nominated novel.
Not sure how you do this though… If #10 is difficult, I can’t imagine figuring out correlations with that.

• chaoshorizon says :

I’d love to figure something like that out—but it’s the lack of data that’s killing me. Since we don’t have historical numbers about such things, you’d just be guessing how influential blogs/podcasts/buzz were. That’s one of the reasons I’m collecting data as I go along: maybe in 3-4 years I’ll have enough data to find a pattern. I also fear that the field is in a “silo” state right now: six or seven different communities that don’t necessarily influence each other. Does buzz on Reddit bleed over to Tor.com and then to the Baen forums? Or are each of those separate?

• That’s what I figured – collecting the data and how much influence (good or bad) has to be a nightmare to put together. If you could figure out some pattern though… Well, first, I’d be incredibly impressed by the time and effort of that. Second, with how influential media is, I imagine that could greatly improve your vote percent yield. (Although you did pick the Hugo and Nebula winners last year! 🙂 )

3. Tudor says :

Have you looked at something like the lowest rating or lowest number of ratings on Goodreads? Can you say that beyond a certain threshold a book will probably not get a nomination?

• chaoshorizon says :

I’d like to use something like that, and it’s one of the reasons I’m collecting Goodreads and Amazon data. The problem is the difficulty in capturing historical data: we can’t go back in time and see what Jo Walton’s # of ratings was before the Hugos + Nebulas. Once the awards are given, they sell copies, which screws up the data. One nice thing about my formula is that I can add Indicators and then check how they influence the final outcome. It’s very easy to zero out Indicators that don’t work. I did that last year with Amazon ratings.

• Tudor says :

Because my analysis uses data from the day after the nomination period ends, I can say that the rating of any nominated book is the same now as it was at the end of the nomination period (Redshirts 3.82-3.81, Ancillary Justice 3.98-3.98). The number of ratings is always changing, but the rating is stable after a couple of months and no award can change it, so I think that you can use the book’s rating now in order to see what the rating was in the period when people were thinking about which books to nominate, even if we are talking about 2012 or 2013.

4. Alan Calder says :

While there may be sufficient correlation between the winners of Hugo, Nebula and other awards to suggest a degree of causation, the reality is that the differences between the nomination and judging processes for each of the awards are such that data mining is no more likely than reading the entrails of an early-morning slaughtered goat to deliver a (reasonably) certain outcome. The key question must be: what causes judges to select their winners? There may be more value in looking at the characteristics of multiple award-winning novels to determine what a writer has to do to win awards than in extrapolating from one award to another. Publishers think about audiences; a book with a very narrow or very specific audience (eg Jonathan Strange or The Anubis Gates) in mind may do extraordinarily well with that audience, but win neither a Hugo nor a Nebula, whereas a book with a more universal theme which engages a wider audience (eg Ancillary****) is likely to be nominated for more awards. Abercrombie’s ‘Half a…’ is going to turn up as a nominee on a number of award lists because it has wide appeal (and is well-written although maybe not heavy-weight enough to win) whereas The Three Body Problem is heavy-weight but insufficiently engaging to shift it from being an insider’s tip to being a winner. How did a Harry Potter win a Hugo if not through breadth of appeal? It wasn’t exactly included on other SF award lists – after all, SF it ain’t.

• NatLovin says :

Jonathan Strange won the Hugo. 🙂

• chaoshorizon says :

No statistical measure is going to provide a “reasonably certain outcome”; that’s not what statistics can do. The past patterns, though, can help us improve from basic coin-flip odds (20% odds for the Hugo, or 1 in 5), to something in the range of 30% for some entries, 10% for others. That’s a 50% gain in certainty. Whether you find that significant is up to you.

• Alan Calder says :

I collect ultra-modern first editions, so being able to predict likely prize winners in advance would be useful – and undoubtedly there are frequent common nominations. The work you’re doing is interesting; please keep it up. My point is that a novel’s target audience, the breadth of its appeal, and the nominating/judging mechanisms for each award, are all important factors in predicting the outcomes.

• chaoshorizon says :

I agree with you on those categories—I’m just struggling to come up with an objective way to measure them. I’ve been thinking about tracking reviews more closely; maybe a metric like total number of positive reviews would be meaningful. Trying to measure breadth of appeal, though, often comes down to sheer opinion. While I respect opinion, I’ve been trying to shy away from that on Chaos Horizon. Have you checked out pprize.com for modern collecting? They do some good work on trying to predict the Pulitzer Prize winners.

• Alan Calder says :

My brain thought ‘Ack-Ack Macaque’ but for some reason my finger wrote ‘Jonathan Strange’. The latter is of course a good example of a First Novel doing well across a number of prizes – another instance where quality of book trumped absence of previous award history – whereas the former’s high-water mark was a BSFA co-win with Ancillary Justice last year.

Shouldn’t the number of times the person has been nominated be a factor as well?

Clearly since McDevitt has been nominated MULTIPLE times before he should get a bigger bump (in this category) than someone like Leckie or Gannon.

• chaoshorizon says :

McDevitt throws everything off: he has a terrible win percentage. Before this year, he had 11 nominations and only 1 win, for a dismal 9.1% win percentage. I do agree that, in most cases, multiple nominations across a broad spectrum of categories and awards (Hugo + Nebula) should help, and that’s what my Indicator #4 tries to do. If I tossed McDevitt’s data out, we could probably find a good correlation between # of nominations and # of wins. Since we have so little data, though, I’ve been hesitant to do that.
