This post continues my discussion on building my 2015 Nebula Best Novel prediction. See Part 1 for an introduction, Part 2 for a discussion of my Indicators, Part 3 for a discussion of my methodology/math, and Part 4 for a discussion of accuracy.
Taken together, those posts should explain to anyone new to Chaos Horizon how the site works. This post wraps things up with a to-do list for 2015. To update my model for 2015, here’s what I need to do:
1. Update the data sets with the results of the 2014 Nebulas, Hugos, and everything else that happened last year.
2. Rethink the indicators, and possibly replace/refine some of them.
3. Reweight the indicators.
4. Test model reliability using the reweighted indicators.
5. Use the indicators to build probability tables for the 2015 Nebulas.
6. Run the probability tables through the weights to come up with the final results.
I won’t be able to get all of this done until mid-April. It doesn’t make any sense to run the numbers until the Hugo noms come out, and those will be coming out this Saturday (April 4th).
This post continues my discussion on building my 2015 Nebula Best Novel prediction. See Part 1 for an introduction, Part 2 for a discussion of my Indicators, and Part 3 for a discussion of my methodology/math.
Now to the only thing anyone cares about: how reliable is the model?
Here’s my final Nebula prediction from 2014:
1. Ann Leckie, Ancillary Justice (25.8%) (winner)
2. Neil Gaiman, The Ocean at the End of the Lane (20.7%)
3. Nicola Griffith, Hild (11.2%)
4. Helene Wecker, The Golem and the Jinni (10.6%)
5. Karen Joy Fowler, We Are All Completely Beside Ourselves (9.8%)
6. Linda Nagata, The Red: First Light (8.2%)
7. Sofia Samatar, A Stranger in Olondria (7.7%)
8. Charles E. Gannon, Fire with Fire (6.0%)
As you can see, my model attaches a % chance to each nominee, deliberately avoiding the certainty of proclaiming one work the “sure” winner. This reflects how random the Nebula has been at times. There have been some true left-field winners (The Quantum Rose, for instance) that should remind us that statistical certainty is not possible here.
Broadly speaking, I’m seeking to improve our sense of the odds from a coin-flip/random model to something more nuanced. For 2014, a “coin-flip” (i.e. randomly picking a winner) would have given a 12.5% chance to each of the 8 nominees. My prediction roughly doubled those odds for Leckie, raised them for Gaiman, and lessened them for everyone else. While that lacks the crisp assurance of “this person will definitely win,” I think it correctly reflects how variable and unpredictable the Nebula really is.
A fundamental weakness of my model is that it does not take into account the specific excellence/content of a book. I’ve done that deliberately. My thought process is that if you want analysis/forecasts based on the content/excellence of a book, you can find that elsewhere on the web. I want Chaos Horizon to do something different, not necessarily imitate what’s already being done. I don’t know how, in the more abstract/numerical terms that Chaos Horizon uses, to measure the relative quality of a Leckie versus a Gaiman. I don’t think Amazon or Goodreads measures quality in a compelling enough fashion to be useful for Chaos Horizon, although I’m happy to listen to counter-arguments.
Even if we could come up with an objective measure of quality, how would we correlate that measurement to the Nebulas? Some of my indicators do (either directly or indirectly) mirror excellence/content, but they do so at several removes. If a book gets lots of nominations, I’m accepting that SFF readers (and voters) probably like it. If SFF readers like it, it’s more likely to win awards. Those are pretty tepid statements. I’m not, at least for the purposes of Chaos Horizon, analyzing the books for excellence/content myself. I believe that an interesting model could be built up by doing that—anyone want to start a sister website for Chaos Horizon?
Lastly, I’ve tried to avoid inserting too much of my opinion into the process. That’s not because I don’t value opinion; I really like opinion-driven websites on all sides of the SFF discussion. Opinion is simply a different model of prediction than the one I use. I think the Nebula/Hugo conversation is best served by having a number of different analyses from different methodological and ideological perspectives. Use Chaos Horizon in conjunction with other predictions, not as a substitute for them.
I posted last year about how closely my model predicted the past 14 years of the Nebulas. The formula was 70% successful at picking the winner. Not terrible, but “predicting” results that have already happened doesn’t count for much.
I’ll wrap up this series of posts with my “To-Do” list for the 2015 Nebula Model.
Now that we have 12 different indicators, how do I combine them? This is where the theory gets sticky: how solidly do I want to treat each indicator? Am I trying to find correlations between them? Do I want to pick one as the “main” indicator as my base, and then refine it through some recursive statistical process? Do I treat each indicator as independent, or are some dependent on each other? Do I treat them as opinions or as facts? How complicated do I allow the math to be, given the low N we have for the Nebulas?
I thought about this, and read some math articles and scoured the internet, and I decided to use an interesting statistical tool: the Linear Opinion Pool. Under this model, I treat my data mining results as opinions, and combine them together using a Reliability Factor, to get a weighted combined percentage score. This keeps us from taking the data mining results too seriously, and it allows us to weigh a great number of factors without letting one of them dominate.
Remember, one of my goals on Chaos Horizon is to keep the math transparent (at a high school level). I want everyone who follows Chaos Horizon to be able to understand and explain how the math works; otherwise, the model becomes a mysterious black box that lends the statistics an air of unearned credibility, and I don’t want that.
a weighted arithmetic average of the experts’ probability distributions. If we let Fi(x) denote expert i’s probability distribution for an uncertain variable of interest (X), then the linear opinion pool Fc(x) that results from combining k experts is:

Fc(x) = w1·F1(x) + w2·F2(x) + … + wk·Fk(x)

where the weight assigned to Fi(x) is wi, and Σwi = 1.
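To make the formula concrete, here’s a minimal Python sketch of a Linear Opinion Pool. The nominees, distributions, and weights are made up for illustration; they aren’t my actual indicator data.

```python
# Linear Opinion Pool: a weighted average of probability distributions.
# All numbers below are illustrative, not real Chaos Horizon data.

nominees = ["Leckie", "Gaiman", "Griffith"]

# Each "expert" (indicator) assigns a probability to every nominee;
# each distribution sums to 1.
expert_distributions = [
    {"Leckie": 0.50, "Gaiman": 0.30, "Griffith": 0.20},
    {"Leckie": 0.20, "Gaiman": 0.60, "Griffith": 0.20},
]

# Reliability weights, which must sum to 1.
weights = [0.7, 0.3]
assert abs(sum(weights) - 1.0) < 1e-9

# Fc(x) = w1*F1(x) + w2*F2(x) + ... for each nominee x.
pooled = {
    n: sum(w * dist[n] for w, dist in zip(weights, expert_distributions))
    for n in nominees
}

print(pooled)  # combined probabilities; they still sum to 1
```

Because each input distribution sums to 1 and the weights sum to 1, the pooled result is automatically a valid probability distribution, which is part of the method’s appeal.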
Although the linear opinion pool is a popular and intuitive combination method with many useful properties, there is no method for assigning weights that is derived entirely from first principles. One can, however, interpret the weights in a variety of ways, and each interpretation lends itself to a particular way to calculate the weights.
This is a model often used in risk analysis, where you have a number of competing opinions about what is risky, and you want to combine those opinions to find any possible overlap (while also covering your ass from any liability). There’s plenty of literature on the subject; just google “Linear Opinion Pool” for more reading.
We have the probability distributions from my data mining. What weights do I use? That’s always the challenge in a Linear Opinion Pool. For Chaos Horizon, I’ve been weighting by how often that Indicator has actually chosen the Nebula in the past. So, if you used that Indicator and that Indicator alone to guess, how often would you actually be right? Not every Indicator comes into play every year, and sometimes an Indicator doesn’t help (like if all the nominated novels previously had Nebula nominations). We’ll be looking at all that data late in April.
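One simple way to turn past hit rates into pool weights is to normalize them so they sum to 1. The hit rates below are invented, and this normalization is my own sketch of the idea, not necessarily Chaos Horizon’s exact procedure:

```python
# Hypothetical historical hit rates: how often each indicator, used alone,
# would have picked the eventual Nebula winner.
hit_rates = {"prior_nebula_nom": 0.55, "locus_award": 0.45, "genre_sf": 0.25}

# Normalize so the weights sum to 1, as the Linear Opinion Pool requires.
total = sum(hit_rates.values())
weights = {name: rate / total for name, rate in hit_rates.items()}

print(weights)  # more historically accurate indicators get larger weights
```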
Now, on to my mathematical challenge: can I explain this in easy to understand terms?
A Linear Opinion Pool works this way: you walk into a bar and everyone is talking about the Nebula awards. You wander around, and people shout various odds at you: “3 out of 4 times a prior Nebula nominee wins” or “70% of the time a science fiction novel wins” and so forth. Your head is spinning from so much information; you don’t know who to trust. Maybe some of those guesses overlap, maybe some don’t. All of them seem like experts—but how expert?
Instead of getting drunk and giving up, you decide to sum up all the opinions. You figure, “Hell, I’ll just add all those probabilities up, and then divide by the total number of suggestions.” Then you begin to have second thoughts: that guy over in the corner is really wasted and doesn’t seem to know what he’s talking about. You sidle over and ask his friend: how often has that guy been right in the past? He says 5% of the time, but that guy over there—the one drinking gin and tonics—is right 50% of the time. So you figure you’d better weight each opinion based on how correct it has been in the past. You add things up using those weights, and voilà!, you’ve got your prediction.
The advantages of this approach:
1. It’s mathematically easy to calculate; no fancy software needed.
2. It allows me to add more indicators (opinions) very easily.
3. It treats my data mining work as an “opinion,” not a fact, which I think is closer to reality.
4. The weighting component allows me to dial indicators up or down easily.
5. The simple mathematics reflects the relatively low amount of data.
6. The methodology is easy for readers to follow.
The disadvantages:
1. It’s not as mathematically rigorous as other statistical models.
2. The weighting component introduces a human element into the model, which may be unreliable.
3. Because it treats my data mining results as “opinions,” not “facts,” some readers may find the model less reliable.
4. Because it is simple, it lacks the flashiness and impressiveness of grander statistical models.
When we’re dealing with statistical modeling, the true test is the results. A rigorous model that is wrong all the time is worse than a problematic model that is right all the time. In my next post, we’ll talk about past accuracy. Here are my older posts on the Linear Opinion Pool and weighting if you want more info.
As a last note, let me say that following the way the model is constructed is probably more interesting and valuable than the final results. It’s the act of thinking through how different factors might fit together that is truly valuable. Process, not results.
This post continues my discussion on building my 2015 Nebula Best Novel prediction. See Part 1 for an introduction.
My model combines a number of factors (which I’m calling indicators) of past Nebula Best Novel success to come up with an overall percentage.
In 2014, I used 12 different indicators of Nebula success based on Nebula Data from 2001-2014. They were as follows:
Indicator #1: Nominee has previously been nominated for a Nebula. (84.6%)
Indicator #2: Nominee has previously been nominated for a Hugo. (76.9%)
Indicator #3: Nominee has previously won a Nebula award for best novel. (46.1%)
Indicator #4: Nominee was the year’s most honored nominee (Nebula Wins + Nominations + Hugo Wins + Nominations). (53.9%)
Indicator #5: Nominated novel was a science fiction novel. (69.2%).
Indicator #6: Nominated novel places in the Locus Awards. (92.3%)
Indicator #7: Nominated novel places in the Goodreads Choice Awards. (100%)
Indicator #8: Nominated novel appears on the Locus Magazine Recommended Reading List. (92.3%)
Indicator #9: Nominated novel appears on the Tor.com or io9.com Year-End Critics’ list. (100%)
Indicator #10: Nominated novel is frequently reviewed and highly scored on Goodreads and Amazon. (unknown%)
Indicator #11: Nominated novel is also nominated for a Hugo in the same year. (73.3%)
Indicator #12: Nominated novel is nominated for at least one other major SF/F award that same year. (69.2%)
NOTE: These percentages have not yet been updated with the 2014 results. Leckie’s win in 2014 will lower the % value of Indicators #1-4 and raise the % value of Indicators #5-12. That’s on my to-do list over the next few weeks.
To come up with those percentages, I looked up the various measurables about Nebula nominees (past wins, placement on lists, etc.) using things like the Science Fiction Award Database. I then looked for patterns in that data (strong correlations to winning the Nebula), and then turned those patterns into the percentage statements you see above.
Using those statements, I calculate the probability for each of the 2015 nominees for each Indicator. So, for example, take Indicator #1: Nominee has previously been nominated for a Nebula. Such novels win the Nebula a robust 84.6% of the time. Of this year’s 6 nominees, 4 have previously been nominated for a Nebula (Leckie, VanderMeer, McDevitt, Gannon). If I considered no other factors, each would wind up with an (84.6% / 4) = 21.2% chance to win the Nebula. Our two first-timers (Liu and Addison) have to split the paltry remnants: (100% – 84.6%) / 2 = 7.7% each.
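That Indicator #1 calculation can be sketched in a few lines of Python (the 84.6% figure and the nominee groupings come from the discussion above):

```python
# Indicator #1: 84.6% of past Nebula winners had a prior Nebula nomination.
indicator_rate = 0.846

prior_nominees = ["Leckie", "VanderMeer", "McDevitt", "Gannon"]
first_timers = ["Liu", "Addison"]

# Split the indicator's probability mass evenly within each group.
probs = {}
for name in prior_nominees:
    probs[name] = indicator_rate / len(prior_nominees)      # ~21.2% each
for name in first_timers:
    probs[name] = (1 - indicator_rate) / len(first_timers)  # ~7.7% each

print(probs)  # six probabilities summing to 1
```

The same even-split calculation applies to each of the other indicators; only the historical percentage and the group memberships change.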
I like it when my indicators make some logical sense: a prior Nebula nominee is more familiar to the SFWA voting audience, and thus has an easier time grabbing votes. That bias is reflected in the roughly 13% advantage prior nominees gain in a category. That is a significant bump, but not an overwhelming one. It would be pretty unsatisfying to end there. Past Nebula noms are just one possible indicator: by doing the same kind of calculation for all 12 of my indicators, and then combining them together, we get a more robust picture. Leckie had never been nominated for a Nebula before last year, but she won anyway; she dominated many of the other indicators, and that’s what pushed her to the top of my prediction.
So, that’s the basic methodology: I find past patterns, translate those into percentage statements, and then use those percentages to come up with a probability distribution for the current year. I then combine those predictions together to come up with my final prediction.
I’ve got to make a couple of tweaks to my Indicators for 2015. First off, I was never able to get Indicator #10 to work properly: finding a correlation between Amazon/Goodreads ratings or scores and Nebula/Hugo wins has so far, at least for me, proved elusive. I also think I need to add an indicator for “not being a sequel”; that should help clarify this year, where the Leckie, McDevitt, and Gannon novels are all later books in a series. I’m tossing around adding a “didn’t win the Best Novel Nebula the previous year” concept, but I’ll see how things work out. EDIT: This would be there to reflect how rare back-to-back Nebula wins are. That has only happened 3 times (Delany, Pohl, Card), and hasn’t happened in 30 years. This’ll factor in quite a bit this year: is Leckie looking at back-to-back wins, or will voters want to spread the Nebula around?
I’m always looking for more indicators, particularly if they can yield high % patterns. Let me know if you think anything should be added to the list. The more Indicators we have, the more balanced the final results, as any one indicator has less of an impact on the overall prediction.
You’ll notice that my Indicators break into four main parts: Past Awards History, Genre, Current Year Critical/Reader Reception, and Current Year Awards. Those four seem the big categories that determine (in this kind of measure) whether or not you’re a viable Nebula candidate.
In the next post, we’ll talk about how this data gets weighted and combined together.