# 2015 Nebula Prediction: Indicators and Weighting

One last little housekeeping post before I post my prediction later today. Here are the 10 indicators I settled on using:

Indicator #1: Author has previously been nominated for a Nebula (78.57%)

Indicator #2: Author has previously been nominated for a Hugo (71.43%)

Indicator #3: Author has previously won a Nebula for Best Novel (42.86%)

Indicator #4: Has received at least 10 combined Hugo + Nebula noms (50.00%)

Indicator #5: Novel is science fiction (71.43%)

Indicator #6: Places on the Locus Recommended Reading List (92.86%)

Indicator #7: Places in the Goodreads Best of the Year Vote (100.00%)

Indicator #8: Places in the Top 10 on the Chaos Horizon SFF Critics Meta-List (100.00%)

Indicator #9: Receives a same-year Hugo nomination (64.29%)

Indicator #10: Nominated for at least one other major SFF award (71.43%)

I reworded Indicator #4 to make the math a little clearer. Otherwise, these are the same as in my Indicator posts, which you can get to by clicking on each link.

If you want to see how the model is built, check out the “Building the Model” posts.

I’ve tossed around including a “Is not a sequel” indicator, but that would take some tinkering, and I don’t like to tinker at this point in the process.

The Indicators are then weighted according to how well they’ve worked in the past. Here are the weights I’ve used this year:

Indicator #1: 8.07%

Indicator #2: 8.65%

Indicator #3: 13.78%

Indicator #4: 11.93%

Indicator #5: 10.66%

Indicator #6: 7.98%

Indicator #7: 7.80%

Indicator #8: 4.24%

Indicator #9: 16.54%

Indicator #10: 10.34%
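Putting the two lists together, a nominee’s raw score in a Linear Opinion Pool like this one is just the sum of the weights for the indicators it satisfies. Here’s a minimal sketch using the weights above; the nominee and which indicators it hits are made up purely for illustration:

```python
# Weights for Indicators #1-#10, as listed above (they sum to ~100%).
WEIGHTS = {1: 8.07, 2: 8.65, 3: 13.78, 4: 11.93, 5: 10.66,
           6: 7.98, 7: 7.80, 8: 4.24, 9: 16.54, 10: 10.34}

def score(indicators_hit):
    """Sum the weights of the indicators a nominee satisfies."""
    return sum(WEIGHTS[i] for i in indicators_hit)

# A hypothetical nominee hitting Indicators #1, #2, #5, #6, and #9:
print(round(score([1, 2, 5, 6, 9]), 2))  # 51.9
```

A book that hit all ten indicators would score roughly 100; one that hit none would score 0.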

Lots of math, I know, but I’m going to post the prediction shortly!

> I’m a little confused by the difference between the accuracy of individual indicators in the past and the weight you’ve given to each in your model. For example, indicator 1 has been more accurate than 2 in the past, but the weight you’ve given 1 is lower than 2.

For Indicator #1, the accuracy % I use is (Number of Winners that Have Been Nominated for a Nebula In the Past 15 Years / 15 Years). The weighting is based on (Number of Winners that Have Been Nominated for a Nebula in a Given Year / Total Number of Nominees that Have Been Nominated for a Nebula That Year), averaged across the 15 years. Think of the first number as the base prediction, and the second number as a “real-world” measure of how often the prediction works. This allows me to introduce more data into the model. Here’s how I calculate the weights: I measured how accurate each indicator would be if we used that indicator—and only that indicator—to pick the Nebula. Those accuracies are then normalized against each other.

In other words: while winners are more likely to have been nominated for the Nebula than the Hugo, more prior Nebula nominees are nominated for the Nebula than prior Hugo nominees. This dilutes the Nebula nomination effect slightly, and that’s what the weighting is trying to show. It helps not “overvalue” Indicators that are inconsistent, and it minimizes the effect of a single year where (let’s imagine) six prior Nebula nominees were nominated. In the mathematical model I use, you have a variety of choices for weighting. Here’s my old Weighting Post from 2014 that explains a little more.

> How are the weights computed? As a math professor, I am interested in the specific mathematical technique used. It’s my understanding you are trying to pick the weights by learning from their impact on previous predictions…

The weights are trying to represent the “reliability” of each opinion in the Linear Opinion Pool. To do that, I calculated the percentage chance based on the last 15 years of data as to whether you would be right using that Indicator alone. Here’s the calculation for Indicator #1 (Had previously been nominated for a Nebula Award):

2014: 0% chance of getting it right using Indicator #1 alone: Leckie hadn’t been nominated for a Nebula before. So if you went with Indicator #1, you’d always be wrong.

2013: 25% chance of getting it right using Indicator #1 alone: Robinson wins and had previously been nominated for a Nebula, but Ahmed, Kowal, and Jemisin had also been nominated before. 1 out of 4 is where I get the 25% from.

2012: 25% chance of getting it right using Indicator #1 alone: Walton wins and had previously been nominated for a Nebula, but so had Mieville, McDevitt, and Jemisin.

2011: 25% chance of getting it right using Indicator #1 alone: Willis wins and had previously been nominated for a Nebula, but so had McDevitt, Jemisin, and Hobson.

2010: 33% chance of getting it right using Indicator #1 alone: Bacigalupi wins and had previously been nominated for a Nebula, but so had Barzak and Mieville.

And so on. The percentages for the other years are:

2009: 20%

2008: 0%

2007: 25%

2006: 33%

2005: 20%

2004: 20%

2003: 25%

2002: 16.67%

2001: 25%

That gives us an average of 20.9%. Since these are real-world results (not dice rolls, for instance), that shows how often you’d be right if you placed all your chips on Indicator #1. Since a random choice would give you a 1/6 chance (16.67%), this allows me to check that my indicator is at least decent. 20.9% isn’t a huge gain over 16.67%, but it is a gain.
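The averaging step above is easy to reproduce. Here’s a quick sketch using the per-year percentages listed for Indicator #1:

```python
# Per-year chance of picking the winner using Indicator #1 alone,
# taken from the figures above (2001-2014).
yearly = {2014: 0, 2013: 25, 2012: 25, 2011: 25, 2010: 33,
          2009: 20, 2008: 0, 2007: 25, 2006: 33, 2005: 20,
          2004: 20, 2003: 25, 2002: 16.67, 2001: 25}

average = sum(yearly.values()) / len(yearly)
print(round(average, 1))  # 20.9
```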

I then generate this percentage for each of the 10 indicators using the same methodology. Then I normalize them to 100%. In the case of Indicator #1, that left us with 8.07%.
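The normalization itself is just dividing each indicator’s standalone accuracy by the total across all ten. Only Indicator #1’s raw value (20.9%) appears in this post, so the other nine values below are placeholders for illustration:

```python
# Normalize raw standalone accuracies so the ten weights sum to 100%.
# 20.9 is Indicator #1's raw accuracy; the other nine are hypothetical.
raw = [20.9] + [25.0] * 9

total = sum(raw)
weights = [round(100 * r / total, 2) for r in raw]
print(weights)  # ten weights summing to ~100
```

With the actual raw accuracies for all ten indicators plugged in, this produces the weight list shown earlier (8.07% for Indicator #1, and so on).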

The hand-wavy part of the Linear Opinion Pool model I use is that the weights are up to the discretion of the model builder. I think you could come up with other equally logical ways to weight the system. The test of a model, of course, is not whether it is pretty mathematically, but whether it works.