# 2014 Nebula Prediction: Methodology and Math

To predict future Nebula winners based on past Nebula patterns, let's note several things:

1. We don’t have much data: The Nebula does not release voting numbers (the Hugo does), so we don’t have any sense of who finished second versus sixth. This makes it harder to predict.

2. The data is unreliable: Nebula voting patterns have changed a great deal over the past 20-30 years. While the Nebula goes back to 1966, can we really use what happened in the 1970s, back before Fantasy was part of the SFWA and the whole field of SF was much smaller and more insular, to predict an award in 2014?

3. The Nebula is erratic: There have been some rather unpredictable Nebula awards in the past 15 years, with the award going to lesser-known books like *The Quantum Rose* or *The Speed of Dark*. Good for them, but bad for a predictive model.

So, what does that mean? It means that we don’t have enough reliable information to run some of the more complex statistical models (i.e. the kind of Bayesian models made popular by people like Nate Silver).

I looked over several possible statistical modeling methodologies, and I decided, at least for the purpose of this blog, that a **Linear Opinion Pool** makes the most sense. Frequently used in risk management, a Linear Opinion Pool is a way of aggregating expert opinions (predictions) and then weighting them to come up with a predictive average. Crudely, here's what it looks like:

Final % of a Book Winning the Award = Expert #1’s % Guess * Weight for Expert #1 + Expert #2’s % Guess * Weight for Expert #2 + Expert #3’s % Guess * Weight for Expert #3, and so on.

Here, the Weight is a measure of how reliable we think each expert is, and all those weights have to add up to 1.
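That weighted sum is simple enough to sketch in a few lines of Python. This is just an illustration of the formula above, not any particular library's API; the function and variable names are my own:

```python
def opinion_pool(expert_probs, weights):
    """Combine expert probability estimates with a Linear Opinion Pool.

    expert_probs: each expert's estimated chance that a given book wins.
    weights: how much we trust each expert; the weights must sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(p * w for p, w in zip(expert_probs, weights))

# Three hypothetical experts give a book a 30%, 20%, and 10% chance of
# winning, and we trust them 50%, 30%, and 20% respectively.
chance = opinion_pool([0.30, 0.20, 0.10], [0.5, 0.3, 0.2])
print(round(chance, 3))  # 0.5*0.30 + 0.3*0.20 + 0.2*0.10 = 0.23
```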

Bored? It gets worse. Analogy time! Imagine a game of dice with a standard six-sided die. If we believe the die isn't fixed, then the chance of rolling any given number 1-6 is the same, a simple 16.67% (1 in 6). That would be the chance of correctly picking the Nebula winner if we just drew the name out of a hat.

Now, we know that the Nebula dice are weighted: certain numbers have a better chance of coming up than others. This is where our experts come in. Let's say one expert watches the game for a while, and then decides that the dice are heavily weighted in favor of the numbers 1 and 2. He then draws up a nice table for us:

Chance of Rolling a #1: 20%

Chance of Rolling a #2: 20%

Chance of Rolling a #3: 15%

Chance of Rolling a #4: 15%

Chance of Rolling a #5: 15%

Chance of Rolling a #6: 15%

We have three options here. First, we could roll the dice a bunch of times and check whether he was correct; we can only do that, though, if we have access to the dice. Second, we could trust the expert and use his percentages to make a bet. Lastly, we could reject the expert and fall back on the simple 16.67% equal chances.

What if we have more than one expert? Let's say a second expert thinks the game is even more heavily fixed toward rolling a #1. His probability chart comes out like this:

Chance of Rolling a #1: 25%

Chance of Rolling a #2: 15%

Chance of Rolling a #3: 15%

Chance of Rolling a #4: 15%

Chance of Rolling a #5: 15%

Chance of Rolling a #6: 15%

Who do we trust? If we trusted them equally, we could average the two probabilities to come up with a 22.5% chance of rolling a 1. A Linear Opinion Pool, though, allows us to trust one expert more than another. So if I trust Expert #1 90% and Expert #2 only 10% (because Expert #2 is drunk), I could compute a weighted average of 0.9*20% + 0.1*25% = 20.5% chance of rolling a 1.
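To make the dice example concrete, here's the same pooling computed in Python over both experts' full tables, using the 90/10 weights from above (the variable names are mine):

```python
# Each expert's probability table for the six faces, as given above.
expert1 = {1: 0.20, 2: 0.20, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.15}
expert2 = {1: 0.25, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.15}
weights = (0.9, 0.1)  # trust Expert #1 far more than (drunk) Expert #2

# Pool the two tables face by face.
pooled = {face: weights[0] * expert1[face] + weights[1] * expert2[face]
          for face in range(1, 7)}

print(round(pooled[1], 4))             # 0.9*0.20 + 0.1*0.25 = 0.205
print(round(sum(pooled.values()), 4))  # the pooled table still sums to 1.0
```

A nice property of the linear pool: as long as each expert's table sums to 1 and the weights sum to 1, the pooled table is itself a valid probability distribution, as the second print confirms.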

This is the model I’m going to use to predict the Nebula, with 14 different “Experts” providing probability tables from data mining previous award results and predictors.

While I can get more into the math if anyone wants, let’s run down the advantages of a Linear Opinion Pool:

1. The math is easy to understand and transparent, unlike more complex statistical measures.

2. The formula can be easily tweaked by changing the weights for the experts.

3. Unreliable data can be tossed out more easily because we can see if parts of the formula are messing everything up.

4. Expert probabilities don’t rely on the same kind of enormous data sets other statistical models do.

5. Since the Nebula award is inherently erratic, this model doesn't pretend the process is more predictable than it actually is.

So, the next steps are to start building Probability Tables based on past Nebula results. After that, we have to weight them. Then, we finalize the prediction. Onward!