Math, Damned Math, and Statistics
Jeff Leek, a biostatistics professor at Johns Hopkins, explains how it is that President Obama can have a 75 percent chance of winning an election despite being essentially tied in the polls:
Let’s pretend, just to make the example really simple, that if Obama gets greater than 50% of the vote, he will win the election. Obviously, Silver doesn’t ignore the electoral college and all the other complications, but it makes our example simpler. Then assume that based on averaging a bunch of polls we estimate that Obama is likely to get about 50.5% of the vote.
Now, we want to know the “percent chance” Obama will win, taking into account what we know. So let’s run a bunch of “simulated elections” where on average Obama gets 50.5% of the vote, but there is variability because we don’t have the exact number. Since we have a bunch of polls and we averaged them, we can get an estimate for how variable the 50.5% number is. The usual measure of variability is the standard deviation. Say we get a standard deviation of 1% for our estimate. That would be a pretty precise estimate, but it’s not totally unreasonable given the amount of polling data out there.
We can run 1,000 simulated elections like this in R (a free software programming language; if you don’t know R, may I suggest Roger’s Computing for Data Analysis class?). Here is the code to do that. The last line of code calculates the percent of times, in our 1,000 simulated elections, that Obama wins. This is the number that Nate would report on his site. When I run the code, I get an Obama win 68% of the time (Obama gets greater than 50% of the vote). But if you run it again, that number will vary a little, since the elections are simulated.
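Leek's original R code isn't reproduced above, but the simulation he describes is simple enough to sketch. Here is a rough Python equivalent (the variable names and random seed are my own, not from the post): draw Obama's vote share from a normal distribution with mean 50.5% and standard deviation 1%, and count how often it exceeds 50%.

```python
import random

random.seed(2012)  # fixed seed so repeated runs give the same answer

N_SIMULATIONS = 1000
MEAN_SHARE = 50.5  # poll-average estimate of Obama's vote share (%)
SD_SHARE = 1.0     # standard deviation of that estimate (%)

# One "simulated election" = one draw of the vote share; Obama wins
# the simulation if his share exceeds 50%.
wins = sum(
    random.gauss(MEAN_SHARE, SD_SHARE) > 50.0
    for _ in range(N_SIMULATIONS)
)
print(f"Obama wins {100 * wins / N_SIMULATIONS:.1f}% of simulated elections")
```

The answer hovers around 69%, consistent with Leek's 68%: analytically, the probability that a normal draw with mean 50.5 and standard deviation 1 exceeds 50 is about 0.69, and any 1,000-run simulation will bounce a little around that value.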
A few things to note here:
1. While statisticians will run multiple simulations to arrive at their best estimate, we’ll run only one actual election. So, there’s a 1 in 4 chance of Romney winning even if all the polling going into these estimates is accurate.
2. We assume the polling is accurate because it has been for decades and it’s all we have to go on. But several trends (the rise of cell-only households, the advancement of call screening, the proliferation of polls and the resulting onset of poll fatigue, etc.) could ultimately undermine the current model. We won’t know that until after the fact.
3. The Electoral College makes forecasting trickier than in the simplified example Leek uses. The polling is close in multiple states, and polling is less reliable at the state level than it is nationally.
via Chris Lawrence’s Facebook stream