Election Polling Works

No Senate candidate with a lead of more than 5.5 points in the polling average, with 30 days to go in the race, has lost his race since 1998: these candidates are 68-0.

James Joyner · Friday, October 1, 2010 · 10 comments

Nate Silver continues his series extolling the virtues of polling averages as predictors of outcomes with some really impressive data:

I have a database containing almost all polls conducted in all U.S. Senate and gubernatorial races since 1998. I say almost all because it excludes internal polls released by campaigns and other explicitly partisan groups, and it excludes Internet polls conducted by Zogby Interactive, which in my view are not scientific. My database also does not include polls for irregularly-scheduled special elections, like the one in Massachusetts earlier this year — only contests in November.

[…]

Let’s construct about the simplest possible study around this:

Step 1. Take all polls conducted between 30 and 60 days from the election.

Step 2. Average them together.

That’s it. We’re not doing any of the fancy stuff that we do in our actual Senate model, like weighting the polls based on sample size or the quality of the pollster. We’re just taking a simple average.

There is one “trick”, though: we’re only looking at races in which at least two different polling firms published a survey in the 30-to-60 day window. If you have just one company polling a race, you don’t really have much of an average, properly speaking.
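The procedure Silver describes (a 30-to-60 day window, a plain average, and a two-firm minimum) is simple enough to sketch in a few lines of Python. This is only an illustration, not Silver's code: the poll records are made up, the `polling_averages` function and its parameter names are hypothetical, and treating the window as inclusive at both ends is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical poll records: (race, polling firm, days before election, lead in points).
# These numbers are invented for illustration; they are not from Silver's database.
POLLS = [
    ("Senate A", "Firm X", 45, 7.0),
    ("Senate A", "Firm Y", 33, 5.0),
    ("Senate A", "Firm X", 75, 12.0),  # outside the 30-to-60 day window, ignored
    ("Senate B", "Firm X", 50, 3.0),
    ("Senate B", "Firm X", 40, 4.0),   # only one firm polled this race
]


def polling_averages(polls, min_days=30, max_days=60, min_firms=2):
    """Return the unweighted average lead per race, using only polls in the window.

    Races polled by fewer than `min_firms` distinct firms inside the window
    are dropped. The window is assumed to be inclusive at both ends.
    """
    leads = defaultdict(list)
    firms = defaultdict(set)
    for race, firm, days_out, lead in polls:
        if min_days <= days_out <= max_days:
            leads[race].append(lead)
            firms[race].add(firm)
    return {
        race: mean(values)
        for race, values in leads.items()
        if len(firms[race]) >= min_firms
    }


print(polling_averages(POLLS))  # → {'Senate A': 6.0}
```

In this toy data, "Senate B" drops out because both of its in-window polls come from the same firm, which is the "trick" described above.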

Doing this simple test, he derives the following table:

This is truly stark: “[N]o Senate candidate with a lead of more than 5.5 points in the polling average, with 30 days to go in the race, has lost his race since 1998: these candidates are 68-0.” The accuracy is slightly lower in gubernatorial contests, as two candidates with 6-9 point margins and one with a 9-12 point margin have lost. Still, the overall record for gubernatorial candidates with leads of 6 or more points in the polls has been an amazing 124 for 127, or 97.6%.

Even granting that these numbers are grossly inflated by the fact that many contests are laughers, with polling margins of 15 points or more, Silver’s point is strong: Simply averaging reputable pre-election polls gives a tremendous snapshot of who will go on to win.

This is especially impressive when one factors in two things. First, a lot of these polls are simply of “registered voters” or even “adults.” Polls that apply a “likely voter” screen are much more accurate. Second, a lot can theoretically happen in the last 30 days of a campaign. Scandals, gaffes, debates, advertising, and so forth are magnified late in the cycle. And, yet, candidates in statewide races win almost every race when they lead by 6 points or more a month out.

Comments

Wayne says:

Friday, 1 October 2010 at 12:56

Since 1998 there weren’t that many elections. When looking for something unlikely to happen, you shouldn’t look at what is typical of past events or past records but what is different from the past that could make it happen.

For example, looking at what is typical of past events or past records, especially since 1998, the GOP would stand no chance of making the gains that are predicted. Will they accomplish those gains? Only time will tell, especially since many so-called “Republicans” are trying to tear down Tea Party-backed candidates instead of going after the Dem candidates.

Anyway, this reminds me a little of sportscasters who constantly come up with little facts that have little to do with predicting the game outcome but sound good. Something like this made-up line: “The Cowboys have never lost on Thanksgiving against a team with a losing record,” etc. Then they lose, and the next time the sportscasters come up with something else.
James Joyner says:

Friday, 1 October 2010 at 13:12

This is a rather large dataset. Go much beyond 1998 and you get into a truly different era in terms of media.
Wayne says:

Friday, 1 October 2010 at 13:45

A great deal of data doesn’t necessarily make it accurate or relevant. Overall we are talking about what, 5 elections? So we are talking about roughly 170 Senate races, a good % of which were probably large blowouts or close races.

Anyway yes someone behind a good ways is likely to lose. However I would caution to use data from last 5 elections to predict the outcomes of this one. Using those standards, Castle would have won and won big time.

IMO much will be determined by who turns out. It sounds like Obama has the same opinion. If the primaries are any indication, then I can see some Republicans who are 6 to 12 percentage points behind winning. However, IMO the general election will be closer. I won’t bet my house either way though.
James Joyner says:

Friday, 1 October 2010 at 14:23

However I would caution to use data from last 5 elections to predict the outcomes of this one. Using those standards, Castle would have won and won big time.

Silver has been up front that it’s much, much harder to predict hyper-low-turnout races like primaries.
Wayne says:

Friday, 1 October 2010 at 15:09

Re “hyper-low-turnout races like primaries”

I suppose if there are upsets, Silver and you will come back and say “it is hard to predict hyper-low-turnout races like mid-terms”?
This Guy says:

Friday, 1 October 2010 at 18:40

Great, but who’s knocking an averaging system? Duh, Averaging lowers influence of outliers. Duh, larger data set is — duh — more reliable.
James Joyner says:

Friday, 1 October 2010 at 22:09

Great, but who’s knocking an averaging system? Duh, Averaging lowers influence of outliers. Duh, larger data set is — duh — more reliable.

He’s defending polling per se. There are an inordinate number of commentators, including those who get network air time to bloviate, who think “anything can happen” and that polls are much, much less accurate than they are in aggregate.
Steven L. Taylor says:

Saturday, 2 October 2010 at 10:33

I suppose if there are upsets, Silver and you will come back and say “it is hard to predict hyper-low-turnout races like mid-terms”?

Except there is absolutely no comparison between even a low-turnout statewide race (e.g., for Senator or Governor) and primary turnout, especially in midterm years.

In raw terms there were 57,584 voters in the GOP primary. There were 242,947 total voters in the last midterm (2006). This is a very different universe of voters from which to sample, and even with lower turnout in midterms than in presidential years, polling is easier and far more accurate than in primaries (which frequently aren’t polled for just that reason).

Indeed, I am not entirely sure what your point is, save that perhaps you are hoping for some upsets (which is fine, but I am not sure what they have to do with the data as presented).
Wayne says:

Saturday, 2 October 2010 at 19:12

Re “Duh, larger data set is — duh — more reliable”

Not necessarily. If you have three students who did accurate, quality-controlled experiments, their data set would be much more reliable than one that includes data from 100 students’ sloppy and haphazard experiments. That is only one instance where a larger dataset is not more reliable.

I would like to see a data set over a longer time frame than “since 1998” with reliable pollsters, rather than simply including a whole bunch of pollsters. However, as I suspect James already knows, you can run into problems because of the different polling techniques used over time.
Wayne says:

Saturday, 2 October 2010 at 19:26

Steven
My point is that it has been an unusual year so far. Using typical tendencies in an atypical situation is chancy at best.

Many people were confident before many of the primaries this year. They were wrong and made excuses after the fact. Many are once again sounding confident. I was just curious whether, if they are wrong once again, they will use the same type of excuses.

Yes as always I am hoping for some upsets going my way. Who doesn’t? However I am not confident either way. Most years I would be pretty confident against upsets but as I have said this year has been different. I just shake my head at those who refuse to acknowledge that this election has not been typical and will likely stay that way.