A Note on 2016 Polling
Because sometimes a comment starts to become a post.
In a post about Joe Biden’s current poll numbers, I responded to a comment which called into question the 2016 polling. After all, the polls said that Hillary Clinton was ahead, and she lost. Ergo: polling sucks! Or, less dramatically, the question arises as to why we should trust polling now when it got it oh so very wrong that last go-round.
The point is, however, that the polling didn’t get it all that wrong in 2016. Indeed, let’s look at the actual numbers.
The official final numbers for the 2016 election are as follows:
- HRC: 48.18%
- DJT: 46.09%
So, HRC won the popular vote by 2.09%.
Now, the final RCP polling average was:
- HRC: 46.8%
- DJT: 43.6%
That suggested HRC winning the popular vote by 3.2%.
The difference between the final total and the polling is 1.11 percentage points. That is pretty darn close. It is anything but a failure of polling. The polling called the popular vote winner and got the numbers quite close. Note, too, that the RCP average in 2012 had more of a gap between polling and final results than in 2016, but no one wails about how bad the polling was, because the guy we all expected to win, won. We have to remind ourselves how much we allow expectations to cloud our understanding of numbers.
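To make the comparison explicit, here is a trivial back-of-the-envelope sketch (illustrative only, using the figures quoted above):

```python
# National numbers quoted in this post (percentages).
final = {"HRC": 48.18, "DJT": 46.09}    # official final result
rcp_avg = {"HRC": 46.8, "DJT": 43.6}    # final RCP polling average

final_margin = final["HRC"] - final["DJT"]       # actual popular-vote margin
polled_margin = rcp_avg["HRC"] - rcp_avg["DJT"]  # margin implied by the polls
miss = polled_margin - final_margin              # how far off the average was

print(f"Actual margin: HRC +{final_margin:.2f}")
print(f"Polled margin: HRC +{polled_margin:.2f}")
print(f"Polling miss:  {miss:.2f} percentage points")
```

The point of the sketch is simply that the national polling average missed the popular-vote margin by about a single point.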
So, the polling in 2016 was pretty good–it was how the polling was used by some analysts that was the problem (linked, too, with media and the public doing a poor job of understanding what they were looking at). Throw in the fact that most people still seem not to really understand the Electoral College (they think it is a magic formula created by The Founders instead of an allocation rule that distorts the outcome) and you get a narrative that the polling was waaay off.
And yes, there were predictive models by Sam Wang and by Natalie Jackson at HuffPo that predicted, with >98% confidence, that HRC would win (as compared to Nate Silver’s model, which gave HRC a 71.4% chance to Trump’s 28.6%).
Look, if you are taking a bet, you’d rather have 7 in 10 odds than 3 in 10, but an almost 30% chance of something is still a really good chance that it happens. If there is a 30% chance of rain, it is not unreasonable to take an umbrella. And even if you take the 7 in 10 bet you can still lose (would you bet your annual salary on 7 in 10 odds?).
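The umbrella point can be made with a quick simulation (illustrative only, using Silver’s 28.6% figure from above):

```python
import random

random.seed(1)
p_underdog = 0.286   # Silver's 2016 probability of a Trump win
trials = 100_000     # simulate many hypothetical "elections"

# Count how often the 28.6% event actually happens.
upsets = sum(random.random() < p_underdog for _ in range(trials))

print(f"Underdog wins in {upsets / trials:.1%} of simulated elections")
```

Run enough hypothetical elections and the underdog wins roughly 29% of the time, which is why a 28.6% chance is a warning to pack an umbrella, not a forecast of impossibility.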
As I noted in the comments, a .286 batting average is quite good in MLB.
To be clear as to why I commented in the first place, and why I am pushing back: it is simply wrong to say that the polls were wrong in 2016. Yes, Wang’s prediction was grossly wrong, as was Natalie Jackson’s at HuffPo. But if the lesson people think they learned from 2016 is that we should distrust the polls (and therefore that we should dismiss polls now), they learned the wrong lesson.
And look, I thought HRC was going to win—it was the more probable outcome going into election night and, again, she did win the popular vote. And, yes, I was surprised (more than I should have been) by the Trump wins in MI, WI, and PA (especially PA). And, yes, polling in those states was inadequate.
The Wisconsin polling was way off (HRC +6.5%, but Trump winning by 0.7%), but polling there also stopped several days before the election (i.e., the state was under-polled). PA was closer (HRC +1.9%, with Trump winning by 0.7%). Michigan was between the two, with HRC’s advantage in the average being 3.4%, but with Trump winning by 0.3%. It is worth noting that the last poll noted in the RCP average in Michigan had Trump at +3.4%.
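Tallying the state-level misses described above in one quick sketch (illustrative, using the margins quoted in this post; positive numbers are HRC leads):

```python
# Final RCP averages vs. actual results in the three decisive states,
# as quoted above. Margins are HRC-minus-Trump, in percentage points.
states = {
    "WI": {"polled": 6.5, "actual": -0.7},
    "PA": {"polled": 1.9, "actual": -0.7},
    "MI": {"polled": 3.4, "actual": -0.3},
}

for name, m in states.items():
    miss = m["polled"] - m["actual"]  # how far the average overstated HRC
    print(f"{name}: polled HRC {m['polled']:+.1f}, actual {m['actual']:+.1f}, "
          f"miss {miss:.1f} points")
```

The contrast with the roughly one-point national miss is the whole story: Wisconsin’s average was off by more than seven points, while the national average was off by about one.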
The general lessons here, therefore, are as follows:
- The polling itself, especially the national polling, in 2016 was pretty good (despite media narratives that were reinforced by the emotions of media consumers, whether it was elation at Trump winning when he seemed doomed or despair because Trump won when he seemed doomed).
- State-level polling is more problematic than national polling. It is expensive and less polling is done in some states (although I can guarantee a lot of polling dollars will flow into MI, PA, and WI this year).
- In general, the broader population (often including reporters) doesn’t understand basic polling issues (including basic issues like margin of error) and very frequently fails to grasp probability (see, e.g., the thriving lottery industry in this country).
- Models and predictions can create false understanding.
- Nate Silver has the best track record out there to this point over several electoral cycles (but he isn’t a wizard or soothsayer).
- Sam Wang and Natalie Jackson at HuffPo blew it in 2016. That doesn’t mean that the polling in 2016 failed. It means that they screwed up their applications of the polling data.
Now, am I saying that, because Biden has good numbers now, he is going to win? No, not in the least. I do think that there are some good signs in the numbers for him (such as his ability to hit 50% and the general stability of his lead). It is also true that there is a reasonable ability to infer EC outcomes from national numbers when the margins are large enough.
Still, I expect even more state-level polling in 2020 than we saw in 2016 and there is still a lot of time to go.
I am saying that it is likely, however, that pollsters will be able to accurately gauge, within a reasonable margin of error, what public support is for the candidates, just as they did in 2016.