More on this “Stolen” Election
One of the things that many people seem to be hanging their hat on in terms of the “Bush Stole the Election” meme is the exit polling. Early in the day, leaked exit polls showed that Kerry had some pretty substantial leads in key battleground states. People then look at the margin of error (MOE for short) and try to do some back-ass-ward calculation to come up with some completely stupid “probability” that Bush stole the election.
I’m sorry to do this to all you budding probability theorists, but the above kind of stuff is just plain and simple crap. These exit polls rely on a view of probability and statistics known as Frequentist. What does that mean? It means you rely on a view of probability that rests on the frequency of an event happening over a large number of trials (if you are guessing this isn’t the only view of probability, you are quite right). The first immediate problem is that the Frequentist approach really cannot be used for one-time events. Sporting events are a good example. The weather, that year’s team, the injured players for that game, etc. all play a role in determining the outcome. Hence you cannot have repeated trials of the same game over and over to get some sort of handle on the probabilities of which team wins. Now it is not a huge leap from the sporting event example to elections. Elections are one-time events. We cannot go back in time and re-run the election; never mind making sure that if we could go back in time, everything would happen exactly like it did the first time through. So why do people use Frequentist statistics on elections? It is easy. Just about every canned software package out there has built-in Frequentist techniques and almost totally ignores other approaches to statistics (e.g., the Bayesian approach, which does not have the above problem with assigning probabilities to one-time events).
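The “frequency over a large number of trials” idea is easy to see in a toy simulation (a sketch for illustration only, with nothing to do with real polling data): flip a simulated fair coin and watch the observed frequency of heads settle toward the underlying probability as the number of trials grows. That long run of repeated trials is exactly what a one-time event like an election does not give you.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def heads_frequency(n_trials: int, p: float = 0.5) -> float:
    """Observed frequency of heads over n_trials flips of a coin with P(heads) = p."""
    heads = sum(1 for _ in range(n_trials) if random.random() < p)
    return heads / n_trials

# The observed frequency wanders for small n and converges toward p = 0.5
# as the number of trials grows -- the Frequentist notion of probability.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

The point of the sketch is the contrast: the Frequentist interpretation needs that growing stack of repeated trials, and an election run exactly once never supplies it.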
Now why do all canned statistics packages have Frequentist techniques? Because the technique is always the same. There is nothing different about applying linear regression analysis to agricultural data versus applying it to astrophysics. The techniques are always the same. Always. The nice thing (and in my view also the bad thing) about Frequentist statistics is that it is mechanistic. Basically it provides a very nice way for people with little or no grasp of statistics to crunch numbers and get a Result. That magic thing…the Result. What is even better is if the Result is Statistically Significant. With this you have something Important. Now this is nice if you know about statistics, are doing research, and don’t want to be bothered writing up code every time you crunch your numbers. The problem is that it also means any Joe Schmoe can come along, dump some numbers in there, and crunch away. Who cares if it is valid to use that technique, whether the data is messy, etc. Crunch, crunch, crunch.
So wherein lies the danger with all of this? Let’s suppose we have two candidates for President. The polling data shows that one candidate is leading. The polls have a 95% confidence level attached to them. Wow, that candidate is going to win. After all, the Result is Statistically Significant, right? Well, two days later it breaks that the candidate is also a pedophile and has a 9-year-old child as a lover (this example has been shamelessly stolen from Deirdre McCloskey). Whoops, the trailing candidate now wins by a landslide. But, but, but…those polls. Why, the trailing candidate must have STOLEN THE ELECTION!
No, no, and no! This extreme example highlights one of the features of Frequentist statistics that is almost never…ever mentioned by pollsters. The level of confidence or MOE is a pre-experimental measure. What does that mean? It means that the confidence for Frequentist results is due to the performance of the technique in repeated trials. That is, prior to actually observing the data there is a 95% probability that the technique will capture the true value of the parameter of interest (POI). However, once the data are observed the probability degenerates to the trivial case, either 0 or 1 (100%). So those exit polls were based on the notion that 95% of the time the technique works, but once we have the data in hand either they are right or they are wrong. Further, there is nothing random about Frequentist results once the data have been obtained. All components of confidence intervals, estimates, etc. are constants. I normally wouldn’t point out that a constant is not random, but I think in this case it has to be pointed out: a constant is not random.
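The pre-experimental nature of the 95% figure can be demonstrated directly. The sketch below uses made-up numbers (a hypothetical “true” support of 52% and 1,000-respondent polls): it simulates many polls, builds the usual 95% normal-approximation confidence interval for each, and checks how often those intervals cover the true value. Across thousands of repetitions the coverage is roughly 95%; but any one observed interval either contains the true value or it doesn’t.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

TRUE_P = 0.52   # hypothetical "true" support for one candidate
N = 1_000       # respondents per simulated poll
Z = 1.96        # 95% normal critical value

def poll_ci() -> tuple[float, float]:
    """Simulate one poll of N respondents; return its 95% confidence interval."""
    votes = sum(1 for _ in range(N) if random.random() < TRUE_P)
    p_hat = votes / N
    moe = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    return p_hat - moe, p_hat + moe

def covers() -> bool:
    lo, hi = poll_ci()
    return lo <= TRUE_P <= hi

# Pre-experimental: over many repetitions of the *procedure*, about 95%
# of the intervals capture TRUE_P.
coverage = sum(covers() for _ in range(10_000)) / 10_000
print(f"coverage over 10,000 simulated polls: {coverage:.3f}")  # close to 0.95

# Post-experimental: one particular interval either covers TRUE_P or it
# doesn't -- the probability is 0 or 1, not 95%.
lo, hi = poll_ci()
print(f"one observed interval: ({lo:.3f}, {hi:.3f}); covers: {lo <= TRUE_P <= hi}")
```

The 95% belongs to the procedure, not to any single interval you happen to be holding after the data are in.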
Another problem with Frequentist statistics is the properties attached to the parameters that are being estimated. For example, in the case of an election we want to know the probability that Kerry (or Bush) is going to win. The Frequentist view holds that this is a fixed constant that exists…somewhere, and that by sampling we can make inferences about this unobservable fixed constant (feeling a little uncomfortable about this notion? Well hey, that’s your problem; I personally take the Bayesian view). As the example above shows, this isn’t always the case. Sure, you could try to get around this problem by saying, “Well, sure the probability isn’t fixed for all time, but it is a fixed constant at each instant; it moves over time, and any poll is simply an inference about that fixed constant at that moment in time.” But this actually undermines the view that early exit polling meant something 12 hours later, when the polls were closed and it was looking very much like Bush was going to win. I bet if we sent Zogby and all the others back to their phones and they did another poll, the results would look much different.
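For contrast, here is a minimal sketch of the Bayesian alternative (the poll counts are hypothetical, invented for illustration): treat the candidate’s true support p as the uncertain quantity, put a flat Beta(1, 1) prior on it, update with the poll counts, and read off the probability that p exceeds 50%. A later poll with different numbers simply produces a different posterior; there is no pretense of a time-frozen constant that the early exit polls had already pinned down.

```python
import random

random.seed(1)  # fixed seed so the Monte Carlo sketch is reproducible

def posterior_prob_over_half(k: int, n: int, a: float = 1.0, b: float = 1.0,
                             draws: int = 100_000) -> float:
    """Monte Carlo estimate of P(p > 0.5 | data).

    With a Beta(a, b) prior on support p and k 'yes' answers out of n,
    the posterior is Beta(a + k, b + n - k); we sample from it directly.
    """
    post_a, post_b = a + k, b + n - k
    over = sum(1 for _ in range(draws)
               if random.betavariate(post_a, post_b) > 0.5)
    return over / draws

# Hypothetical morning exit poll: 540 of 1,000 respondents for one candidate.
print(posterior_prob_over_half(540, 1000))
# A later (hypothetical) poll with different counts yields a different
# posterior -- the belief updates as the data change.
print(posterior_prob_over_half(480, 1000))
```

The Beta-Binomial conjugate update is the textbook case, which is why it fits in a dozen lines; the point is only that the Bayesian machinery assigns a direct, post-data probability to the one-time event, which is exactly what the Frequentist MOE cannot do.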
The bottom line is that you have to be very, very careful when using Frequentist results. Using the results of Frequentist analysis to try and gin up some sort of probability about the election being stolen is difficult at best. First is the problem that the only probability associated with those results is trivial (0 or 1). Second, even a statistically significant result at one point in time does not have to mean that the result is always statistically significant. Third, strictly speaking, we shouldn’t be using Frequentist measures for things like elections. So, to all you wannabe statistical investigators, budding probability theorists, and whackos who think that Bush stole the election and you can prove it with exit polling…do yourself a favor. Buy a textbook on probability theory and statistics, preferably one that provides a side-by-side discussion of both Frequentist and Bayesian approaches. At least you won’t look so Goddamned incoherent.