Probabilities and the NSA Call Data Base
Well, there is lots of concern about the NSA amassing call records, and about the privacy implications of this practice, but there seems to be little discussion of why the NSA is doing it.
Now I don’t know the exact reason for this, but I am going to put forward an educated guess. To show why the NSA might be doing this, I’ll use an analogy: junk e-mail, a.k.a. spam.
Most of us who have e-mail also have some sort of junk mail filter in place. Many of these filters use a mathematical algorithm to determine the probability that an e-mail is “spam” or “ham”. How does this work? Well, it relies on Bayes’ Theorem.
Basically, Bayes’ Theorem lets us determine the probability of event A given B, that is, P(A|B), provided we know the likelihood of A given B, that is, L(A|B), and the (prior) probability of A, that is, P(A). Notice that the likelihood of A given B is also the conditional probability of B given A, that is, P(B|A).
For our Spam/Ham problem we want to know the probability that a given e-mail is Spam (A) given some information (B). In the case of the Bayesian e-mail filter, B is examples of previous spam e-mails, how frequent they are, and so forth.
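For readers who want to see it written out, here is the theorem in standard notation, using the same A and B as above. The symbol ¬A ("not A") is my shorthand for the alternative to A, which in the e-mail case means the message being ham. The second line, which expands P(B), is the usual law of total probability and is included only for completeness.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} = \frac{L(A \mid B)\, P(A)}{P(B)}

P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A)
```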
These filters work, and they work very well. This is why now you’ll see spam e-mails with subject lines that read, “Yore % Cityb an ck CArD Con % tact Infor^^Ation NeEds Updat!ng”. Because of the effectiveness of the Bayesian spam filter a subject line like, “Your Citibank Card Information Needs Updating” would be flagged and dumped. So all the mis-spellings and special characters are the spammers’ attempt to get around these filters and get to the person who might be gullible enough to think that such an e-mail is legitimate.
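To make the mechanics a little more concrete, here is a minimal sketch of a naive Bayes filter in Python. It is only an illustration: the class name, the six training messages, and the add-one smoothing are my own assumptions, and real filters tokenize and weight words far more carefully. It scores the clean Citibank subject line and the mangled one from above so the two can be compared.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesFilter:
    """Hypothetical toy filter: word counts per label plus Bayes' Theorem."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | words), treating words as independent given the label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior P(A): share of training messages carrying this label.
            log_score[label] = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocab:
                    continue  # words never seen in training carry no evidence
                # Likelihood P(word | label) with add-one smoothing (an assumption).
                count = self.word_counts[label][word]
                log_score[label] += math.log((count + 1) / (total_words + len(vocab)))
        # Normalize the two scores so they sum to one (this is the P(B) step).
        top = max(log_score.values())
        weights = {label: math.exp(s - top) for label, s in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Made-up training data, purely for illustration.
spam_filter = NaiveBayesFilter()
for msg in ["your citibank card information needs updating",
            "update your card information now",
            "your account needs updating click here"]:
    spam_filter.train(msg, "spam")
for msg in ["lunch tomorrow at noon",
            "your meeting notes from today",
            "see you at the game tomorrow"]:
    spam_filter.train(msg, "ham")

plain = spam_filter.spam_probability("Your Citibank Card Information Needs Updating")
obfuscated = spam_filter.spam_probability(
    "Yore % Cityb an ck CArD Con % tact Infor^^Ation NeEds Updat!ng")
ham = spam_filter.spam_probability("lunch tomorrow at noon")

print(f"plain subject:      {plain:.3f}")
print(f"obfuscated subject: {obfuscated:.3f}")
print(f"ordinary message:   {ham:.3f}")
```

In this toy setup the mangled subject line scores lower than the clean one, because most of its tokens are ones the filter has never seen and so contribute nothing either way. Only the words that survive the mangling, like "card" and "needs", still count against it.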
However, for Bayesian filters to work well, they can require quite a bit of data, that is, lots of both Ham and Spam in the case of our Bayesian e-mail filter. This isn’t a problem, since spammers send out hundreds of thousands of spam e-mails every hour. And most people who use e-mail frequently (and who are also the likely targets of spammers; I have an account at home I rarely use and it almost never gets spam) get lots of “Ham” as well.
Now, how does this apply to the NSA? Well, it seems reasonable that the NSA would like to know who is the likely terrorist so that they can start listening in on that guy’s phone calls to see if there is some intelligence that they can pick up. But how to do that? Well, one way would be to build a database of phone calls in the U.S. and use it the way the Bayesian e-mail filters use e-mail. The big difference would be that instead of looking for telemarketing phone calls, the NSA would be looking for people with really unpleasant intentions. Without that kind of data it becomes much harder to determine who is and who is not the likely terrorist.
Does this justify building a database of the calls that people make? I don’t know (the point of this post is to offer a reasonable guess as to why the NSA has compiled such a database). James has argued that while it is a violation of our civil liberties, it is a small one (like the violation at airports). Using such a database to try to catch terrorists and foil their plots does sound reasonable. However, there is the issue of “mission creep.” Will this data be used in the war on drugs and other criminal activities? This does seem to be a legitimate concern and it sure would be nice to know how the NSA plans on preventing such a thing. Based on news reports it looks like there is nothing to prevent mission creep.
The NSA told Qwest that other government agencies, including the FBI, CIA and DEA, also might have access to the database, the sources said. As a matter of practice, the NSA regularly shares its information — known as “product” in intelligence circles — with other intelligence groups. Even so, Qwest’s lawyers were troubled by the expansiveness of the NSA request, the sources said.
I think there is definitely reason for concern here.
And for all of the hype, there may not even be much “news” here. Last December 24, a few days after they spilled the beans about the NSA terrorist surveillance program, New York Times reporters Eric Lichtblau and James Risen disclosed how U.S. phone companies were helping the NSA by giving them “access to streams of domestic and international communications.”
While I agree that there is a sensationalistic tone to this story, I think there are indeed legitimate concerns about privacy and the growth of government power.
The conservative line of defense seems to be: this is defending America, so leave it alone. Besides being somewhat of a non-sequitur, it is also a bit of a strawman. I think one can want to defend America, but at the same time be concerned about expanding the power of government to intrude on our lives.
John Hinderaker is also probably wrong when he writes,
Two, it’s obvious that what the NSA does with this vast amount of data is to run it through computers, looking for suspicious patterns, especially involving known or suspected terrorist phone numbers. I did a quick calculation: assuming that there are 200 million adult Americans, each of whom places or receives ten phone calls a day (a conservative estimate, I think), it would require a small army of 35,000 full-time NSA employees to pay a total of one second of attention to each call. In other words, lighten up: the NSA obviously isn’t tracking your phone calls with your friends and relatives.
As noted above, when “looking for suspicious patterns” you need both the “ham” and the “spam”. So they will be tracking the phone calls to your friends and relatives. Chances are, though, that those calls will be deemed spam (i.e. something the NSA doesn’t want). And his calculation is just another example of bad math. The NSA doesn’t need a person to pay attention to each second of each phone call, and the first part of his own paragraph points this out: the data is run through computers.