Probabilities and the NSA Call Data Base

Steve Verdon · Thursday, May 11, 2006 · 7 comments

Well, there is lots of concern about the NSA amassing call records, and about what the practice means for privacy, but there seems to be little discussion of why the NSA is doing it.

Now I don’t know the exact reason for this, but I am going to put forward an educated guess. To show why the NSA might be doing this, I’ll use an analogy: junk e-mail, a.k.a. spam.

Most of us who have e-mail also have some sort of junk mail filter in place. Many of these filters use a mathematical algorithm to determine the probability that an e-mail is “spam” or “ham”. How does this work? Well, it relies on Bayes’ Theorem.

Basically, Bayes’ Theorem lets us determine the probability of event A given B, written P(A|B), provided we know the likelihood of A given B, written L(A|B), and the (prior) probability of A, written P(A). Notice that the likelihood of A given B is also the conditional probability of B given A, P(B|A).
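Written out in its standard form, the theorem looks like this. The denominator P(B), the overall probability of seeing B, is part of the standard statement and is expanded here for the two-outcome case (A or not A):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\, P(A) + P(B \mid \neg A)\, P(\neg A),
\qquad
L(A \mid B) = P(B \mid A)
```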

For our Spam/Ham problem we want to know the probability that a given e-mail is Spam (A) given some information (B). In the case of the Bayesian e-mail filter, B is examples of previous spam e-mails, how frequent they are, and so forth.
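To make the Spam/Ham version concrete, here is a minimal sketch of Bayes’ Theorem for a single piece of evidence B (say, a particular word showing up in the e-mail). The function name and all three input numbers are made up for illustration; they are not taken from any real filter.

```python
def posterior_spam(p_b_given_spam, p_b_given_ham, prior_spam):
    """Return P(Spam | B) via Bayes' Theorem for one piece of evidence B."""
    prior_ham = 1.0 - prior_spam
    # P(B): total probability of seeing B across spam and ham.
    p_b = p_b_given_spam * prior_spam + p_b_given_ham * prior_ham
    return p_b_given_spam * prior_spam / p_b


# Hypothetical inputs, chosen only for illustration: B appears in 30% of
# spam, in 1% of ham, and half of all incoming mail is assumed to be spam.
posterior = posterior_spam(p_b_given_spam=0.30, p_b_given_ham=0.01, prior_spam=0.50)
print(round(posterior, 3))
```

With these assumed numbers the posterior comes out at roughly 0.97, so a single strongly spam-associated word moves a 50/50 prior a long way.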

These filters work, and they work very well. This is why now you’ll see spam e-mails with subject lines that read, “Yore % Cityb an ck CArD Con % tact Infor^^Ation NeEds Updat!ng”. Because of the effectiveness of the Bayesian spam filter a subject line like, “Your Citibank Card Information Needs Updating” would be flagged and dumped. So all the mis-spellings and special characters are the spammers’ attempt to get around these filters and get to the person who might be gullible enough to think that such an e-mail is legitimate.
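Here is a toy sketch of why the misspellings help the spammer. It is a simplified word-counting (naive Bayes) filter with add-one smoothing, not the algorithm of any particular product, and the four training subjects are hypothetical. It also assumes that words the filter has never seen are simply skipped, which is one possible design choice among several.

```python
import math
from collections import Counter


def train(messages):
    """Count lowercase, whitespace-separated tokens across messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spam_probability(subject, spam_counts, ham_counts, prior_spam=0.5):
    """Naive Bayes estimate of P(spam | subject) with add-one smoothing."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in subject.lower().split():
        if word not in vocab:
            continue  # assumption: unseen tokens carry no evidence either way
        log_spam += math.log((spam_counts[word] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_counts[word] + 1) / (ham_total + len(vocab)))
    # Convert the two log scores back into a probability.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical training data, for illustration only.
spam_counts = train([
    "your citibank card information needs updating",
    "update your card information now",
])
ham_counts = train([
    "lunch on thursday",
    "your meeting notes from thursday",
])

plain_score = spam_probability(
    "Your Citibank Card Information Needs Updating", spam_counts, ham_counts)
obfuscated_score = spam_probability(
    "Yore % Cityb an ck CArD Con % tact Infor^^Ation NeEds Updat!ng",
    spam_counts, ham_counts)
print(plain_score, obfuscated_score)
```

The obfuscated subject scores lower than the plain one because most of its tokens no longer match anything the filter has learned. It does not drop to neutral in this toy version, since “CArD” and “NeEds” still match once lowercased.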

However, for Bayesian filters to work well, they can require quite a bit of data, that is, lots of both Ham and Spam in the case of our Bayesian e-mail filter. This isn’t a problem, in that spammers send out hundreds of thousands of spam e-mails every hour. And most people who use e-mail frequently (and who are also the likely targets of spammers; I have an account at home I rarely use and it almost never gets spam) get lots of “Ham” as well.

Now, how does this apply to the NSA? Well, it seems reasonable that the NSA would like to know who is the likely terrorist so that they can start listening in on that guy’s phone calls to see if there is some intelligence that they can pick up. But how to do that? Well, one way would be to build a database of phone calls in the U.S. and use it the way the Bayesian e-mail filters use e-mail. The big difference would be that instead of looking for telemarketing phone calls, the NSA would be looking for people with really unpleasant intentions. Without that kind of data it becomes much harder to determine who is and who is not the likely terrorist.

Does this justify building a database of the calls that people make? I don’t know (the point of this post is to offer a reasonable guess as to why the NSA has compiled such a database). James has argued that while it is a violation of our civil liberties, it is a small one (like the violation at airports). Using such a database to try to catch terrorists and foil their plots does sound reasonable. However, there is the issue of “mission creep.” Will this data be used in the war on drugs and other criminal activities? This does seem to be a legitimate concern and it sure would be nice to know how the NSA plans on preventing such a thing. Based on news reports it looks like there is nothing to prevent mission creep.

The NSA told Qwest that other government agencies, including the FBI, CIA and DEA, also might have access to the database, the sources said. As a matter of practice, the NSA regularly shares its information — known as “product” in intelligence circles — with other intelligence groups. Even so, Qwest’s lawyers were troubled by the expansiveness of the NSA request, the sources said.

I think there is definitely reason for concern here.

Update: Via Instapundit–Apparently this is old news.

And for all of the hype, there may not even be much “news” here. Last December 24, a few days after they spilled the beans about the NSA terrorist surveillance program, New York Times reporters Eric Lichtblau and James Risen disclosed how U.S. phone companies were helping the NSA by giving them “access to streams of domestic and international communications.”

While I agree that there is a sensationalistic tone to this story I think there are indeed legitimate concerns about privacy and the growth of government power.

The conservative line of defense seems to be: this is defending America, so leave it alone. Besides being somewhat of a non-sequitur it is also a bit of a strawman. I think one can want to defend America, but at the same time be concerned about expanding the power of government to intrude on our lives.

John Hinderaker is also probably wrong when he writes,

Two, it’s obvious that what the NSA does with this vast amount of data is to run it through computers, looking for suspicious patterns, especially involving known or suspected terrorist phone numbers. I did a quick calculation: assuming that there are 200 million adult Americans, each of whom places or receives ten phone calls a day (a conservative estimate, I think), it would require a small army of 35,000 full-time NSA employees to pay a total of one second of attention to each call. In other words, lighten up: the NSA obviously isn’t tracking your phone calls with your friends and relatives.

As noted above, when “looking for suspicious patterns” you need both the “ham” and the “spam”. So they will be tracking the phone calls to your friends and relatives; chances are, though, that those calls will be deemed spam (i.e. something the NSA doesn’t want). And his calculation is just another example of bad math. They don’t need to monitor each second of a phone call, as the first part of the quoted paragraph itself points out.
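For readers who want to check the quoted 35,000 figure, here is one way to arrive at it. The quote does not show its work, so two inputs below are my assumptions: that each call has two parties and is therefore counted once rather than twice, and that a full-time employee works an 8-hour day. This only reproduces the division; it says nothing about whether a human needs to attend to each call in the first place.

```python
ADULTS = 200_000_000           # from the quote
CALLS_PER_ADULT_PER_DAY = 10   # from the quote (placed or received)
SECONDS_PER_CALL = 1           # from the quote

# Assumptions not stated in the quote:
PARTIES_PER_CALL = 2           # each call is shared by a caller and a receiver
WORKDAY_SECONDS = 8 * 60 * 60  # one full-time employee, 8 hours a day

# Number of distinct calls per day once double counting is removed.
distinct_calls = ADULTS * CALLS_PER_ADULT_PER_DAY // PARTIES_PER_CALL

# Employees needed to give every call one second of attention each day.
employees = distinct_calls * SECONDS_PER_CALL / WORKDAY_SECONDS
print(round(employees))
```

Under those assumptions the result lands just under 35,000. Counting every call twice, or assuming a different workday, gives a noticeably different headcount.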

Comments

Jimmy says:

Thursday, 11 May 2006 at 14:21

You mean Plame may have accessed the DIA database and found that all her retired CIA pals who started the 500 things and, maybe, a Senator who warned us a lot about the dirty bomb were all monitored? It’s a nice way to point out she should not have gone bad, but too good to be true.

NSA ALWAYS monitored all communications. The phone companies can’t be in trouble, it’s necessary for the country. DOD would never ask for a warrant, it’s not needed.

Did the Al that is employed at CIA that Goss admitted to use the records or was it another bad agent at CIA?

yetanotherjohn says:

Thursday, 11 May 2006 at 14:41

Therein lies the perpetual question of who will watch the watchers. We have much more sensitive information about us in commercial files (e.g. credit card records, doctor records, movie rental records, prescription records) and in government files (e.g. IRS financial records, legal records, SSA records, Medicare records). All of these are subject to abuse. The Echelon project begun under Clinton is much more intrusive, since it reaches much further than what number called which number and extends into what was said in the conversation.

There are several ways to handle things. You could take an ISO approach of saying what you are going to do, doing what you say, and being able to prove it with records. The problem with that is that, like the spammers who would love to know the details behind the formula, there are those who would use the information you disclose to avoid the net. You can appoint watchdogs, but recent leaks in the press have shown that trusting mid-level bureaucrats to keep a secret may not be the safest bet.

It boils down to pulling on one or both of two levers: either increasing the chance of catching someone doing what they shouldn’t, or increasing the penalty so that the reward is not worth the risk. I would fully support a death penalty for government officials abusing their position. That would increase the risk. I have a brother working for one of the large government entities. I asked him to check the data on me. He said he couldn’t, as the computers monitor who gets access to the files and it has to match the cases assigned. I would be amazed if the NSA didn’t have similar monitoring capabilities to prevent private use of the data. As for official mission creep, preventing that is the job of the congressional oversight committees.

I personally prefer the idea of holding the individual responsible vs throwing out a program because it might be abused.

anjin-san says:

Thursday, 11 May 2006 at 14:47

The only problem with congressional oversight is that Bush has made it pretty clear that he only recognizes the authority of congress when it suits him to do so.

Toeaz says:

Thursday, 11 May 2006 at 15:50

‘We have much more sensitive information about us in commercial files’

This is the traditional excuse.

How can we hold an individual responsible when we see Plame complete a retirement party years in the making? DOJ will no longer hold persons responsible.

Roger says:

Friday, 12 May 2006 at 06:37

The govt. has been “creeping” towards that “much more sensitive information” for some time now, all without warrants. Given Bushco’s belief that the law does not apply to them, who’s to say they’re not already looking over everything you do? What’s to stop them?

searp says:

Friday, 12 May 2006 at 15:22

It would be more effective yet to encode the content of the phone call into the search. More information is better.

The potential for abuse is simply too high for a program of this type. They should monitor Syrian phone records or something.

Roger says:

Monday, 15 May 2006 at 22:22

That’s a great idea searp. They might actually get info regarding terrorism and they wouldn’t even have to break any US laws.