The N.S.A.'s Math Problem

James Joyner · Tuesday, May 16, 2006 · 12 comments

Jonathan David Farley, a science fellow at the Center for International Security and Cooperation at Stanford, argues that the NSA’s scanning of phone records “probably isn’t worth infringing our civil liberties for — because it’s very unlikely that the type of information one can glean from it will help us win the war on terrorism.” The reason is mathematical:

If the program is along the lines described by USA Today — with the security agency receiving complete lists of who called whom from each of the phone companies — the object is probably to collect data and draw a chart, with dots or “nodes” representing individuals and lines between nodes if one person has called another.

Mathematicians who work with pictures like this are called graph theorists, and there is an entire academic field, social network analysis, that tries to determine information about a group from such a chart, like who the key players are or who the cell leaders might be. But without additional data, its reach is limited: as any mathematician will admit, even when you know everyone in the graph is a terrorist, it doesn’t directly portray information about the order or hierarchy of the cell. Social network researchers look instead for graph features like “centrality”: they try to identify nodes that are connected to a lot of other nodes, like spokes around the hub of a bicycle wheel. But this isn’t as helpful as you might imagine. First, the “central player” — the person with the most spokes — might not be as important as the hub metaphor suggests. For example, Jafar Adibi, an information scientist at the University of Southern California, analyzed e-mail traffic among Enron employees before the company collapsed. He found that if you naïvely analyzed the resulting graph, you could conclude that one of the “central” players was Ken Lay’s … secretary.
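To make the "centrality" idea concrete, here is a minimal sketch in Python of the simplest version, degree centrality, which just counts how many distinct people each node is connected to. The message log and the names in it are invented for illustration; this is not Adibi's data or method, only a toy case showing how a secretary can come out on top of a naive count.

```python
from collections import defaultdict

# Hypothetical e-mail log as (sender, recipient) pairs; all names are made up.
messages = [
    ("ceo", "secretary"),
    ("cfo", "secretary"),
    ("trader_a", "secretary"),
    ("trader_b", "secretary"),
    ("lawyer", "secretary"),
    ("ceo", "cfo"),
    ("trader_a", "trader_b"),
]


def degree_centrality(edges):
    """Return {node: number of distinct neighbors}, treating edges as undirected."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


scores = degree_centrality(messages)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

In this toy log the secretary has the most connections even though the CEO is the boss, which is the pitfall the passage describes. The count says who handles the most traffic, not who gives the orders.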

Of course, even with his superior understanding of pattern analysis, Farley suffers the same disadvantage as all of us on the outside of this program: having to make guesses on the improbable basis that the USA Today reportage of this classified program is accurate. That is incredibly unlikely. Indeed, considering that I have never attended an event and not found glaring errors in how that event was covered in the press, it is simply inconceivable to me that reporters are going to get it right when covering things that are classified and incredibly complicated through the lens of leaking malcontents driven by an unknown agenda.

The remainder of Farley’s analysis is clever but rather odd:

In addition, the National Security Agency’s entire spying program seems to be based on a false assumption: that you can work out who might be a terrorist based on calling patterns. While I agree that anyone calling 1-800-ALQAEDA is probably a terrorist, in less obvious situations guilt by association is not just bad law, it’s bad mathematics, for two reasons.

The simplest reason is that we’re all connected. Not in the Haight-Ashbury/Timothy Leary/late-period Beatles kind of way, but in the sense of the Kevin Bacon game. The sociologist Stanley Milgram made this clear in the 1960’s when he took pairs of people unknown to each other, separated by a continent, and asked one of the pair to send a package to the other — but only by passing the package to a person he knew, who could then send the package only to someone he knew, and so on. On average, it took only six mailings — the famous six degrees of separation — for the package to reach its intended destination.

Looked at this way, President Bush is only a few steps away from Osama bin Laden (in the 1970’s he ran a company partly financed by the American representative for one of the Qaeda leader’s brothers). And terrorist hermits like the Unabomber are connected to only a very few people. So much for finding the guilty by association.
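The "degrees of separation" between two people is just the shortest chain of acquaintances linking them, which a breadth-first search finds directly. The sketch below assumes a tiny, invented acquaintance graph; the names and links are hypothetical and only illustrate the calculation.

```python
from collections import deque

# Hypothetical acquaintance graph: each person maps to the people they know.
acquaintances = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "hermit": [],  # knows nobody, so no chain can reach them
}


def degrees_of_separation(graph, start, goal):
    """Length of the shortest chain from start to goal, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in graph.get(person, []):
            if friend == goal:
                return distance + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None


print(degrees_of_separation(acquaintances, "alice", "dave"))
print(degrees_of_separation(acquaintances, "alice", "hermit"))
```

A short chain exists between almost any two connected people, so a short chain by itself says little, while an isolated node like the hermit never shows up in anyone's chain at all.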

But the intent of the program–combining the USA Today report and a modicum of common sense–is not guilt by association but rather to find clues. Only a small fraction of the people a terrorist makes contact with are fellow terrorists; that’s a given. Still, one is more likely to find other terrorists on the call list of terrorists than on the call lists of non-terrorists, right? Lawyers, doctors, schoolteachers, and bloggers are more apt to network with their fellows than those outside those loops. Wouldn’t the same be true of terrorists?

A second problem with the spy agency’s apparent methodology lies in the way terrorist groups operate and what scientists call the “strength of weak ties.” As the military scientist Robert Spulak has described it to me, you might not see your college roommate for 10 years, but if he were to call you up and ask to stay in your apartment, you’d let him. This is the principle under which sleeper cells operate: there is no communication for years. Thus for the most dangerous threats, the links between nodes that the agency is looking for simply might not exist.

Then again, they might. Indeed, the fact that rooting out terrorist networks is hard is the reason for resorting to extraordinary means. If, for example, the American Jihadist Terrorist Association held an annual meeting and published a directory, it would be silly to spend a lot of time having computers scanning phone records–we’d just stake out the convention center.

If our intelligence agencies are determined to use mathematics in rooting out terrorists, they may consider a profiling technique called formal concept analysis, a branch of lattice theory. The idea, in a nutshell, is that people who share many of the same characteristics are grouped together as one node, and links between nodes in this picture — called a “concept lattice” — indicate that all the members of a certain subgroup, with certain attributes, must also have other attributes.

For formal concept analysis to be helpful, you need much more than phone records. For instance, you might group together people based on what cafes, bookstores and mosques they visit, and then find out that all the people who go to a certain cafe also attend the same mosque (but maybe not vice versa).
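The cafe-and-mosque example is what formal concept analysis calls an attribute implication: everyone who has attribute A also has attribute B, without the reverse necessarily holding. A minimal sketch, assuming an invented table of who visits where (the people and places are hypothetical):

```python
# Hypothetical context: each person maps to the set of places they visit.
visits = {
    "p1": {"cafe_x", "mosque_y"},
    "p2": {"cafe_x", "mosque_y", "bookstore_z"},
    "p3": {"mosque_y"},
    "p4": {"bookstore_z"},
}


def extent(context, attributes):
    """People who have every attribute in `attributes`."""
    return {person for person, attrs in context.items() if attributes <= attrs}


def intent(context, people):
    """Attributes shared by every person in `people`."""
    shared = [context[person] for person in people]
    if not shared:
        # By convention, an empty group shares every attribute in the context.
        return set().union(*context.values())
    return set.intersection(*shared)


def implies(context, premise, conclusion):
    """True if everyone with all `premise` attributes also has `conclusion`."""
    return conclusion <= intent(context, extent(context, premise))


print(implies(visits, {"cafe_x"}, {"mosque_y"}))
print(implies(visits, {"mosque_y"}, {"cafe_x"}))
```

Here everyone who goes to the cafe also attends the mosque, but not vice versa, since p3 attends the mosque without going to the cafe. A full concept lattice is built from these same two operations, but it needs far richer data than a list of who called whom.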

Well, no kidding. You think our intelligence agencies don’t know that? You think they’re not following those kinds of leads? But, again, terrorists who are high enough on the food chain to be key intelligence targets are likely to avoid hanging out in the same cafe every day with their terrorist friends.

This is because, as Kennedy and Lincoln assassination buffs know, two people can be a lot alike without being the same person. Even if there is only a 1 in 150 million chance that someone might share the profile of a terrorist suspect, it still means that, in a country the size of the United States, two people might share that profile. One might be a terrorist, or he might be Cat Stevens.

Right. But, then, you’ve narrowed the field to 1 in 2 rather than x out of 300 million. That’s a good thing, right? The NSA isn’t taking people who fit a pattern into custody and shooting them. They’re investigating them more closely. Cat Stevens is, last I checked, at large.
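The arithmetic behind that exchange is short enough to check. This sketch assumes a U.S. population of roughly 300 million (a round figure, not from Farley's piece) and takes his 1-in-150-million match rate at face value. It also assumes exactly one of the people who fit the profile is the real target.

```python
from fractions import Fraction

population = 300_000_000                # assumed round figure for the U.S.
match_rate = Fraction(1, 150_000_000)   # Farley's hypothetical profile odds

# Expected number of people in the country who fit the profile.
expected_matches = population * match_rate

# Chance that a given match is the real target, if exactly one of them is.
chance_per_match = 1 / expected_matches

print(expected_matches, chance_per_match)
```

Under those assumptions the profile leaves about two candidates, each with even odds of being the one you want, which is a long way from one in 300 million.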

This isn’t to say that mathematicians are useless in fighting terrorism. In September 2004 — 10 months before the bombing of the London Underground — Gordon Woo, a mathematician and risk-assessment consultant, gave a speech warning that London was a hotbed of jihadist radicalism. But Dr. Woo didn’t anticipate violence just using math; he also used his knowledge of London neighborhoods. That’s what law enforcement should have been doing then and should be doing now: using some common sense and knowledge of terrorists, not playing math games.

Again, just because you read about one particular program in a newspaper does not mean you understand the entire scope of U.S. counterterror ops. Indeed, I read everything I can get my hands (or a computer mouse) on and have no clue about the vast preponderance of it. That’s the nature of highly complicated, secret things.

Math is just a tool.

So, it seems, are some mathematicians.

Comments

Russell Newquist says:

Tuesday, 16 May 2006 at 12:52

“He found that if you naïvely analyzed the resulting graph, you could conclude that one of the “central” players was Ken Lay’s … secretary.”

The implied sarcasm in this statement is ironic, because it completely misses the fact that both in Enron and in any terrorist organization, the secretary IS a central player. That doesn’t make them the boss, per se, but they can be critical – especially from an intelligence gathering perspective. If you can figure out who the secretary is and bug him/her, suddenly you’ve got the jackpot – again, as far as intelligence gathering goes. Sure, if you assassinated the secretary it wouldn’t bring down the organization – they’d just put somebody else in that spot.

But then, what business is the NSA in again? Oh yeah, intelligence gathering.

Personally, I think the entire rest of his argument is moot after this glaring statement of stupidity.
Alan says:

Tuesday, 16 May 2006 at 13:22

Bruce Schneier, a well-regarded security analyst, did a similar analysis on data mining in general, and reached a similar conclusion:

http://www.schneier.com/blog/archives/2006/03/data_mining_for.html

See also this:
http://www.schneier.com/blog/archives/2006/01/post_1.html

Having worked in that area before, I found the article to be persuasive.
James Joyner says:

Tuesday, 16 May 2006 at 13:45

Alan: I’m sure that data mining taken in isolation is relatively ineffective. I’m just pretty sure that we’re not doing it in isolation. Our intel agencies justly get plenty of criticism for various bureaucratic failings. They are not, however, manned by idiots.
Alan says:

Tuesday, 16 May 2006 at 14:02

Hello James,

Like I said, I worked in that area. I know exactly the kind of people who work there and I was one of them. I did not call anyone an idiot.

The various agencies love to throw large amounts of money at experimental technology. That is great, until it crosses the line on respecting the privacy of American citizens.

The most likely use for any large database is to investigate a person AFTER he/she has been identified as a suspect. That should be done using warrants, court oversight, etc.

Data mining, IMHO, is not a legitimate use of technology for crime enforcement, and if the current trends continue, will eventually turn the USA into a police state.
ken says:

Tuesday, 16 May 2006 at 14:11

James, even so-called smart people do incredibly stupid things. Usually, in any competitive endeavor, the market forces a correction, or else. Just look at what the brainiacs at Long Term Capital Management did for a good example of how this works.

With the NSA on the other hand there are no consequences for acting stupidly as long as people like you continue to defer to what you think is their ‘superior’ knowledge.

Besides, none of this is needed for security purposes. Our various law enforcement agencies had all the information they needed to stop the attacks on 9/11, they just did not know they had it. With the lessons learned from that mistake I trust they will not be dropping the ball again when such knowledge comes their way a second time.

The Bush domestic spying program does serve as an intimidation to all law abiding citizens however – now that we know Big Brother is listening. If that is your intent then I can see why you would support it.
James Joyner says:

Tuesday, 16 May 2006 at 14:31

Alan: Didn’t mean to imply you were calling these people stupid. Farley sure seems to think they are, though.

Ken: It’s not that I’m deferring to their superior knowledge; it’s that I understand that, from the outside, I can’t fully appreciate the program. Is it possible that Farley’s premise (it “probably isn’t worth infringing our civil liberties for — because it’s very unlikely that the type of information one can glean from it will help us win the war on terrorism”) is right? Sure. It’s just that his analysis is based on an assumption this is happening in a vacuum.

I’m not at all sure we’ve learned that much from 9/11 and that the information stovepipes won’t bite us in the ass again. Bureaucracies don’t change that much after one foul up. Or ten.
Maniakes says:

Tuesday, 16 May 2006 at 18:22

He found that if you naively analyzed the resulting graph, you could conclude that one of the “central” players was Ken Lay’s … secretary.

Perhaps this is why we keep catching the #3 guy in Al Qaeda?
Boyd says:

Wednesday, 17 May 2006 at 07:28

Alan said:

Data mining, IMHO, is not a legitimate use of technology for crime enforcement…

Probably not, but that’s pretty much a non sequitur. The data mining under discussion here is for intelligence collection, not law enforcement. Folks can make all the “slippery slope” arguments they want, but these supposedly aggressive intelligence collection techniques are a far cry from police state tactics. They may be a precursor, and it’s wise to be watchful, but it seems to me to be hysterical histrionics to look at what’s going on today and say we’re turning into a police state.

IMHO, of course.
ICallMasICM says:

Wednesday, 17 May 2006 at 09:46

Whenever a client tells me they want to do some data mining my thoughts immediately go to ‘boondoggle’ and ‘bonanza’ and my mouth says ‘let’s discuss how I can help you with that’. Data mining = gold mine for db developers, dba’s, report developers, analysts, QA analysts…
McGehee says:

Wednesday, 17 May 2006 at 14:12

Probably not, but that’s pretty much a non sequitur. The data mining under discussion here is for intelligence collection, not law enforcement.

Thank you, Boyd. We do seem to keep having to remind people that the NSA isn’t providing evidence for trial, but intelligence for WAGING @#$!ING WAR.

And they keep failing to grasp that seemingly rather simple fact.
Barry says:

Thursday, 18 May 2006 at 09:01

James Joyner: “Alan: I’m sure that data mining taken in isolation is relatively ineffective. I’m just pretty sure that we’re not doing it in isolation. Our intel agencies justly get plenty of criticism for various bureaucratic failings. They are not, however, manned by idiots.”

They are, however, under the political authority of people who’ve repeatedly demonstrated that they can’t be trusted. And after the history of the Cold War, it’s pretty much proven that such programs will be used for the political purposes of the people running the programs, and certain well-connected politicians.
Shane says:

Monday, 22 May 2006 at 12:29

I’m a journalist, and I’ve covered data mining and intelligence in-depth over the years. I’m working on a piece about what else the NSA and other agencies are doing with data mining, and how it might connect to the NSA’s terrorist surveillance program. I find this thread quite fascinating, and would like to invite some of the people who’ve posted comments here to get in touch with me if they feel they can add something to my piece. I’m particularly interested in hearing from people who’ve worked with data mining and related tools, particularly in a government setting. I would like to talk to experts who can speak from an informed perspective about such matters. Many thanks. Shane Harris National Journal sh*****@na*************.com

Outside the Beltway