There are several technological and mathematical reasons why TIA can't become truly oracular. Its main limitation is that it could never really know everything. Indeed, how much it could conceivably know -- and how fast it could know it -- is at this point unclear; a database this large and this dynamic has never been set up before, experts say, and nobody knows if it's even possible. But even if DARPA does manage to create the database, TIA will face another limitation: It can only know what you do, not what you think. And though it would have some idea -- maybe even a good idea -- of what a terrorist plan "looks" like, TIA would be limited to the kinds of terrorist attacks it has seen in the past, and it's not clear that all new terrorism will look like old terrorism. Before Sept. 11, the chance that a data-mining system would have predicted that four planes would be simultaneously hijacked and slammed into buildings was close to nil -- and the likelihood that terrorists will come up with new, unprecedented threats seems close to 100 percent.
TIA will most likely get its information from currently existing databases -- from banks, airlines, retail chains, etc. But giving TIA access to that data presents a significant database problem, engineers say. "You have lots of headaches in the management of the data," says Wisconsin's Ramakrishnan. "It's going to be copied from a multiplicity of sources at different schedules, and how do you keep track of where what came from when?"
Another problem is that two databases can be as different as two languages -- you can translate from one to the other, but the result doesn't always make sense. A bank might reference every person by his account number, for example, while the DMV will do it by driver's license number. How will TIA know that the person named Mohamed Atta with a certain bank account number is the same Mohamed Atta with a certain driver's license? Michael Franklin, a database expert in the computer science department at the University of California at Berkeley, says that for many years, businesses have been looking for good ways to address this data-integration problem. "It's really an age-old problem, and companies have been trying to do it for years and years because even inside a company they tend to have lots of different databases," he says. "There are some ways to do it. If I was a credit card company and I made an agreement with an airline company, we could together figure out how to cobble together the databases. But what's missing is some larger way to do it." A good part of the short-lived, late-'90s boom in "business to business" (or B2B) companies was aimed at fixing this problem, Franklin says, and some part of the push for "Web services" is as well.
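To make the bank-versus-DMV problem concrete, here is a minimal Python sketch of one common workaround: when two databases share no key, link their records on normalized secondary fields such as name and date of birth. Everything in it is an assumption made for illustration. The field names (`holder`, `birth_date`, `account_no`, `license_no`), the sample rows, and the matching rule are hypothetical, and nothing here describes how TIA or any real bank or DMV system works.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def link_records(bank_rows, dmv_rows):
    """Pair bank and DMV rows that share a normalized name and birth date."""
    index = defaultdict(list)
    for row in dmv_rows:
        index[(normalize_name(row["name"]), row["dob"])].append(row)

    links = []
    for row in bank_rows:
        key = (normalize_name(row["holder"]), row["birth_date"])
        for match in index.get(key, []):
            links.append((row["account_no"], match["license_no"]))
    return links


# Hypothetical sample data: the two sources use different keys and layouts.
bank_rows = [
    {"account_no": "ACCT-001", "holder": "Example, Jane", "birth_date": "1970-01-01"},
    {"account_no": "ACCT-002", "holder": "John Sample", "birth_date": "1980-05-05"},
]
dmv_rows = [
    {"license_no": "DL-9001", "name": "Jane Example", "dob": "1970-01-01"},
    {"license_no": "DL-9002", "name": "John Sample", "dob": "1981-05-05"},
]

print(link_records(bank_rows, dmv_rows))  # [('ACCT-001', 'DL-9001')]
```

The second pair fails to link because the birth dates differ by a year, which could be a typo or could be a different person. Exact matching on secondary fields cannot tell those cases apart, which is part of why the problem Franklin describes remains hard at scale.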
DARPA's involvement in a research area tends to accelerate advances in that field, and the group's stated goal for TIA is to do things that have never been done before. In its solicitation for research ideas for TIA, the agency asks for ideas "that enable revolutionary advances in science, technology or systems." It's possible that DARPA could hit on some new, easy way of integrating information, which scientists say would be a good side benefit of the project. "I'm retiring," says Stanford's Jeffrey Ullman, "so I'm not trying to use you as a way to get more money for my research project. But I think the government has made a huge mistake in not funding computer scientists, and this is an area -- the information-integration part of it -- which has good commercial use as well." (The Internet can be considered a good side benefit of DARPA research.)
After it puts together its database, TIA would then set about looking for hints of terrorism hidden in its data. An important question arises: What are some hints of terrorism? According to the TIA site, the system will have in its memory "patterns that cover 90 percent of all previously known foreign terrorist attacks." (TIA hasn't said what "foreign" means in that sentence -- is it referring only to attacks that occurred outside the U.S., or to all attacks perpetrated by foreign groups? If it means the former, 9/11 wouldn't be in its database; if it means the latter, the Oklahoma City bombing and the Unabomber's attacks would be left out.)
In other words, for TIA to single them out, potential terrorists would need to be doing many of the same things that some other terrorists have done before. Is that likely? Probably -- after all, every terrorist organization needs to communicate, shop for equipment, and participate in the financial system. The problem is that innocent people need to do those things, too. Thus, one of the main challenges John Poindexter will face in building his noise filter will be its calibration: Should TIA look at more specific, narrow traits of terrorism in an effort to reduce the false positives, at the risk that some novel disaster will slip through? Or should it do the opposite -- look for the more general characteristics of terrorists and risk pursuing thousands (or millions) of innocent people?
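The calibration dilemma can be illustrated with a toy Python sketch. It assumes, purely for illustration, that a system assigns each person a suspicion score and flags everyone at or above a threshold. The scores, labels, and thresholds below are made up and are not drawn from TIA or any real system.

```python
def confusion_counts(scored, threshold):
    """Return (false positives, missed threats) when scores >= threshold are flagged."""
    false_pos = sum(
        1 for score, is_threat in scored if score >= threshold and not is_threat
    )
    missed = sum(
        1 for score, is_threat in scored if score < threshold and is_threat
    )
    return false_pos, missed


# Made-up (score, is_threat) pairs, for illustration only.
scored = [
    (0.95, True),
    (0.60, True),   # a threat that does not closely resemble past patterns
    (0.70, False),
    (0.65, False),
    (0.40, False),
    (0.30, False),
]

for threshold in (0.5, 0.9):
    false_pos, missed = confusion_counts(scored, threshold)
    print(f"threshold={threshold}: false positives={false_pos}, missed={missed}")
# threshold=0.5: false positives=2, missed=0
# threshold=0.9: false positives=0, missed=1
```

With the loose threshold, two innocent people are flagged and no threat is missed. With the strict one, nobody innocent is flagged but the less typical threat slips through. Moving the threshold only trades one kind of error for the other.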
"That's a good question," says Gregory Piatetsky-Shapiro, a data-mining expert who runs KDnuggets, an online newsletter devoted to the subject. The answer, he says, "is that in general you do still want to protect against past attacks -- so you would look for the kinds of things that happen there and try to stop those. But also, there are general things that you would look for in other attacks" -- things that are statistically unlikely in the general population.
Ullman says, "You ask it about all of the unusual coincidences of people who are known to be involved with al-Qaida. The system should be able to notice that four guys have enrolled in different flight schools, and you have to distinguish that from noticing that four guys in al-Qaida have bought jeans at Macy's."
But what about regular people -- people who aren't suspected of being in al-Qaida? "That's where it becomes a hard algorithm problem and a good research problem," Ullman says. "This is something that requires the brightest minds in computer science."
But could even the brightest minds prevent TIA from fingering innocent people? Not long after he heard about the system, Bobby Gladd, a statistician and self-described "political pain in the ass" who lives in Las Vegas, set out to determine how many false positives a system like TIA would produce. You don't need an advanced degree in statistics to follow his calculation, which shows that even if TIA is very good, it will still be frequently wrong.
Gladd figures that if TIA has a scheme that can correctly identify as innocent 99.9 percent of the innocent people it sees -- an exceptionally high percentage that is probably not achievable -- then it will still end up with about 240,000 falsely accused Americans. (That is, 0.1 percent of the 240 million adult Americans.) If you reduce the percentage to 80 -- more reasonable but probably still too high -- the number of false positives becomes 48 million!
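Gladd's arithmetic is easy to reproduce. The short Python sketch below uses only the figures quoted above (240 million adults and accuracy rates of 99.9 percent and 80 percent on innocent people). The function name and the simplification that essentially the whole population is innocent are assumptions made for illustration.

```python
def false_positives(population: int, innocent_accuracy: float) -> int:
    """Innocent people wrongly flagged, treating the whole population as innocent."""
    return round(population * (1.0 - innocent_accuracy))


ADULT_AMERICANS = 240_000_000  # figure used in Gladd's calculation

print(false_positives(ADULT_AMERICANS, 0.999))  # 240000
print(false_positives(ADULT_AMERICANS, 0.80))   # 48000000
```

Because the population being screened is so large, even a tiny error rate on innocent people produces a very large number of false accusations.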
"I am offended by the constitutional implications of it," Gladd says, "but at the same time I'm calling attention to it on the basis of what I do. This is a waste of time, and it's going to take away resources." Like many other critics of the system, Gladd points out that intelligence analysts missed 9/11 not because they had too little information -- it turns out that, in retrospect, there were many "unconnected dots" pointing to an attack -- but because they didn't have the capability to analyze it. Gladd says that government money would be more wisely spent on information analysis. "Every dollar spent on TIA is going to be a dollar not spent on fighting terrorism," he says.
But Piatetsky-Shapiro says that we have to remember that law enforcement already pursues a lot of innocent people. Anyone who's seen "Law & Order" knows this. The recent hunt for the Washington sniper proved this too, as thousands of calls poured into hotlines, almost all of them pointing to people uninvolved with the crime. "I think we'll never be able to eliminate false positives," Piatetsky-Shapiro says, "but maybe this tool can improve the ratio."