DM Review: “TIAin’t.” Herb Edelstein points out four major problems with the TIA strategy from a technical point of view:
- Data integration and data quality: How much time and money will the TIA folks spend just on trying to match disparate records from fifty state drivers’ license bureaus, hundreds of utility bill providers and credit application sources, and all the different banks, credit card providers, and so forth?
- Too much data, too few examples: With only a handful of domestic terrorists in a US adult population of about 220 million, Edelstein points out, the signal-to-noise ratio is vanishingly small: "Let's assume there are 1,000 active terrorists in the U.S. (a number that likely overstates the case by an order of magnitude) out of a population (age 16 and up) of approximately 220 million. An algorithm could be 99.999995 percent accurate by saying no one is a terrorist. Even were we to look only at non-citizens (an arguable tactic), we would still have an accuracy rate of 99.99995 percent by declaring no one a terrorist."
- Lack of sufficient examples to create good signatures (identifying patterns). This is a technical refinement of the previous point: the sample of known terrorists is so small that it's hard to build patterns from it that can reliably be used to predict future terrorist activity. Further, Edelstein points out, terrorists exhibit adaptive behavior, learning from what gets other terrorists caught.
- False positives. Edelstein frames this as a painful trade-off: you don't want to falsely accuse anyone, but you don't want to miss any terrorists either. And even an error rate of just 0.1%, which would count as an overwhelming success in most data mining applications, still means over 220,000 potential false positives! (The arithmetic behind this and the accuracy figures above is sketched after this list.)
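A quick back-of-the-envelope check of the two calculations above, taking at face value the figures Edelstein quotes (1,000 terrorists, roughly 220 million adults) and the hypothetical 0.1% error rate from the false-positives point; exact rounding will differ slightly from the percentages in the quoted passage:

```python
# Back-of-the-envelope arithmetic for the base-rate and false-positive points.
# Figures are the ones quoted above: ~220 million US adults, an assumed 1,000
# active terrorists, and a hypothetical 0.1% error rate.

population = 220_000_000   # US population age 16 and up (approximate)
terrorists = 1_000         # Edelstein's deliberately generous assumption

# A "model" that labels no one a terrorist is still almost always right,
# because terrorists are such a tiny fraction of the population.
trivial_accuracy = (population - terrorists) / population
print(f"Accuracy of 'no one is a terrorist': {trivial_accuracy:.4%}")
# -> 99.9995%

# Even a very accurate model flags a huge number of innocent people when
# it is run over the entire population.
error_rate = 0.001                                    # 0.1% false positives
false_positives = (population - terrorists) * error_rate
print(f"Innocent people flagged: {false_positives:,.0f}")
# -> 219,999, i.e. roughly 220,000 false accusations to investigate
```

The point of the sketch is simply that accuracy is a meaningless yardstick at these base rates: the do-nothing classifier "wins" on accuracy while catching no one, and even a very good classifier buries investigators in false leads.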
Edelstein concludes that the right answer is to improve the technology and use it to answer specific, predefined questions rather than to hunt for patterns in all available data: to use the system for decision support rather than relying on it to make the decisions.
My question: given the large sums of money to be spent, and the serious likely consequences of arresting and incarcerating innocent people, how big a disaster would a system like this have to be able to predict and prevent before it justifies its cost?