Monday, March 6, 2006
Researchers Outline Difficulties In Boosting Screener Performance
While ongoing research to get deeper inside the heads of airport screeners can better explain the prevalent cognitive processes used in looking for hazardous items, huge difficulties arise in figuring out how to improve screener performance.
Eventually, such research may create even more impetus for federal policymakers to focus on next-generation screening technology as the primary way of improving airport checkpoint screening, rather than on the human factors. Indeed, some members of Congress have already said that little more can be done with screener training. However, the Transportation Security Administration (TSA) is apparently still trying.
In one effort, researcher J. David Smith at the University at Buffalo (UB) in New York and his colleagues at Georgia State University in Atlanta have found that screeners (actually, test subjects in the laboratory performing the screeners' job) have considerable difficulty in finding novel, less familiar threats -- which is just what terrorists are expected to attempt to hide -- and instead, tend to fall back on recalling previously seen images of hazardous items. Technically speaking, the test subjects are using "specific-token knowledge" rather than "category-general knowledge," which would be more effective in finding less familiar, "real world" threats.
This means that the TSA's traditional way of conducting screener training, through its Threat Image Projection (TIP) system, has an inherent flaw. (TIP uses a library of stored images showing hazardous items inside luggage to test screeners' detection abilities.) If TSA expands this image library and introduces new images from time to time, as Smith et al. report it has started doing, TIP could remain a viable way of assessing screener performance.
So, TSA could find out that its screeners are doing as well as previously supposed, or somewhat worse. But figuring out how screeners could be coaxed into using more effective screening methods is another matter. Smith confirms that he does not yet know of any effective way to train screeners to use category-general knowledge.
Smith admits that he could make some "educated guesses" about better training approaches, but he would rather not see them in print until more research is done. Unfortunately, TSA has decided not to continue funding his work, because an agency official feels it was too negative about TIP, Smith says. "While it isn't clear that this detracts from the importance of the science, it is fair to say that the TSA is balancing a wider set of security and public-anxiety concerns," Smith tells Air Safety Week.
Meanwhile, over at Brigham and Women's Hospital in Boston, researchers are increasingly certain that screeners are far less likely to detect images of threat objects when they seldom come across these images. In unpublished findings to be presented at the Vision Sciences Society in May, Michael J. Van Wert et al. also will report that at low prevalence rates of finding target objects, screeners give up each search too quickly.
More thorough searches, presumably, would lead to better results. And it's easier to be more thorough when the prevalence rate is higher and more searches are successful. But in the real world, guns and explosive materials do not turn up very often. "It's very hard to persuade people not to abandon a search when almost every search is a dud," says Jeremy Wolfe, one of the Brigham researchers.
The new results from Brigham build on findings from last May in the journal Nature (Vol. 435) that presented test subjects with "deeply artificial" screening tasks, like looking for images of stuffed penguins, Wolfe tells Air Safety Week. The latest work is an attempt to get closer to real-world conditions. Although it was still conducted under laboratory conditions, the experiment used images of weapons and other threat items supplied by TSA. The latest findings reveal even worse object-detection rates than the surprisingly low subject performance levels in the earlier research.
That earlier work yielded a 7 percent error rate when the target objects were present half of the time, with errors shooting up to 30 percent when there was a 1 percent prevalence rate. In the new, more realistic study using images of "virtual bags" containing 12-18 objects and target images of guns or knives, errors were 18 percent at the 50 percent prevalence rate, and 40 percent at 1 percent prevalence.
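To put those percentages in absolute terms, the short Python sketch below converts them into expected misses per 1,000 bags. It is illustrative only: it assumes each reported error rate can be read as the share of target-present bags that are missed, and the function name and the 1,000-bag batch size are hypothetical choices, not figures from either study.

```python
# Error rates reported in the article, keyed by study and target prevalence.
# Assumption: each rate is treated as the fraction of target-present bags missed.
ERROR_RATES = {
    "earlier study": {0.50: 0.07, 0.01: 0.30},
    "newer virtual-bag study": {0.50: 0.18, 0.01: 0.40},
}


def expected_misses(n_bags, prevalence, miss_rate):
    """Expected number of missed targets in a batch of n_bags."""
    target_bags = n_bags * prevalence  # bags that actually contain a target
    return target_bags * miss_rate     # of those, the share that is missed


def summarize(n_bags=1000):
    """Return (study, prevalence, target bags, expected misses) rows."""
    rows = []
    for study, rates in ERROR_RATES.items():
        for prevalence, miss_rate in rates.items():
            rows.append((
                study,
                prevalence,
                round(n_bags * prevalence, 1),
                round(expected_misses(n_bags, prevalence, miss_rate), 1),
            ))
    return rows


if __name__ == "__main__":
    for study, prevalence, targets, misses in summarize():
        print(f"{study}: prevalence {prevalence:.0%}, "
              f"{targets} target bags, about {misses} missed")
```

Under these assumptions, 1,000 bags at 1 percent prevalence contain about 10 targets, so a 40 percent error rate corresponds to roughly four missed threats, against roughly three at the earlier study's 30 percent.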
Overall, figuring out how to make screeners slow down or do a more thorough job remains quite a mystery, Wolfe reports. Part of the latest testing efforts at Brigham warned subjects when they were quitting their searches too quickly. The subjects did indeed slow down, but they still failed at the same 40 percent rate when target prevalence was in the 1-2 percent range.
Also, if screeners subjected to low target prevalence were just getting careless, they might be expected to make their errors on different bags than other screeners. But when the Brigham team also had two subjects observing the same sequence of bags, their errors occurred on the same bags.
Meanwhile, the other research team from UB and Georgia State found that only when novel objects appeared in test images by themselves did the screeners do fairly well. When novel objects were mixed in with other objects, as they would be at an airport checkpoint, detection of novel objects was very difficult. "Somehow, the screening task makes intact category knowledge essentially unusable, to the dramatic extent that participants perform about equally whether or not they receive any category training," the researchers write.
Smith et al. previously tried to develop a test whereby the images would never repeat, and thus, subjects could not use specific-token knowledge. But the subjects "struggled along for hundreds of trials at about the same level that the present participants fell back to when suddenly denied their dependence on familiarity and specific-token knowledge."
Of the specific-token strategy, Smith et al. conclude that "one cannot expect screeners to shun this strategy--they may not be able to." They may feel that they don't have enough time to do otherwise, or they may lack the mental energy to sustain the category-general strategy over hundreds of trials.
Similar performance drop-offs from familiar to novel targets were uncovered by British aviation security officials at a UK airport using TSA-supplied TIP images, Smith et al. report. "The critical need to ground this observation experimentally was the primary motivation for the present research," they note.
"Specific-Token Effects in Screening Tasks: Possible Implications for Aviation Security," by J. David Smith et al., is in the Journal of Experimental Psychology: Learning, Memory, and Cognition 2005 (Vol. 31, No. 6), and "Visual Search and the Collapse of Categorization," also by Smith et al., is in the Journal of Experimental Psychology: General (Vol. 134, No. 4).
>>Contact: Associate Professor J. David Smith, UB, (716) 645-3650, ext. 346, psysmith@acsu.buffalo.edu; Jeremy Wolfe, (617) 768-8818, wolfe@search.bwh.harvard.edu<<
