
Safety: Tracking System Glitches

By David Evans | January 1, 2003

Resetting a balky aircraft system before flight resembles pressing "Ctrl-Alt-Delete" as a last-ditch stratagem to resuscitate a dead personal computer, with one big difference: one’s personal computer does not have to reboot and then fly an airplane with hundreds of people on board.

The case concerns the fatal Nov. 12, 2001, crash of American Airlines Flight 587, in which a balky yaw damper was reset before flight. About a minute after takeoff, the accident airplane, an A300-600, encountered wake turbulence from a preceding B747-400 and, in the space of about seven seconds, four rapid back-and-forth rudder movements caused aerodynamic loads to build up rapidly and snap off the tail. Pedal movements captured by the digital flight data recorder (DFDR) provide circumstantial evidence that pilot actions are involved (see October 2002, page 45). However, the A300-600 has a history of occasional uncommanded rudder movements, incidents of high aerodynamic loadings on the tailfin, and an anecdotal reputation as a "tail wagger."

So far, in the exhaustive investigation still ongoing, it appears that the yaw damper performed properly during the airplane’s brief flight. It apparently did not perform properly before pushback at New York’s John F. Kennedy International Airport. Buried in the thousands of pages of documents released by the National Transportation Safety Board (NTSB) is this mention:

  • "According to statements taken by the Port Authority of New York and New Jersey, between 0730 and 0800, an American Airlines maintenance crew chief received a radio call from the cockpit of Flight 587 reporting that the number two pitch trim and yaw damper would not engage. He reported the problem to the avionics crew chief. [He] sent two avionics technicians to Gate 22 to investigate the problem.

  • "The technicians confirmed that the number two pitch trim and yaw damper system could not be engaged. An Auto Flight System [AFS] check was performed on the airplane and indicated a number two flight augmentation computer [FAC] fault.

  • "The circuit breaker was cycled for the system and another AFS check was performed. The problem was corrected."

Was it really? Or did cycling the circuit breaker and the successful retest mask a subtler fault? Probe deeper into the NTSB documents and one finds this maintenance history for the accident aircraft: 12 failures of the number two pitch trim in the year before the accident. The failures involved the system not staying engaged, or disengaging during climb-out.

Consider the larger maintenance history over this period of time. It’s a veritable acronym soup of swapped-out components.

A week after the crash, the Port Authority detectives interviewed the avionics technicians who made that last-minute trouble call to the cockpit while the airplane was at the gate. Technician Joseph Merriam told the detectives that failure of the pitch trim and yaw damper system to engage "is a common problem on A300 aircraft." Frank Xavier, the other technician summoned to perform the troubleshooting with Merriam, recalled that when the circuit breaker was reset and the autoland system checked out properly, "the problem was deemed to have been solved" and appropriate routine reports were sent to American’s maintenance facility in Tulsa, Okla.

Was the glitch really fixed? The ever-thorough NTSB investigators examined American’s component reliability database. From December 1999 through May 2001, the latest summary data available, American’s A300 fleet exceeded the system standard flight "delay" count, as well as the rate per thousand departures. However, the number of pilot reports (PIREPs) was under the upper limit, as was the rate of premature flight system component removals. In other words, the records do not give off the acrid odor of the proverbial "smoking gun."

Nonetheless, the everyday practice of resetting circuit breakers and cycling systems before flight–the "Ctrl-Alt-Delete" approach–troubles Raymond Hudson. A veteran avionics system engineer, Hudson argues that for too long a "severe lack of focus" has prevailed within the industry regarding the design and approval of procedures, "especially when it comes to the ‘reset the breakers and see if it repeats’ phenomenon."

"We have allowed this to happen for so long that we have ignored–and therefore foregone–the ability to collect and analyze data on why the ‘glitch’ happened, so that it could be fixed permanently," Hudson asserts.

Maybe the problem isn’t in the "black boxes." Sources advise that wiring faults are notorious for causing circuit breakers to trip intermittently, and electromagnetic interference may also be playing a role.

In any event, regarding the failed preflight check and the quick reset before dispatch of Flight 587, Hudson says, "There is another piece of the smoking gun here, and we could be collecting data on other A300-600s that potentially have this same ‘glitch’ on preflights."

In an industry that prides itself on "data-driven" safety, Hudson modestly suggests collecting more data. "A new rule–an operational rule, not a type design rule for certification–should be written dealing with the ‘glitch/reset’ phenomenon for safety-critical flight control systems that contain software," he suggests. Hudson deliberately includes software because systems with software are "more often susceptible to such glitches than all-hardware designs."

"I do not believe this rule needs to go as far as grounding the subject airplane until the problem is fixed," Hudson hastens to explain. "However, I think this new rule should be structured to require that these ‘glitch/reset’ reports be submitted to the manufacturer, the Federal Aviation Administration and the NTSB on the nature of the problem," he says.

"The purpose would be to begin building a database of occurrences, so ‘glitches’ could be tracked against airframe type, in-flight configuration, failures logged by the computers involved, and so forth," Hudson explains. He couples his concept to the Flight 587 crash:

"If such a rule had been in place prior to Flight 587, I’ll bet there would be a lot of entries in such a database, pointing to prior flight augmentation computer preflight test failures–‘glitches’ that were ‘cured’ by resetting the circuit breakers. People who do what I do go to great lengths in the laboratory and on the aircraft, testing the systems to hunt down, explain and resolve any such ‘glitches.’ But if we are not given data on their occurrences during operational service, our ability to fix them is diminished, if not totally restricted."

Simply put, a glitch/reset may be an unexceptional and common practice, but the pattern and frequency of such activity may portend a non-trivial consequence. As in the case of personal computers, "Ctrl-Alt-Delete" is a field expedient; it is neither a diagnostic nor a fix.

David Evans may be reached by e-mail at [email protected].
