Q&A: Bob Lee: Hearing Aids: Advent of 3D Audio

Cockpits today are highly visual environments, and display technology is advancing toward more intuitive interfaces. But audio technology has lagged behind. Now scientists at the U.S. Air Force Research Laboratory (AFRL) have developed the means to create "auditory displays" and navigation cues that augment visual interfaces. They have, for example, developed an "auditory horizon" and an auditory threat warning system through a technology known as 3D audio. This ability to artificially localize sounds also has applications in areas such as combat search and rescue and air traffic control. We talked with Bob Lee, chief of the Battlespace Acoustics Branch of AFRL’s Human Effectiveness Directorate.

Avionics: What is 3D audio?

Lee: In the real world, when people are talking to you, you hear sounds coming from all around you, which makes it easy to listen to them and comprehend them. Three-dimensional audio is the technology that, on a headset, recreates directional sound. We’re finding out that it’s very important to spatialize sounds in order to distinguish between them. It’s a technology with many applications.

Avionics: Can you tell us about a recent 3D audio experiment?

Lee: We got a congressional "add" [funding that Congress provides for a proposal not included in a program’s budget] two years ago, where we were looking at 3D audio for general aviation [GA] safety. GA pilots have the highest accident rate. They’re not as well trained as commercial pilots, who are prepared to deal with every conceivable bad situation out there.

Avionics: Does it apply to the military?

Lee: Absolutely. The first plane military pilots fly is a general aviation aircraft.

Avionics: What’s the status of the project?

Lee: We just finished a test at the Test Pilot School at Edwards Air Force Base, Calif. The project took a novel approach, using 3D audio to create an artificial horizon. We found that when people go on long missions, they bring CD players and music. So we ran the music through a 3D audio "filter" to create a sense of horizon. As you’re tipping, the music changes to indicate the proper maneuver. You know very quickly if you’re pitching up or down or rolling your wings. You never get very far off before you self-correct. [The music goes back to normal as soon as the pilot responds to the cue.] We put them into a pitch, flew them blind, and saw whether they could correct back, just based on the audio. But we’re not using auditory cues to supplant visual cues. We want to augment them.

Avionics: What’s next?

Lee: NASA Langley is interested in the technology. A series of flight tests is planned for March. I believe, if you could get the technology out commercially, it actually could save lives.

Avionics: What can artificially localized sounds be used for?

Lee: Your ears are a natural pointing device for your eyes. You look in the direction the sound comes from. Think about a concept here for a fighter aircraft. They have a radar warning receiver that provides a warning in the cockpit.

Somebody on the ground senses there is an airplane in the air, starts a search pattern and locks onto your airplane. Then they shoot a missile at you. The [aircraft’s] radar can tell when someone is looking for you, when they find you and when they start trying to shoot you down. As soon as the radar finds out that the system is looking for you, an alarm goes off in the cockpit.

You are now aware of it, and look down to a visual system and do a mental translation to figure out where the threat is. You then translate that to the real world. With 3D audio the threat warning is directionalized. The pilot gets the sound in the direction the threat is coming from.

The application was demonstrated in the 1990s on a Navy AV-8A aircraft at China Lake [the Naval Air Weapons Station at China Lake, Calif.], using special hardware. When they first heard the 3D audio cues, pilots would go back and check their radar screens because that’s the way they had been trained. But after beginning to trust the system, they started to take evasive action as soon as they heard the sound.

Avionics: How did it work out?

Lee: They found you can actually evade better [with auditory cues]. It gives you a one-second advantage in a two-second gun fight. In some situations one second is important.

Avionics: Would this be useful for air traffic control applications?

Lee: We’re working with the ground station for the Nellis Air Force Base test range. The operators talk to all the aircraft out there in an exercise. For the [China Lake] radar warning receiver demo, we had to build customized hardware.

But now we can do that using software. That’s why we were able to put it in the Nellis ground station. Nellis is the only place in the Air Force right now that 3D audio is operational.

For no real extra cost to the ground system, they were able to put it in. We’re playing with the volume and changing the pitch to improve comprehension. It’s like at a cocktail party. Sometimes the sounds seem muddled. But if somebody had a distinctive voice, you’d hear it. Sometimes these guys will listen to three, four or five different radios at the same time. It’s part of their job.

Avionics: I understand you’re focusing on the E-3 aircraft. Can you explain?

Lee: Communications officers on aircraft such as the E-3 AWACS [airborne warning and control system] monitor radio traffic. They may be listening to as many as seven communications channels at one time. We proposed putting 3D audio in AWACS aircraft because we know they are building a digital intercom system. We demonstrated the technology on an aircraft with the software on a laptop computer. The operators liked it and put it in as a requirement. But it keeps missing the list because manufacturers don’t know how to do it, although we would be glad to show people how to do it.

The Electronic Systems Command has wanted to put 3D audio on the AWACS aircraft, but up until now it has been too expensive to pursue. With our new software-based system, we believe that the cost could be dramatically reduced. This is the basis of the Defense Acquisition Challenge [DAC] proposal that we submitted to demonstrate 3D audio on the AWACS. [The DAC program provides opportunities for new technology to be introduced in acquisition programs.]

Avionics: What’s the status of the challenge program?

Lee: The 2005 challenge made it all the way through to the final stage, where we were told that it was the next item to be funded. It was right below the cut line. If more funding is found, it will be funded, and if not, we have resubmitted it for the 2006 Defense Acquisition Challenge.

Avionics: Could it be used for combat search and rescue?

Lee: Yes. There were documented incidents in Vietnam where a pilot was hiding in a rice paddy. But all rice paddies look the same. You’ve got a visually rich field and you can’t find targets very easily. As soon as the helicopter comes in, the Viet Cong are drawn to it. The helicopter has to hover there, searching for the downed pilot. If you put a head tracker on the helicopter pilot–or the pararescue soldier who has to find the downed pilot–and give him directional cues from the downed pilot’s transponder, they would be able to spot him faster and quickly get in and out of a dangerous situation.

Avionics: Can you describe any other applications?

Lee: What we offer is a software solution. We’ve demonstrated that it’s useful. But some company out there has to build this and put it into airplanes or sell it as an option. We built this sphere [the Auditory Localization Facility, shown top right] about 20 years ago. But only now, as the technology gets mature, can we show benefits. Then you can start transitioning to something that is a useful product. For example, we’ve been able to build head tracker systems on a wearable computer, with 3D audio and a GPS receiver, that allow people to navigate very accurately across land. You hear a sound in your ear and it tells you where to go.

We did an experiment using medics at Wright-Patterson because they have to go through map and compass training. They are tested on how fast they can get to waypoints put on their maps. We gave them a visual cue like a GPS mapping system, plus an audio cue. We found the audio cue was still the best way to do it. It’s like taking a bearing line all the time. By recording shouted and whispered speech and normalizing it to the same volume level, we added speech-based distance cueing. [A whispered phrase, e.g., "Over Here," indicates a short distance, whereas a shouted phrase indicates a longer distance.]

To find a waypoint, you have to find an angle and a distance within a coordinate system. You get the angle from the directionality of the sound and the distance from the speech-based cue. The volume of the sound is the same, but at 1,000 yards [914 m] you hear it as a shout, at 100 yards [91 m] you hear it as normal, and within 2 yards [2 m] you hear it as a whisper.
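The angle-and-distance idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a flat local grid measured in yards, a known listener heading, and hypothetical cutoffs between whisper, normal and shout (the interview gives example distances, not thresholds). The names `waypoint_cue` and `speech_style` are invented for this sketch.

```python
import math


def waypoint_cue(x, y, heading_deg, wx, wy):
    """Return (relative_bearing_deg, distance) from a position to a waypoint.

    Coordinates are on a flat local grid in yards (x = east, y = north).
    heading_deg is the direction the listener faces, clockwise from north.
    The relative bearing is wrapped to -180..180; negative means left.
    """
    dx, dy = wx - x, wy - y
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from north
    relative = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return relative, distance


def speech_style(distance_yd):
    """Pick a vocal effort for the spoken cue.

    The cutoffs are hypothetical; the interview only gives example
    distances (1,000 yards, 100 yards and 2 yards).
    """
    if distance_yd <= 10.0:
        return "whisper"
    if distance_yd <= 300.0:
        return "normal"
    return "shout"


# Listener at the origin facing north, waypoint 100 yards due east.
angle, dist = waypoint_cue(0.0, 0.0, 0.0, 100.0, 0.0)
print(round(angle), round(dist), speech_style(dist))
```

In this sketch the direction of the sound would carry the relative bearing, and the choice of whispered, normal or shouted speech would carry the distance, with playback volume held constant.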

Avionics: Was there special training?

Lee: They listened for two minutes. We said the sound will change as the distance changes. They figured it out very quickly.

Avionics: Could this help pilots?

Lee: It has a lot of implications for flying. As part of the general aviation project, we did audio-cued waypoints as well as the auditory horizon.

Avionics: Can you tell us a little about the science of 3D audio?

Lee: We look at how humans process the signals. In our Auditory Localization Facility, we created a sphere with 277 speakers that are going from every direction around you, spatially separated every 15 degrees. So, basically, we can play sound from different directions [with respect to a person situated at the center of the sphere]. There are two main, fundamental things that are happening to explain why sound is different at one ear from the other ear.

The first thing is the interaural time delay. It takes a little longer for a sound to get to one ear than the other. Sound travels at 1,100 feet per second. Basically, it’s going to take about half a millisecond to get from one ear to the other if it’s coming directly from the opposite side. So the interaural time delay gives you an idea of where a sound is [coming from].
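The half-millisecond figure follows from simple arithmetic. Here is a minimal sketch, assuming a straight-line path-difference model and an ear-to-ear distance of 0.55 ft (an assumed value chosen to be consistent with the quoted figures, not a number from the interview):

```python
import math

SPEED_OF_SOUND_FT_S = 1100.0  # figure quoted in the interview
EAR_SPACING_FT = 0.55         # assumed ear-to-ear distance (about 6.6 in)


def interaural_time_delay(azimuth_deg: float) -> float:
    """Return the arrival-time difference between the ears, in seconds.

    azimuth_deg is the source direction: 0 = straight ahead, 90 = directly
    to one side. Uses the simplified model delay = d * sin(azimuth) / c,
    which ignores the curvature of the head.
    """
    path_difference_ft = EAR_SPACING_FT * math.sin(math.radians(azimuth_deg))
    return path_difference_ft / SPEED_OF_SOUND_FT_S


for azimuth in (0, 30, 90):
    delay_ms = interaural_time_delay(azimuth) * 1000.0
    print(f"{azimuth:3d} deg -> {delay_ms:.3f} ms")
```

Under these assumptions a source directly to one side yields a delay of 0.5 ms, and the delay shrinks to zero as the source moves to straight ahead.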

Another thing is very important. There’s a "shadowing" effect from the sound that hits on one side of the head vs. the sound that hits on the other side of the head. Lower frequencies, for example, will pass around an object because they have very long wavelengths. But higher frequencies have a harder time getting across. So they get absorbed more on the [near] side. The higher-frequency signals are more susceptible to shadowing from my head. There is an amplitude difference between one ear and the other that varies with frequency.
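To see why frequency matters, it helps to compare wavelengths with the size of the head. The sketch below uses the 1,100 ft/s figure from the interview; the 0.55 ft head width and the "shorter than the head" rule of thumb are assumptions made for illustration, not measured thresholds.

```python
SPEED_OF_SOUND_FT_S = 1100.0  # figure quoted in the interview
HEAD_WIDTH_FT = 0.55          # assumed head width, for illustration only


def wavelength_ft(frequency_hz: float) -> float:
    """Wavelength of a tone in feet: speed of sound divided by frequency."""
    return SPEED_OF_SOUND_FT_S / frequency_hz


def is_shadowed(frequency_hz: float) -> bool:
    """Rough rule of thumb (an assumption): a tone is noticeably shadowed
    when its wavelength is shorter than the head is wide."""
    return wavelength_ft(frequency_hz) < HEAD_WIDTH_FT


for freq in (200, 1000, 8000):
    print(f"{freq} Hz: wavelength {wavelength_ft(freq):.2f} ft, "
          f"shadowed: {is_shadowed(freq)}")
```

A 200 Hz tone has a wavelength of several feet and bends around the head, while an 8,000 Hz tone has a wavelength of under two inches and is blocked much more.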

Avionics: Why are the time delays and frequency effects important?

Lee: If a sound comes from a certain direction relative to the hearer, there will be certain time delay and frequency effects associated with the sound. These effects can be mapped in the localization facility and captured in algorithms known as head-related transfer functions. These transfer functions are like filters that allow you to adjust where the sounds seem to be coming from in space in order to better identify and comprehend the different sources of information. You run sound through the filter and that’s what puts it back out, for example, 90 degrees here or over there, down there. We modify the signal, basically, as it’s coming in, in real time.

By putting the time and frequency effects of one sound back into another sound, and then putting it back into the stereo headset, you can recreate the second sound as if it came from the direction of the first. That’s what we’ve been able to do.
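As a rough illustration of "running sound through the filter," the sketch below places a mono signal to the left or right by delaying and attenuating the copy sent to the far ear. It is a crude stand-in, not a head-related transfer function: a measured transfer function is frequency dependent and specific to each direction, whereas this uses a single delay and a single gain. The sample rate, the attenuation value and the function name `spatialize` are all assumptions.

```python
import math

SAMPLE_RATE_HZ = 44100         # assumed sample rate
MAX_DELAY_S = 0.0005           # about half a millisecond, per the interview
MAX_FAR_EAR_ATTENUATION = 0.5  # assumed level drop at the shadowed ear


def spatialize(mono, azimuth_deg):
    """Return (left, right) sample lists for a mono signal.

    azimuth_deg: 0 = straight ahead, +90 = hard right, -90 = hard left.
    The far ear receives a delayed, attenuated copy of the signal.
    """
    side = math.sin(math.radians(azimuth_deg))  # -1.0 .. 1.0
    delay_samples = round(abs(side) * MAX_DELAY_S * SAMPLE_RATE_HZ)
    far_gain = 1.0 - MAX_FAR_EAR_ATTENUATION * abs(side)

    near = list(mono) + [0.0] * delay_samples  # pad so both ears match length
    far = [0.0] * delay_samples + [s * far_gain for s in mono]

    # Positive azimuth: the source is on the right, so the right ear is near.
    return (far, near) if side > 0 else (near, far)


left, right = spatialize([1.0, 0.0, 0.0], 90)
print(len(left), len(right))
```

Applying the same function to each incoming radio channel with a different azimuth would give each talker a distinct apparent position in the headset.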

Avionics: How would this apply to an intercom system, for example?

Lee: Sounds are coming in over a radio or an intercom system. So if I’ve got five different people coming over five different intercoms, I now can put them on the headset and spatialize their locations. This helps with the comprehension of what is said. If I wanted to know where my wingman is for formation flying, I could make the radio conversation come from where he is relative to me. Then I would quickly have intuitive knowledge of where he is. This would improve a pilot’s situational awareness.

Avionics: How well are humans able to localize sounds?

Lee: We have run thousands of experiments in the past couple of years–hundreds and hundreds of different types of experiments–because we’re trying to maximize people’s efficiency. We found out the best places to put the sounds if you want to listen to seven people. For example, equal spacing of sounds is not optimal most of the time. Most people can localize a sound to about 10 to 12 degrees. The best localizers can detect sounds at spacings of about 2 degrees. Conversely, we found out that some people are naturally bad localizers, just as some people are color blind. They can’t localize within 30 degrees.

Avionics: Could you give us an example of the head-related transfer function?

Lee: The best example I can think of is a car stereo. It’s got filters: bass boost, etc. Sounds coming in from the radio station are running through a filter that makes the sounds appear to be more powerful. They are amplifying some of the midranges. You prop up a lot of the other stuff and get mostly in the speech range. So then the voices come through a lot clearer even if the signal’s been distorted a little bit. That’s the same thing we’re doing, essentially, except with the interaural time delay and the interaural frequency effect. So it’s filtering, not a modulation. And we pay attention to the timing, which is kind of a phase delay.

Avionics: How many sounds can a listener discriminate simultaneously?

Lee: We can put up as many signals as you want. One of the research efforts we do here is to examine how much an individual can logically take in. I’ve actually got some data on that. The answer is, once you get past four or five people talking at the same time, then your ability to comprehend what they’re saying begins to drop off, although some people can do 12 people. But you can listen to all five at the same time. We’ve actually found that by spatially separating the sound, we can improve people’s intelligibility scores. We recreate what occurs naturally over a headset. So now it’s easier to keep track of people that are talking, and who’s doing the talking.

Avionics: You have to have a stereo headset and a head tracker?

Lee: With a monaural headset, if you’re listening to two people, you probably can catch most of the right answers although they step on each other a little bit. With stereo headsets, you can put one in each ear, and that’s good. So 3D audio doesn’t give you much of an advantage with two talkers.

As you bring up the number of people, however, your ability to listen to all of them degrades to the point where, when you get seven people talking, you’re only getting 20 percent of the correct responses about things you’re supposed to be looking for. Whereas, with that same scenario, with nothing changed except plugging in 3D audio in your stereo headset, people can get up to 60 percent of the correct answers. For most applications you would need a head tracker, too. Without it you would lose the directional information when you turn your head.
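The role of the head tracker can be shown with one line of arithmetic: the direction at which a sound must be rendered is the source direction minus the current head yaw. This is a minimal sketch, assuming both angles are measured clockwise from the same fixed reference; the function name is invented for illustration.

```python
def azimuth_relative_to_head(source_azimuth_deg, head_yaw_deg):
    """Direction of a source as the listener should hear it, in degrees.

    Both angles are measured clockwise from the same fixed reference
    (for example, the aircraft nose). The result is wrapped to -180..180,
    where negative values mean the source is to the listener's left.
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# A source fixed 90 degrees to the right of the reference direction:
for yaw in (0, 90, -45):
    print(yaw, azimuth_relative_to_head(90, yaw))
```

Without the yaw term the sound would turn with the listener's head and stop pointing at a fixed location in the world.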

Avionics: Why is 3D audio technology so easy to use?

Lee: We think that people who listen to multiple conversations through 3D audio are able to comprehend critical information better because it is getting them closer to the natural way their audio systems were trained as they were growing up. Tests show that with spatial audio all listeners, even untrained listeners, can pick out target items with higher accuracy.

Avionics: What else are you looking at?

Lee: If you vary the volume and pitch, you can improve people’s comprehension because you make distinctions of both space and content. We can do the spatialization of sound, but now we are looking at what content or symbology we should put out in space to help a person do a job better. We’re also looking at voice inflections and distance cueing. And how do you pick up that something is moving? You can put in a frequency shift, but that’s not what people are picking up on. We are trying to understand what people actually cue in on to improve their ability to process the content of the communication.