Bob Williamson:  

Reverberation: Why two ears help and why you cannot talk easily to your computer

 

Why is it that it is so hard to understand someone who is talking on a hands-free telephone? The most common reason is reverberation: the sound waves bounce around the room before being picked up by the microphone. The same phenomenon prevents automatic speech recognition systems from working unless the user holds a microphone very close to their mouth.

In many textbooks on signal processing you can find, as exercises or examples, illustrations of the fact that the process of reverberation can be modelled as a so-called ``linear time invariant filter'', which is conceptually the simplest sort of filtering a signal can undergo. It is also often suggested that an ``inverse'' filter can be used to remove the effect of reverberation (a process known as inverse filtering or equalisation).
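
To make this concrete, here is a minimal sketch in Python (the sample rate and the synthetic impulse response are invented for illustration, not taken from a real room): the reverberant signal is simply the convolution of the dry signal with the room's impulse response.

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 16000                        # assumed sample rate in Hz
    dry = rng.standard_normal(fs)     # stand-in for one second of speech

    # Synthetic room impulse response: a direct path followed by a dense
    # tail of exponentially decaying reflections.
    t = np.arange(fs // 2) / fs       # half a second of reverberation
    rir = rng.standard_normal(t.size) * np.exp(-t / 0.1)
    rir[0] = 1.0                      # the direct (unreflected) path

    reverberant = np.convolve(dry, rir)   # the room acting as an LTI filter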

This is where there is a large gap between what is conceptually possible and what is practically possible. Inverse filtering of reverberation is possible only if 1) one can determine the exact filter that the room is effectively applying to the signals, and 2) that filter does not change.

Neither of these conditions is satisfied in practice. Even if one could determine the filter (which would require exact knowledge of the original voice signal), it turns out that the slightest change in the position of the source changes the filter drastically. So much so that in a typical room, if the room filter were inverted exactly and the source then moved by just a few centimetres, the result would be worse than if no inverse filtering were applied at all.
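
The fragility can be seen even in a crude numerical toy. In the sketch below, the ``moved source'' is modelled by redrawing the fine structure of the filter under the same decay envelope, and circular (FFT) convolution stands in for the room; both are simplifying assumptions rather than acoustics. With these toy filters, the inverse that exactly undoes the original filter typically leaves the moved recording worse than not equalising at all.

    import numpy as np

    rng = np.random.default_rng(1)
    N = 4096
    dry = rng.standard_normal(N)
    t = np.arange(N)

    def toy_rir():
        # Same decay envelope, different fine structure on each call: a
        # crude stand-in for moving the source a few centimetres.
        h = rng.standard_normal(N) * np.exp(-t / 400.0)
        h[0] = 1.0
        return h

    rir, rir_moved = toy_rir(), toy_rir()

    H = np.fft.rfft(rir)              # spectrum of the ORIGINAL room filter
    filt = lambda x, h: np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h), n=N)
    equalise = lambda x: np.fft.irfft(np.fft.rfft(x) / H, n=N)

    err = lambda x: np.linalg.norm(x - dry) / np.linalg.norm(dry)
    print("matched inverse (sanity check):", err(equalise(filt(dry, rir))))
    print("reverberant, left alone:       ", err(filt(dry, rir_moved)))
    print("stale exact inverse applied:   ", err(equalise(filt(dry, rir_moved))))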

It is easy to see why this is so. To sound waves, the walls of a room behave like dirty mirrors: they reflect most of the sound waves hitting them. Imagine yourself in a room whose walls are optical mirrors that reflect, say, 70% of the light that impinges on them, lit by a single light bulb. You will see many reflections: a finite but large number, because although the light loses intensity each time it is reflected, the number of reflections increases rapidly with the order of the reflection (the count sketched below makes this concrete). Exactly equalising the room essentially involves exactly cancelling out all of these reflections by adding them together with the appropriate phase (destructive interference of the waves). Now imagine moving the light bulb slightly. The pattern of reflections changes in a very complex manner, and even if one knew the geometry of the room, the slightest deviation from the exact geometry would mean that the pattern of reflections would be very different.
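
The rapid growth in the number of reflections can actually be counted. In the image-source picture of a rectangular room (an idealisation, but a standard one), a reflection of total order n corresponds to a point on the integer lattice whose coordinates sum, in absolute value, to n; there are 4n^2 + 2 of these, so the count grows quadratically while each individual reflection only weakens geometrically.

    from itertools import product

    reflectivity = 0.7                # fraction of energy the "mirrors" keep
    for order in range(1, 9):
        images = sum(1 for i, j, k in product(range(-order, order + 1), repeat=3)
                     if abs(i) + abs(j) + abs(k) == order)
        print(f"order {order}: {images:4d} reflections, "
              f"each at {reflectivity ** order:.3f} of the original strength")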

So how can one pick up sound in a reverberant room? Use more than one sensor, which is exactly what our heads do, and is why, if you block one ear whilst listening to someone in a reverberant room, they become so much harder to understand. (Try it next time you are in a boring lecture!) Technologically this can be done by utilising constructive rather than destructive interference: if one uses several microphones and their outputs are delayed so that the signal coming directly from the desired source is phase (time) aligned exactly, then the direct signal is reinforced. The reverberation is not removed entirely, but the resulting device is far more robust to movements of the source.
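
A minimal sketch of such a ``delay-and-sum'' arrangement is given below. The integer-sample delays and the white noise standing in for reverberation are simplifying assumptions; the point is only that the aligned direct signals add coherently while everything else does not.

    import numpy as np

    def delay_and_sum(mic_signals, delays):
        """Advance each channel by its direct-path delay (in samples), then average."""
        n = min(len(x) - d for x, d in zip(mic_signals, delays))
        aligned = [x[d:d + n] for x, d in zip(mic_signals, delays)]
        # The direct sound adds coherently across channels; reverberation
        # and noise, having unrelated phase at each microphone, average down.
        return np.mean(aligned, axis=0)

    # Toy usage: four microphones hear the same source at different delays,
    # each buried in its own independent noise.
    rng = np.random.default_rng(2)
    src = rng.standard_normal(8000)
    delays = [0, 3, 7, 12]
    mics = [np.concatenate([np.zeros(d), src]) + 0.7 * rng.standard_normal(8000 + d)
            for d in delays]
    enhanced = delay_and_sum(mics, delays)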

This illustrates a rather generic effect: if one tries to solve a problem perfectly, one often gets a very non-robust solution; but if one aims only at a modest improvement, the solution can be intrinsically robust.

13 March 2000