The mp3 History

Fraunhofer Institute for Integrated Circuits

Technology

How Does Perceptual Audio Coding Work?

Music consists of many different components, not all of which are equally audible. For example, a soft flute may be inaudible to the listener if a trumpet is played at the same time. The flute is still present, of course, but the listener is simply unable to perceive it: the flute is masked by the trumpet.

Perceptual audio codecs such as mp3 recognize and exploit these characteristics of human auditory perception. Elements of a recording that are easily perceived are represented with high precision, while parts that are barely audible can be represented less accurately, and inaudible information can be discarded altogether.
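The idea of spending precision only where it is audible can be sketched as a per-band bit allocation. This is a deliberately simplified illustration, not the mp3 algorithm: the function name `allocate_bits`, the per-band levels in dB, the `max_bits` cap, and the rule of thumb that each bit of uniform quantization lowers the noise by roughly 6.02 dB are all assumptions made for the example. Bands whose signal lies below the masking threshold receive no bits at all.

```python
import math

# Rule of thumb (assumption): each extra bit of uniform quantization
# lowers the quantization noise by about 6.02 dB.
DB_PER_BIT = 6.02


def allocate_bits(signal_db, mask_db, max_bits=15):
    """Hypothetical per-band bit allocation driven by a masking threshold.

    signal_db: signal level per frequency band, in dB
    mask_db:   masking threshold per frequency band, in dB
    """
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m  # signal-to-mask ratio: how far the band rises above the mask
        if smr <= 0:
            bits.append(0)  # inaudible band: discard it altogether
        else:
            # Roughly enough bits to push the quantization noise below the mask,
            # capped at max_bits.
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits
```

For example, `allocate_bits([70, 55, 40, 30], [40, 45, 42, 35])` returns `[5, 2, 0, 0]`: the loud first band gets the most bits, the barely audible second band gets a coarse representation, and the two masked bands are dropped.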

To return to the earlier example, an mp3 encoder represents the trumpet with great precision and the flute more coarsely. This flexible representation reduces the amount of information that must be transmitted or stored, and thus the overall file size, but at the same time it introduces an error (noise) signal. Ideally, this so-called 'coding noise' is masked just as the flute signal is. It follows that the lower the bit-rate of a perceptual audio codec, the less accurately the overall music signal is represented. Below a certain limit (i.e., at extremely low bit-rates), the coding noise is no longer masked and becomes audible to the listener.
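The trade-off between bit-rate and coding noise can be illustrated with a plain uniform quantizer. This is only a sketch: mp3 itself uses a non-uniform quantizer steered by a psychoacoustic model, and the helper names, the 440 Hz test tone, and the 48 kHz sample rate here are arbitrary choices for illustration. The difference between the quantized and the original samples is the error signal described above, and each bit removed raises its level by roughly 6 dB.

```python
import math

SAMPLE_RATE = 48000  # Hz, arbitrary choice for this sketch


def quantize(x, bits):
    """Uniform mid-rise quantizer for samples in [-1, 1)."""
    levels = 2 ** bits
    step = 2.0 / levels
    # Find the quantization cell, clamp it to the valid range,
    # and reconstruct the sample at the center of that cell.
    idx = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (idx + 0.5) * step


def snr_db(bits, freq=440.0, n=4800):
    """Signal-to-noise ratio of a quantized sine tone, in dB."""
    signal = [0.9 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]
    # The coding noise is the difference between the coded and the original signal.
    noise = [quantize(s, bits) - s for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10 * math.log10(p_signal / p_noise)


for b in (16, 8, 4, 2):
    print(f"{b:2d} bits: SNR = {snr_db(b):5.1f} dB")
```

Running it shows the signal-to-noise ratio falling as the number of bits shrinks, which mirrors what happens in each frequency band when an encoder is forced to a lower bit-rate.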

The diagrams illustrate the coding noise as it appears within an mp3 file. Each vertical bar represents a frequency range; the higher the bar, the more coding noise has been introduced in that range. If a bar exceeds the masking threshold at 0 dB (red line), a listener may perceive the noise. At high bit-rates a bar rarely crosses that threshold, while at low bit-rates this happens more often, so the difference from the original can become audible.

No audible coding noise is introduced if the bit-rate is set properly.

Low bit-rates cause audible coding noise.
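The check that the diagrams visualize can be sketched as a noise-to-mask ratio (NMR) per frequency band: the coding-noise energy divided by the masking threshold, expressed in dB, so that the red 0 dB line corresponds to noise lying exactly at the threshold. The function names and the energy values below are hypothetical and are not taken from the diagrams.

```python
import math


def nmr_db(noise_energy, mask_energy):
    """Noise-to-mask ratio in dB: 0 dB means the noise sits exactly at the threshold."""
    return 10 * math.log10(noise_energy / mask_energy)


def audible_bands(noise, mask):
    """Indices of the bands whose coding noise exceeds the masking threshold."""
    return [i for i, (n, m) in enumerate(zip(noise, mask)) if nmr_db(n, m) > 0.0]


noise = [1e-6, 4e-6, 2e-5, 3e-5]  # hypothetical coding-noise energy per band
mask = [1e-5] * 4                 # hypothetical masking threshold per band
print([round(nmr_db(n, m), 1) for n, m in zip(noise, mask)])
print(audible_bands(noise, mask))
```

With these values the ratios are about -10, -4, +3 and +4.8 dB, so only the last two bands would rise above the red line, as in the low bit-rate diagram.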