(sonicArts) analog recording, digital recording, digital audio sampling theory

MUST 121 students need to know the following information from memory for the in-class portion of the Quiz on Tuesday, 9/24. Other information from this section of the class will be covered in the take-home portion of the quiz.

Analog Recording

Analog recording involves the transduction of energy from one form to another. With analog recording, the energy that results from each transduction fluctuates in an analogous way to the prior fluctuation of energy. Consider the analog recording and playback chain:

sound waves (air pressure) -> microphone (electrical) -> tape recorder (magnetic after record; electrical after playback) -> loudspeaker (electrical to magnetic) -> sound waves (air pressure)

At each stage of the chain, the energy type is shown in parentheses. If you were to graph the amplitude of the energy fluctuation over time, the graph would look nearly identical at every step along the analog recording chain. When something is analogous, it is comparable in some particular way.

The main disadvantage of analog recording comes from transduction itself. When you convert energy from one form to another, you inevitably lose some energy. With analog recording, making a copy involves two transduction processes (playing back the original and recording the copy), and the resulting loss is referred to as generational loss.

Digital Recording

With digital recording, sound is not stored as an analogous magnetic fluctuation. Instead, it is stored as a series of binary numbers that represent the instantaneous sampled amplitudes of the incoming audio signal. The analog tape recorder is replaced by the analog-to-digital converter (ADC) and the digital-to-analog converter (DAC). Storage could still be magnetic (as on a hard drive or even tape) or electronic (as in flash storage), but what is stored will not resemble the air pressure fluctuation of the sound, or the electrical fluctuation of the audio signal coming from a microphone. So our digital recording and playback chain looks like:

sound waves (air pressure) -> microphone (electrical) -> ADC (binary numbers) -> storage -> DAC (electrical) -> loudspeaker (electrical to magnetic) -> sound waves (air pressure)
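To make "a series of binary numbers" concrete, here is a minimal Python sketch of the ADC and DAC stages. The function names `adc` and `dac`, the 440 Hz test tone, and the simplified symmetric scale (largest code 32767; a real 16-bit converter also has the extra negative code -32768) are illustrative assumptions, and real converters also include analog filtering.

```python
import math

def adc(signal, bits=16):
    """Hypothetical ADC: round amplitudes in [-1.0, 1.0] to signed integers."""
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    return [round(x * max_code) for x in signal]

def dac(codes, bits=16):
    """Hypothetical DAC: map the stored integers back to amplitudes."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]

sr = 44_100   # samples per second
freq = 440.0  # test tone in Hz
# Sample the "analog" sine only at the discrete instants n / sr
analog = [math.sin(2 * math.pi * freq * n / sr) for n in range(8)]
stored = adc(analog)   # what actually gets stored: whole numbers
print(stored)
print([format(c & 0xFFFF, "016b") for c in stored[:3]])  # 16-bit two's-complement bit patterns
played = dac(stored)   # amplitudes handed on toward the loudspeaker
```

The stored values bear no resemblance to a pressure wave when written to disk; only the DAC turns them back into an analogous electrical signal.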

Digital Audio Sampling Theory

Two key properties determine the quality of recorded digital audio. The sampling rate (SR) determines the highest frequency that can accurately be recorded, and the bit resolution (or bit depth) determines the signal-to-noise ratio (SNR).

Sampling Rate and the Nyquist Frequency

Since one cycle of a sound wave has a positive and a negative fluctuation, you must take at least two samples for every cycle of a wave. That means that your SR needs to be twice the highest frequency you want to record. Put another way, with a given SR you can only accurately record frequencies up to SR/2. SR/2 is the Nyquist Frequency (NF). If frequencies above the Nyquist reach your ADC, they will get converted and recorded, but they will be aliased: recorded as something other than their actual frequency. In fact, the mistake is predictable. For any frequency between the Nyquist (SR/2) and the SR, the difference between the frequency and the Nyquist is subtracted from the Nyquist Frequency to give the resulting recorded frequency.

NyquistFrequency – (originalFrequency – NyquistFrequency) = aliasedFrequency

-or-

originalFrequency – NyquistFrequency = Δf
NyquistFrequency – Δf = aliasedFrequency

For example,

If the Sampling Rate = 20,000 Hz (so the NF = 10,000 Hz), and an incoming audio signal contains a sound at 12,000 Hz, the aliased frequency would be 8,000 Hz.

10,000 – (12,000 – 10,000) = aliasedFrequency
10,000 – (2,000) = 8,000 Hz
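The aliasing rule can be written as a small Python function. `aliased_frequency` follows the NF – (originalFrequency – NF) formula and, like that formula, only covers frequencies up to the sampling rate; `folded_frequency` is a hypothetical general helper (not part of these notes) that first wraps the frequency into one sampling-rate interval. The last lines check the worked example numerically: sampled at 20,000 Hz, a 12,000 Hz cosine produces exactly the same sample values as an 8,000 Hz cosine, which is why the ADC cannot tell them apart.

```python
import math

def aliased_frequency(freq, sr):
    """NF - (f - NF), the rule from these notes; valid for f <= SR."""
    nf = sr / 2
    if freq <= nf:
        return freq  # at or below the Nyquist Frequency: recorded correctly
    if freq > sr:
        raise ValueError("the simple rule only covers frequencies up to SR")
    return nf - (freq - nf)

def folded_frequency(freq, sr):
    """Hypothetical general version: wrap into [0, SR), then fold about NF."""
    f = freq % sr
    return sr - f if f > sr / 2 else f

print(aliased_frequency(12_000, 20_000))  # 8000.0
print(folded_frequency(35_000, 20_000))   # 5000

# Same samples, different frequencies: the ADC cannot tell them apart.
sr = 20_000
high = [math.cos(2 * math.pi * 12_000 * n / sr) for n in range(10)]
low = [math.cos(2 * math.pi * 8_000 * n / sr) for n in range(10)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))  # True
```

With sines instead of cosines the alias comes out phase-inverted, but still at 8,000 Hz.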

ADCs prevent aliasing by lowpass filtering the incoming signal before conversion, which (in simple terms) removes frequencies above the NF. On playback, the converted electrical signal will be stair-stepped, because it is a recording of instantaneous (not continuous) amplitudes. A stair-stepped wave introduces a lot of high-frequency content into the signal, so another lowpass filter is used to remove frequencies above the NF (since they could not have been recorded anyway) and smooth the output signal.
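A rough Python illustration of why the stair-stepped output needs smoothing: it holds each sample of a 1 kHz tone (sampled at an assumed 8 kHz, so NF = 4 kHz) for eight points of a finer 64 kHz grid, a simple "zero-order hold" model of a DAC, then uses a naive discrete Fourier transform to measure how much of the signal's energy lies above the original NF. The hold model, the rates, and the helper names are assumptions chosen for the example; real DACs and their output filters are more sophisticated.

```python
import cmath
import math

def zero_order_hold(samples, factor):
    """Hold each sample for `factor` output points: a stair-stepped signal."""
    return [s for s in samples for _ in range(factor)]

def dft_magnitudes(x):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but fine for short signals)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def fraction_above(mags, cutoff_bin):
    """Share of spectral energy in bins strictly above cutoff_bin."""
    energy = [m * m for m in mags]
    return sum(energy[cutoff_bin + 1:]) / sum(energy)

sr = 8_000      # illustrative sampling rate, so NF = 4 kHz
factor = 8      # inspect the output on an 8x finer grid (64 kHz)
freq = 1_000    # 1 kHz test tone
samples = [math.sin(2 * math.pi * freq * n / sr) for n in range(64)]  # 8 whole cycles

stairs = zero_order_hold(samples, factor)  # 512 points on the 64 kHz grid
smooth = [math.sin(2 * math.pi * freq * t / (sr * factor)) for t in range(len(stairs))]

cutoff_bin = len(stairs) * (sr // 2) // (sr * factor)  # bin of the original NF (4 kHz)
frac_stairs = fraction_above(dft_magnitudes(stairs), cutoff_bin)
frac_smooth = fraction_above(dft_magnitudes(smooth), cutoff_bin)
print(f"stair-step energy above NF: {frac_stairs:.1%}")
print(f"smooth sine energy above NF: {frac_smooth:.1%}")
```

For the stair-stepped signal, about 5% of the energy sits above the NF (images of the tone near 7, 9, 15, and 17 kHz, and so on), while the smooth sine has essentially none there. A lowpass filter at the NF removes those images.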

sound waves (air pressure) -> microphone (electrical) -> lowpass filter -> ADC (binary numbers) -> storage -> DAC (electrical) -> lowpass filter -> loudspeaker (electrical to magnetic) -> sound waves (air pressure)

Bit Resolution

The bit resolution refers to the number of bits (binary digits) used to store each sample of the converted audio signal. Consider bit resolution to be similar to color resolution in a picture. If you only have 256 colors to represent a photograph, your photograph will look very blotchy (discrete color blocks, instead of semi-continuous color transitions). If you have thousands of colors, your photo gets better, but if you have millions of colors, your photo looks very close to the real object. Graphics cards usually simplify the resolution settings by offering 256 colors, thousands of colors, and millions of colors. Roughly speaking, these correspond to 8, 16, and 24 bits per pixel. The more bits, the more color variations you have.

Since digital audio records amplitudes, the more bits you have, the more accurate your amplitude representation (recording) will be. With fewer bits, you have more rounding error, also called quantization error. In an audio system, any such error is heard as noise. The greater the error, or cumulative errors, the more noise you will hear. (This is yet another definition of noise: a technical definition applied to a specific type of system.)
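A tiny Python sketch of quantization: `quantize` (a hypothetical helper using a simplified symmetric scale with 2^(bits-1) - 1 levels on each side of zero) rounds one "analog" amplitude at several bit depths, so you can see the rounding error shrink as bits are added.

```python
def quantize(x, bits):
    """Round an amplitude in [-1.0, 1.0] to the nearest available level."""
    steps = 2 ** (bits - 1) - 1  # levels on each side of zero (simplified scale)
    return round(x * steps) / steps

amplitude = 0.3  # an arbitrary "true" analog amplitude
errors = {}
for bits in (4, 8, 16):
    stored = quantize(amplitude, bits)
    errors[bits] = abs(stored - amplitude)
    print(f"{bits:>2} bits: stored {stored:.6f}, rounding error {errors[bits]:.6f}")
```

The error can never exceed half of one step, and each extra bit halves the step size.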

Each binary digit (bit) is either a one or a zero. With one bit, you can have a value of either 1 or 0, much like a light switch. With two bits, you double the possible values to four (00, 01, 10, 11). You can determine the number of possible values by raising 2 to the power of the number of bits. For n bits:

number of possible values = 2^n

If you have 4 bits, you have 2-to-the-4th possible values: 2 x 2 x 2 x 2 = 16. If you have 8 bits, you have 2-to-the-8th, or 256, values. 16 bits results in 65,536 values; 24 bits results in over 16 million (16,777,216) values.
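The same counting rule in a few lines of Python (`num_values` is just an illustrative name):

```python
def num_values(bits):
    """Number of distinct values that `bits` binary digits can represent: 2 ** bits."""
    return 2 ** bits

for bits in (1, 2, 4, 8, 16, 24):
    print(f"{bits:>2} bits -> {num_values(bits):,} values")
# Prints 2, 4, 16, 256, 65,536 and 16,777,216 values respectively.
```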

The standard CD sampling rate and bit resolution are 44.1 kHz (44,100 Hz) and 16 bits. DVDs typically use a 48 kHz sampling rate and up to 24 bits of resolution. While these sampling rates are entirely adequate for our hearing range, an increase in bit resolution is very helpful in more accurately representing lower-amplitude audio signals, which improves the signal-to-noise ratio (SNR: the amplitude ratio of the highest-amplitude sound to the amplitude of the noise floor). Roughly speaking, you gain 6 dB of SNR for every bit of resolution. 16-bit audio has a theoretical SNR of 16 x 6 = 96 dB, while 24-bit audio has a theoretical SNR of 24 x 6 = 144 dB. Actual performance depends on many factors related to your audio components.
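The "6 dB per bit" rule can be checked empirically with a short Python sketch: quantize a full-scale sine, treat the rounding error as noise, and compare signal power to noise power. The quantizer is the same simplified symmetric scale used for illustration elsewhere, and the 997 Hz tone at 48 kHz is an assumption chosen so the samples do not repeat a short pattern. For a full-scale sine the textbook figure is about 6.02 x bits + 1.76 dB, slightly above the rounder 6 x bits estimate.

```python
import math

def quantize(x, bits):
    """Round an amplitude in [-1.0, 1.0] to the nearest level (simplified symmetric scale)."""
    steps = 2 ** (bits - 1) - 1
    return round(x * steps) / steps

def measured_snr_db(bits, n=48_000, freq=997.0, sr=48_000):
    """SNR of a full-scale sine after quantization: signal power / error power, in dB."""
    signal = [math.sin(2 * math.pi * freq * i / sr) for i in range(n)]
    error = [quantize(s, bits) - s for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_error = sum(e * e for e in error) / n
    return 10 * math.log10(p_signal / p_error)

for bits in (8, 16):
    print(f"{bits} bits: rule of thumb {6 * bits} dB, "
          f"textbook {6.02 * bits + 1.76:.1f} dB, measured {measured_snr_db(bits):.1f} dB")
```

The measured values land close to the textbook figures, and the gap between 8 and 16 bits is close to 8 x 6 = 48 dB.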

For now, we will work with 44.1 kHz, 16-bit files.

Summary of Sampling Rate and Bit Resolution

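If you think of sampling rate as speed, then you can relate speed to frequency: the sampling rate sets the highest frequency you can record (SR/2). Bit resolution is even more self-evident. Resolution equals sharpness, and sharpness equals accuracy. Therefore, bit resolution relates to the accuracy of the recorded amplitudes, and with it the SNR.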