(must115) digital audio data

This post is both a summary and an expansion of topics in chapter 6 of An Introduction to Music Technology, by Dan Hosken.

comparing analog and digital signals

The key difference between analog and digital signals is that analog signals fluctuate continuously, while digital signals are made up of discrete values. Hosken illustrates the difference by comparing a light dimmer to a regular light switch. The dimmer is continuously variable between a minimum and maximum value. The switch is either on or off. The dimmer represents an analog system, and the switch a digital system.

digital recording: sampling and quantizing

To make a digital recording, you sample a fluctuating signal at regular intervals and store those samples as a series of numbers. How often you take a sample is called the sampling rate, and it is measured as a frequency: samples per second, in hertz (Hz). Assigning a numerical value to each sample involves quantizing it, that is, mapping the measured value to one of the discrete numbers the system can store. How many values a digital system can represent between minimum and maximum is the sample resolution, usually expressed in bits.

For CD-quality audio, the sampling rate is 44,100 Hz (44.1 kHz), and the sample resolution is 16 bits.
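As a rough illustration of the two steps, here is a minimal Python sketch that samples a sine wave at regular intervals and quantizes each sample to a signed integer. The function name, the 441 Hz test tone, and the choice of rounding to the nearest integer are assumptions made for the example, not anything taken from the book.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, bits):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_value = 2 ** (bits - 1) - 1  # largest positive value, 32767 for 16 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                            # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # continuous value between -1 and 1
        samples.append(round(amplitude * max_value))      # quantize to the nearest integer
    return samples

# First few samples of a 441 Hz tone at CD quality (44,100 Hz, 16 bits)
cd_samples = sample_and_quantize(441, 44_100, 8, 16)
print(cd_samples)
```

Each number in the printed list is one stored sample; a real recording is just a much longer list of the same kind.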

sampling rate and the Nyquist Frequency

To get an accurate representation of the frequency of a signal, you must sample the signal at least twice a cycle to capture the positive fluctuation and the negative fluctuation. If you do not sample at least twice a cycle, the resulting signal heard on playback will be aliased. An aliased signal will be heard at a different frequency than the original signal. For accurate recordings, you need a sampling rate that is at least twice the highest frequency in the signal. Another way of looking at this situation is that any frequency above 1/2 of the sampling rate will be aliased.

We refer to 1/2 the sampling rate as the Nyquist Frequency. For CD-quality audio, that frequency is 44.1 kHz/2 = 22.05 kHz. Since normal human hearing goes up to about 20 kHz, this sampling rate is generally adequate for digital recording. For a signal between the Nyquist Frequency and the sampling rate, you can predict the aliased frequency by finding the difference between the original frequency and the Nyquist Frequency, and then subtracting that difference from the Nyquist Frequency.

Original frequency – Nyquist = frequency difference
Nyquist – frequency difference = recorded/heard frequency

For example:

SR = 10,000
NF = 10,000/2 = 5000
Original frequency = 7000
7000 – 5000 = 2000
5000 – 2000 = 3000 Hz
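The two-step calculation above can be sketched in Python. The function names are made up for this example, and the formula is only applied to frequencies between the Nyquist Frequency and the sampling rate, since that is the case the worked example covers. The second helper samples a cosine so you can check that the original tone and its alias really do produce the same sample values.

```python
import math

def aliased_frequency(original_hz, sample_rate_hz):
    """Predict the frequency heard when a tone is sampled at sample_rate_hz."""
    nyquist_hz = sample_rate_hz / 2
    if original_hz <= nyquist_hz:
        return original_hz  # at or below the Nyquist: no aliasing
    if original_hz > sample_rate_hz:
        raise ValueError("this sketch only handles frequencies up to the sampling rate")
    difference = original_hz - nyquist_hz  # original frequency - Nyquist
    return nyquist_hz - difference         # Nyquist - frequency difference

def sampled_cosine(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples of a cosine wave taken at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

print(aliased_frequency(7000, 10_000))  # the example above: 3000.0
```

Comparing sampled_cosine(7000, 10_000, 20) with sampled_cosine(3000, 10_000, 20) gives matching values (up to floating point error), which is why the two tones cannot be told apart once sampled.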

Aliasing only applies to frequencies above the Nyquist Frequency, 1/2 of the sampling rate. To avoid aliasing in digital recordings, a lowpass filter is inserted before the ADC to remove all frequencies above the Nyquist from the signal. Since the recorded signal is a discrete series of values, the resulting waveform is stair-stepped, or jagged. If played back in its original form, it would introduce many high frequencies into the signal that weren’t recorded. Therefore, a lowpass filter (sometimes called a smoothing filter) is applied to the signal after the DAC on playback to remove frequencies above the Nyquist, since they were not part of the original recording.

quantizing and sample resolution

Each sample is stored as a discrete number. Any incoming amplitude value that falls between available values in a digital system has to be rounded or truncated to match an available value. The easiest way to think about this process is to compare floating point numbers (numbers with fractional components) and integer numbers. The value 37.3 would be represented as 37 if you could only use integers. 37.8 might be represented as 38 or 37, depending on whether you round the number (change it to the nearest integer) or truncate it (remove the fractional part). In either case, the process of changing the incoming signal to a discrete value is called quantizing. In a CD-quality system, you have 16 bits of sample resolution, leading to 65,536 values. 24 bits provides 16,777,216 values.
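Here is a small Python sketch of the rounding versus truncating choice, using the same numbers as the paragraph above. The quantize function and its mode argument are hypothetical names for the illustration; real converters work on voltages, not Python floats.

```python
import math

def quantize(value, mode="round"):
    """Map a fractional value to an integer by rounding or truncating."""
    if mode == "round":
        # nearest integer (Python sends exact halves to the nearest even integer)
        return round(value)
    if mode == "truncate":
        return math.trunc(value)  # drop the fractional part
    raise ValueError("mode must be 'round' or 'truncate'")

print(quantize(37.3))               # 37
print(quantize(37.8))               # 38
print(quantize(37.8, "truncate"))   # 37

# Number of available values for a given sample resolution
print(2 ** 16)  # 65536
print(2 ** 24)  # 16777216
```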

The difference between the actual value and the quantized value will be heard as audio noise. Literally, the digital value is a mistake, and mistakes equal noise in the system. The ratio of the signal to this error is called the signal-to-error or signal-to-noise ratio, and it determines the dynamic range of the system. More on that below.

binary numbers

Computers represent values as binary numbers, made up of binary digits (bits). A bit is either 0 or 1 (two choices, hence binary). For each bit added to your system, you double the range of possible values you can store. 1 bit = 2 choices (0 or 1). 2 bits = 4 choices (00, 01, 10, 11). 3 bits = 8 choices (000, 001, 010, 011, 100, 101, 110, 111).
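To see the doubling for yourself, this short Python sketch lists every pattern that a given number of bits can hold. The bit_patterns helper is just a name chosen for this example.

```python
def bit_patterns(bits):
    """Return every pattern an n-bit number can hold, as zero-padded strings."""
    return [format(value, f"0{bits}b") for value in range(2 ** bits)]

for bits in (1, 2, 3):
    patterns = bit_patterns(bits)
    print(f"{bits} bit(s) = {len(patterns)} choices: {', '.join(patterns)}")
```

Each extra bit doubles the length of the list: 2, 4, 8, and so on up to 65,536 for 16 bits.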

Another way to think of binary numbers is that they represent a base 2 counting system. We normally use base 10 counting. Each place in a base 10 number represents a digit times a power of 10. For example:

125 in base 10 equals
5 x 10-to-the-zero-power (1) = 5
2 x 10-to-the-first-power (10) = 20
1 x 10-to-the-second-power (100) = 100

Base 2 counting is similar, with each place representing a power of 2.

101 in base 2 equals
1 x 2-to-the-zero-power (1) = 1
0 x 2-to-the-first-power (2) = 0
1 x 2-to-the-second-power (4) = 4

101 in base 2 (binary) equals 5 in base 10 counting.
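The place-value method above translates directly into a few lines of Python. The binary_to_decimal function is written out by hand for illustration; Python's built-in int() with a base of 2 does the same job.

```python
def binary_to_decimal(binary_string):
    """Add up digit x power-of-2 for each place, starting from the right."""
    total = 0
    for place, digit in enumerate(reversed(binary_string)):
        total += int(digit) * 2 ** place
    return total

print(binary_to_decimal("101"))  # 5
print(int("101", 2))             # 5, using the built-in conversion
print(format(5, "b"))            # 101, going the other way
```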

8 bits = 1 byte, which, combined with a base 2 counting system, affects how we talk about file size. Since a computer represents data in binary, a kilobyte (KB) is traditionally counted as 1024 bytes rather than 1000. A megabyte (MB) equals 1024 kilobytes, and so on (gigabyte, terabyte…).

more on signal-to-error ratio (signal-to-noise)

Since each added bit in a digital system doubles the available values, each added bit adds about 6 dB to the signal-to-noise ratio. The signal-to-noise ratio represents the range between the highest amplitude that can be recorded and the noise floor that results from quantization error. In a 16-bit system, 16 x 6 = 96 dB signal-to-noise ratio. In a 24-bit system, 24 x 6 = 144 dB signal-to-noise ratio.
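The 6 dB figure comes from the standard decibel formula for amplitude ratios, 20 x log10(ratio): doubling the number of values gives 20 x log10(2), or about 6.02 dB. A minimal Python sketch, using a function name made up for this example, shows how close the 6-dB-per-bit rule of thumb gets:

```python
import math

def dynamic_range_db(bits):
    """Approximate signal-to-error ratio in dB for a given sample resolution."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits} bits: about {dynamic_range_db(bits):.2f} dB (rule of thumb: {bits * 6} dB)")
```

This simple ratio ignores refinements such as dither, so treat the results as ballpark figures.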

high resolution digital recording and file size

You can easily ascertain the file size of a digital recording of a certain length by multiplying the Sampling Rate x Sample Resolution x Length (seconds) x Channels (number). For example, 2 seconds of stereo CD audio:

44,100 x 16 x 2 x 2 = 2,822,400 bits
2,822,400 bits / 8 (1 byte) = 352,800 bytes / 1024 (1 KB) = 344.53 KB

Using the formula, 1 minute of stereo CD audio equals approximately 10 MB.
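The file size formula is easy to wrap in a small Python function. The function and parameter names are assumptions for this sketch, and it only counts raw audio data (no file header or compression).

```python
def audio_file_size_bytes(sample_rate_hz, bits_per_sample, seconds, channels):
    """Sampling rate x sample resolution x length x channels, converted to bytes."""
    total_bits = sample_rate_hz * bits_per_sample * seconds * channels
    return total_bits / 8  # 8 bits = 1 byte

two_seconds = audio_file_size_bytes(44_100, 16, 2, 2)
one_minute = audio_file_size_bytes(44_100, 16, 60, 2)

print(f"2 seconds: {two_seconds / 1024:.2f} KB")
print(f"1 minute:  {one_minute / 1024 / 1024:.2f} MB")
```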

Doubling the sampling rate would double the file size, but knowing what we know about pitch to frequency relationships, it would add only 1 octave to the available pitch range of the recording. (Not much!) However, adding 8 more bits of sample resolution, taking the bit resolution to 24, only adds 50% more data to the file but multiplies the number of available amplitude values by 2-to-the-eighth power. That amplitude resolution is 256 times finer than 16-bit audio.
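The trade-off can be checked with a few lines of Python arithmetic. The variable names are invented for this comparison, and it starts from CD quality (44,100 Hz, 16 bits).

```python
import math

base_rate_hz, base_bits = 44_100, 16

# Option A: double the sampling rate, keep 16 bits
size_ratio_a = (2 * base_rate_hz * base_bits) / (base_rate_hz * base_bits)
octaves_gained = math.log2((2 * base_rate_hz / 2) / (base_rate_hz / 2))  # change in Nyquist Frequency

# Option B: keep the sampling rate, go from 16 to 24 bits
size_ratio_b = (base_rate_hz * 24) / (base_rate_hz * base_bits)
value_ratio = 2 ** 24 // 2 ** base_bits

print(f"Double the rate: {size_ratio_a}x the data, {octaves_gained} octave gained")
print(f"24-bit samples:  {size_ratio_b}x the data, {value_ratio}x the amplitude values")
```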

The key point here is that better bit resolution leads to a much more noticeable improvement in digital recording quality than increasing the sampling rate.