Daily Archives: December 5, 2008

computerMusic2 lectureNotes_cm2

(compMus2) Convolution

Convolution is a fundamental process in digital audio processing. Even if you do not specifically know that the process is happening, you know the effects of the process. Filtering, reverberation, and cross synthesis all illustrate convolution. For example, a filter convolves its impulse response (IR) with the input signal to produce filtered output. Sampling reverbs convolve impulse responses of physical spaces with input signals to produce the effect of playing a sound in the physical space.

Musical Uses of Convolution: Reverberation and Filtering/Cross Synthesis

Convolution can simulate an arbitrary signal being played back in a specific physical space: sample the impulse response of the space, then convolve that IR with the signal. Sampling a room requires an excitation signal that contains (preferably) all frequencies. Often, a balloon pop or a starter pistol is used. The best way is to use a quick sine tone sweep, which covers the frequency range in a controlled, repeatable way. Convolution can also be used to filter signals, either for cross synthesis purposes or to simulate the characteristics of an audio system, such as a microphone or guitar amp.

The Math of Convolution

For fun, the equation:

 

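$$(a * b)[n] = \sum_{m=0}^{M-1} a[m] \, b[n-m], \qquad n = 0, 1, \ldots, M + N - 2$$

where M and N are the lengths of a and b, and b[n-m] is taken as zero whenever n-m falls outside 0 to N-1.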

For every sample in one signal (the arbitrary signal a), multiply it by every sample in the IR b, and sum the results, with each set of products offset by the position of the sample in a. The length of the output will equal the length of a plus the length of b, minus 1 sample. Convolution is not multiplication. Multiplication in the time domain is amplitude modulation. (For each sample, multiply one sample from a times one sample from b.) Convolution of two audio signals is a series of multiplications and a summation of those results: each sample in one signal is multiplied by the entire set of samples in the second signal, and the products are offset by the location of that sample in the original signal.
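The description above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names convolve and multiply and the sample values are assumptions made for the example, not part of the original notes.

```python
def convolve(a, b):
    """Direct (time-domain) convolution of two lists of samples."""
    out = [0.0] * (len(a) + len(b) - 1)   # output length: a + b - 1 samples
    for i, a_sample in enumerate(a):
        for j, b_sample in enumerate(b):
            # Each product lands at the offset of the sample taken from a.
            out[i + j] += a_sample * b_sample
    return out


def multiply(a, b):
    """Sample-by-sample multiplication (amplitude modulation), for contrast."""
    return [x * y for x, y in zip(a, b)]


signal = [1.0, 2.0, 3.0]          # hypothetical "arbitrary signal" a
impulse_response = [1.0, 0.5]     # hypothetical IR b

print(convolve(signal, impulse_response))  # [1.0, 2.5, 4.0, 1.5]
print(multiply(signal, impulse_response))  # [1.0, 1.0]
```

The convolved output has 3 + 2 - 1 = 4 samples, while the sample-by-sample product stops at the length of the shorter list.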

Implementation of Convolution

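Implementing convolution directly in the time domain is computationally expensive: convolving signals of N and M samples takes N × M multiplications, which is impractical for long signals such as room impulse responses. To implement convolution as a digital signal process we rely on the Law of Convolution (also known as the convolution theorem). The Law of Convolution states that convolution in the time domain is equal to multiplication in the frequency domain, and vice versa. Both signals are zero-padded to at least the output length and converted to the frequency domain via an FFT, and their resulting frequency spectra are multiplied. An inverse FFT of the product returns the convolved signal to the time domain.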
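As a rough sketch of the frequency-domain route, the following Python uses a small recursive radix-2 FFT written only for illustration. The names fft and fft_convolve are assumptions, and a real implementation would normally use an optimized FFT library. Both inputs are zero-padded to a power of two at least as long as the output so that the result is not wrapped around.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse transform is left unscaled (the caller divides by N)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(a, b):
    """Convolve a and b by multiplying their spectra."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:          # next power of two >= output length
        size *= 2
    spec_a = fft(list(a) + [0.0] * (size - len(a)))   # zero-pad, then FFT
    spec_b = fft(list(b) + [0.0] * (size - len(b)))
    product = [p * q for p, q in zip(spec_a, spec_b)]  # multiply spectra
    result = fft(product, inverse=True)                # back to time domain
    return [value.real / size for value in result[:out_len]]


print([round(v, 6) for v in fft_convolve([1.0, 2.0, 3.0], [1.0, 0.5])])
```

Up to floating-point rounding, the result matches direct time-domain convolution of the same two lists.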

Understanding convolution as multiplication of the frequency spectra is the easiest way to understand how convolution can be used to filter a signal. Shared frequency content between two signals will be resynthesized, but any frequency not found in both signals will be silenced (multiplying any number by 0 equals 0). Understanding convolution in the time domain is the easiest way to understand how convolution works as reverberation, since each sample in one signal will be scaled and repeated for every sample in the other signal. The result of this operation is time smearing. It should be noted that however you understand convolution, the process is always acting as both a filter and a time smearing operation. Therefore, if your purpose is weighted towards filtering, your impulse should be very short. If you wish to simulate reverb, your impulse should have a duration that matches typical reverb times (0.8 seconds and above), with a decaying amplitude envelope.
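Both readings can be seen in a toy example. The sketch below is illustrative only, and all signal values and names are made up for the demonstration. A very short two-sample averaging impulse has no energy at the highest representable frequency, so it silences a signal that alternates every sample. A longer impulse with a delayed, quieter copy scales and repeats each input sample.

```python
def convolve(a, b):
    """Direct time-domain convolution of two lists of samples."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_sample in enumerate(a):
        for j, b_sample in enumerate(b):
            out[i + j] += a_sample * b_sample
    return out


# Filtering: a very short impulse.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # only the highest frequency
averager = [0.5, 0.5]                            # spectrum is zero there
filtered = convolve(alternating, averager)
print(filtered)  # [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5]

# Time smearing: a longer impulse with one decayed echo.
clicks = [1.0, 0.0, 0.0, 2.0]
echo = [1.0, 0.0, 0.5]        # direct sound, then half amplitude 2 samples later
smeared = convolve(clicks, echo)
print(smeared)   # [1.0, 0.0, 0.5, 2.0, 0.0, 1.0]
```

In the filtered output, only the first and last samples are nonzero; they are the onset and tail of the signal rather than steady-state content. In the smeared output, each click reappears two samples later at half its amplitude.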

(compMus2) Phase Vocoding

Phase Vocoding allows for independent control of time duration and pitch. 

Time Expansion/Compression with Phase Vocoding

The conversion of an audio signal from the time domain to the frequency domain results in a series of frames containing bins of frequency and amplitude information. If you conceive of the FFT as producing a snapshot, a frozen picture of frequency/amplitude information for a short segment of time, then it is easy to understand time expansion/compression as similar to changing the frame rate of video playback. Individual pictures (the analysis frames) are not changed, only their rate of playback. 

Consider a simple math example. With an FFT size of 512 samples at a sample rate of 44.1 kHz, each analysis segment lasts for approximately 11 ms (512 / 44,100 ≈ 11.6 ms). In the frequency domain, this 11 ms analysis segment represents one frame of frequency/amplitude bins. If during resynthesis (the inverse FFT) each frame is resynthesized at a rate of 11 ms per frame, then the output signal is the same duration as the input signal. If the rate of frame resynthesis changes to 22 ms per frame (twice the original analysis duration), then the output signal will be twice as long as the original. If the rate changes to 44 ms per frame, then the output signal expands to four times the original length. This method of time expansion/contraction is completely analogous to slow motion (or fast motion) video. You are not adding more frames to the video playback when you slow down/expand time (like you would with granular synthesis); you are simply changing the playback rate of the frames you have already recorded/analyzed.
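The arithmetic in this example can be checked with a short Python sketch. It follows the simplified model used above (one frame after another, no overlap). The 44.1 kHz sample rate, the frame count of 100, and the function names are assumptions chosen for illustration.

```python
def frame_duration_ms(fft_size, sample_rate):
    """Duration of one analysis frame in milliseconds."""
    return 1000.0 * fft_size / sample_rate


def output_duration_ms(num_frames, resynthesis_ms_per_frame):
    """Total output length when each frame is played back at the given rate."""
    return num_frames * resynthesis_ms_per_frame


SAMPLE_RATE = 44100   # assumed sample rate in Hz
FFT_SIZE = 512        # FFT size from the example above
NUM_FRAMES = 100      # hypothetical number of analysis frames

analysis_ms = frame_duration_ms(FFT_SIZE, SAMPLE_RATE)
original_ms = output_duration_ms(NUM_FRAMES, analysis_ms)

for factor in (1, 2, 4):
    stretched_ms = output_duration_ms(NUM_FRAMES, factor * analysis_ms)
    print(f"{factor * analysis_ms:6.1f} ms per frame -> "
          f"{stretched_ms / original_ms:.0f}x the original duration")
```

The number of frames never changes; only the time allotted to each frame during resynthesis does.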

Pitch Shifting with the Phase Vocoder

Phase vocoding shifts pitch through simple multiplication. You multiply the frequency information in all bins of every frame by the same transposition factor. Multiplying by 2 transposes the output resynthesis up one octave; multiplying by 0.5 transposes the output down one octave. If you are looking for precise semitone transposition, you will need to calculate 2 to the power of x/12, that is 2^(x/12), where x equals the number of semitones of transposition (a negative x transposes down).
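A minimal Python sketch of the transposition factor follows. The function names and the example bin frequencies are hypothetical, and the sketch covers only the multiplication step described above, not a full phase vocoder.

```python
def semitones_to_ratio(semitones):
    """Transposition factor for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def transpose_frame(bin_frequencies, semitones):
    """Multiply every bin frequency in one frame by the same factor."""
    ratio = semitones_to_ratio(semitones)
    return [frequency * ratio for frequency in bin_frequencies]


print(semitones_to_ratio(12))    # 2.0  (up one octave)
print(semitones_to_ratio(-12))   # 0.5  (down one octave)
print(round(semitones_to_ratio(7), 4))           # 1.4983 (up a fifth)
print(transpose_frame([220.0, 440.0], 12))       # [440.0, 880.0]
```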