Since none of our texts provide information about the Fourier Transform, we will use the information in this post as our reference material in our Intro to Sonic Arts class (MUST 121).
In 1822, Jean Babtiste Joseph, Baron de Fourier developed the theorem that any periodic signal could be represented as the sum of individual sine waves. The number of sine waves needed could be infinite, and each sine wave would have its own frequency, amplitude, and initial phase (starting point in the wave cycle). The process of calculating the frequencies present in a signal is called the Fourier Transform. As mentioned in my previous post, using the Fourier transform converts a time domain audio signal into a frequency domain representation.
This brief definition of the theorem gives us our first problem with the transform. The transform works on periodic signals. In fact, it assumes that the signal being transformed is periodic. Periodic signals not only repeat at regular intervals, they are infinite in both directions (forward and backwards in time), which implies that the signals have no beginning or end. Setting aside the practical considerations of a signal that lasts forever, and has always existed, a regularly repeating signal doesn’t change frequency!
A related problem is that the FT has no way of knowing when a particular frequency starts within the analysis time segment. If a frequency appears at all during the analysis segment, it is calculated as being present for the whole segment. If you were to apply an FT as one analysis segment to the first movement of Beethoven’s Fifth Symphony, you would hear a single (very complex) chord from start to finish comprised of every note that sounded during the entire first movement.
Means of Calculating the Fourier Transform
The earliest forms of FT calculation were done by hand. Mechanical springs of various lengths and masses followed, which vibrated in sympathy with specific frequencies. Mechanical springs gave way to analog filters, which would allow you to measure the amplitude of a sound that passed through a filter’s frequency pass band. Finally, analog filters gave way to computer analysis. Since any computer operation involves a discrete series of values (rather than continuous analog time), computer FT’s are Discrete Fourier Transforms (DFT).
Since the FT itself cannot distinguish the start time of a given frequency within the analysis segment, FT’s are usually applied to very small time segments, in a series. This process of analysis is called the Short-Time Fourier Transform (STFT). The STFT is not necessarily a digital process. However, all DFT’s use the STFT.
The time segment used for calculation is taken by applying an amplitude window. This window, a very short amplitude envelope, is the same as what is used for granular synthesis. The window generally has tapered ends to eliminate the discontinuity between the end of the signal and its beginning (since the FT assumes that the signal is periodic).
The Fast Fourier Transform (FFT)
Even using a computer, a DFT requires an enormous amount of computation and is not practical to use. The discovery of a mathematical trick finally made the DFT a usable process. It was discovered that if the number of samples in your STFT window were a power of 2, you could greatly reduce the number of calculations needed to perform the analysis. Hence, the Fast Fourier Transform (FFT) was developed.
In the FFT, the size of the window in samples is the FFT size. The FFT size is equal to the number of analysis frequency bands evenly spaced between 0Hz and the sampling rate. You can calculate the frequency band spacing by taking the SR and dividing it by the FFT size. For example, with a SR of 44,100 Hz, an FFT size of 512 gives you a frequency band spacing of (44,100 / 512) = 86 Hz (approximately). If you used 1024 samples in your FFT, the frequency spacing would be about 43 Hz.
Given that we perceive pitch in an exponential relationship to frequency, the linear nature of the FFT presents a problem. Generally, this problem is compensated for by using a larger FFT size, which reduces the band spacing. Using 1024 samples yields a band spacing of about 43 Hz**; 8192 provides a roughly 5 Hz spacing between analysis bands. (**Note that SoundHack focuses its analysis bands below the Nyquist frequency, giving you a spacing of 21.5 Hz for an FFT size of 1024 samples.)
The Uncertainty Principle
While it would appear to be preferable to use as large an FFT size as possible for better frequency resolution, such an assumption is not always correct. With the FFT there is a tradeoff between time resolution and frequency resolution, similar to Heisenberg’s Uncertainty Principle. Heisenberg found that the more you looked for the velocity of an object, the less you knew about its position, and vice versa. For the FFT, the more look for frequency, the less you know about time. This uncertainty arises because the Fourier Transform can not distinguish the time difference between a frequency that appears at the beginning of a transform window, and one that appears halfway into the transform window (or any other time within the window). Any frequency appearing at any time within the window is analyzed as being present for the entire window. Since you add samples to the window to increase frequency resolution, you are also adding a greater period of time that is being analyzed, and consequently lowering the time resolution of the analysis.
For example, a 512-sample FFT window lasts approximately 12 ms (size/SR). Within that 12 ms window we lack time knowledge of events. If you double the window to 1024 samples, you double the time segment to approximately 24 ms. Each doubling of the window doubles the length of time for analysis, and halves our time resolution. At 4096 samples, our time resolution is reduced to approximately 93 ms, which is quite noticeable.
To work around this uncertainty problem you typically use overlapping analysis windows. However, overlapping windows can add an echo-type effect to the re synthesis, and will thicken the sound.
- FFT Size: The size of the analysis window, in samples. For the FFT, the size must be a power of 2. The size of the FFT will equal the number of frequency analysis bands, evenly spaced from 0 Hz to the Sampling Rate at multiples of SR/FFTsize. Half of the bands (up to the Nyquist Frequency) are usable.
- Window Type: The short-time amplitude envelope applied to the segment of audio being analyzed by the FFT. In general, bell-shaped envelopes are best for analysis. Unless you have done some research in this area, you can safely stick to a Hamming window.
- Bin: For one analysis segment, each frequency band being analyzed and its corresponding amplitude are represented together as a pair of numbers. This pair of numbers is a bin. Since the FFT size equals the number of analysis frequency bands, the FFT size will also equal the number of bins.
- Frame: The collection of bins for one analysis segment. If the FFT size is 512, then there are 512 frequency bands being analyzed, and consequently, 512 bins in the frame. The frame corresponds to the audio segment being analyzed at any given point in time. For purposes of understanding time manipulation via phase vocoding, you can also think of the frame as the frequency snap shot of an analysis window.
- Overlaps: the number of overlapping analysis windows applied to the input signal. More overlaps can provide greater time detail, but they can also produce noticeable echoes.
FFT Problems (applies to all versions of the FT)
- The FFT assumes the input is periodic, which implies infinity. Infinitely periodic signals don’t change pitch (or change any parameter).
- The spacing of frequency analysis bands is linear, while our perception of pitch is exponential.
- The Uncertainty Principle applies to measurements of frequency and time. Larger FFT sizes give better frequency resolution, but worsen the time resolution, and vice versa. The FFT cannot distinguish start times of frequency components within a window.