**Background**

In 1822, Jean Babtiste Joseph, Baron de Fourier developed the theorem any periodic signal could be represented as the sum of individual sine waves. The number of sine waves needed could be infinite, and each sine wave would have its own frequency, amplitude, and initial phase. The process of calculating the frequencies present in a signal is called the *Fourier Transform*. As mentioned in my previous post, using the Fourier transform converts a time domain audio signal into a frequency domain representation.

This brief definition of the theorem gives us our first problem with the transform. The transform works on *periodic* signals. In fact, it assumes that the signal being transformed is periodic. Periodic signals not only repeat at regular intervals, they are *infinite*, which implies that the signals have no beginning or end. Setting aside the practical considerations of a signal that lasts forever, and has always existed, a regularly repeating signal doesn’t change frequency!

A related problem is that the FT has no way of knowing when a particular frequency starts within the analysis time segment. If a frequency appears at all during the analysis segment, it is calculated as being present for the whole segment. If you were to apply an FT to the entire *Rite of Spring*, and then resynthesize the results, you would hear a single (very complex) chord from start to finish.

**Means of Calculating the Fourier Transform**

The earliest forms of FT calculation were done by hand. Mechanical springs gave way to analog filters, and finally, to computer analysis. Since any computer operation involves a discrete series of values (rather than continuous analog time), computer FT’s are *Discrete Fourier Transforms *(DFT).

Since the FT itself cannot distinguish the start time of a given frequency within the analysis segment, FT’s are usually applied to very small time segments, in a series. This process of analysis is called the *Short-Time Fourier Transform* (STFT). The STFT is not necessarily a digital process. However, all DFT’s use the STFT.

The time segment used for calculation is taken by applying an amplitude *window*. This window, a very short amplitude envelope, is the same as what is used for granular synthesis. The window generally has tapered ends to eliminate the discontinuity between the end of the signal and its beginning (since the FT assumes that the signal is periodic).

**The Fast Fourier Transform (FFT)**

Even using a computer, a DFT requires an enormous amount of computation and is not practical to use. The discovery of a mathematical trick finally made the DFT a usable process. It was discovered that if the number of samples in your STFT window were a power of 2, you could greatly reduce the number of calculations needed to perform the analysis. Hence, the *Fast Fourier Transform* (FFT) was developed.

In the FFT, the size of the window in samples is the *FFT size*. The FFT size is equal to the number of analysis frequency bands evenly spaced between 0Hz and the sampling rate. You can calculate the frequency band spacing by taking the SR and dividing it by the FFT size. For example, with a SR of 44,100 Hz, an FFT size of 512 gives you a frequency band spacing of (44,100 / 512) = 86 Hz (approximately). If you used 1024 samples in your FFT, the frequency spacing would be about 43 Hz.

Given that we perceive pitch in an exponential relationship to frequency, the linear nature of the FFT presents a problem. Generally, this problem is compensated for by using a larger FFT size, which reduces the band spacing. Using 2048 samples yields a band spacing of about 21.5 Hz; 8192 provides a roughly 5 Hz spacing between analysis bands.

**The Uncertainty Principle**

While it would appear to be preferable to use as large an FFT size as possible for better frequency resolution, such an assumption is not always correct. With the FFT there is a tradeoff between time resolution and frequency resolution, similar to Heisenberg’s *Uncertainty Principle*. Heisenberg found that the more you looked for the velocity of an object, the less you knew about its position, and vice versa. For the FFT, the more look for frequency, the less you know about time. This uncertainty arises because the Fourier Transform can not distinguish between a frequency that appears at the beginning of a transform window, and one that appears halfway into the transform window (or any other time within the window). Any frequency appearing at any time within the window is analyzed as being present for the entire window. Since you add samples to the window to increase frequency resolution, you are also adding a greater period of time that is being analyzed, and consequently lowering the time resolution of the analysis.

For example, a 512-sample FFT window lasts approximately 12 ms (size/SR). Within that 12 ms window we lack time knowledge of events. If you double the window to 1024 samples, you double the time segment to approximately 24 ms. Each doubling of the window doubles the length of time for analysis, and halves our time resolution. At 4096 samples, our time resolution is reduced to approximately 93 ms, which is quite noticeable.

To work around this uncertainty problem you typically use overlapping analysis windows. However, overlapping windows can add an echo-type effect to the re synthesis, and will thicken the sound.

**FFT Parameters**

- FFT Size: The size of the analysis window, in samples. For the FFT, the size must be a power of 2. The size of the FFT will equal the number of frequency analysis bands, evenly spaced from 0 Hz to the Sampling Rate at multiples of SR/FFTsize. Half of the bands (up to the Nyquist Frequency) are usable.
- Window Type: The short-time amplitude envelope applied to the segment of audio being analyzed by the FFT. In general, bell-shaped envelopes are best for analysis.
- Bin: For one analysis segment, each frequency band being analyzed and its corresponding amplitude are represented together as a pair of numbers. This pair of numbers is a bin. Since the FFT size equals the number of analysis frequency bands, the FFT size will also equal the number of bins.
- Frame: The collection of bins for one analysis segment. If the FFT size is 512, then there are 512 frequency bands being analyzed, and consequently, 512 bins in the frame. The frame corresponds to the audio segment being analyzed at any given point in time. For purposes of understanding time manipulation via phase vocoding, you can also think of the frame as the frequency
*snap shot* of an analysis window.
- Overlaps: the number of overlapping analysis windows applied to the input signal. More overlaps can provide greater time detail.
- Hop Size: the distance between the start of overlapping analysis windows. This hop size, or skip, is usually determined by spacing overlapping windows evenly at a distance of 1/#_overlaps times the FFT size.

**FFT Problems (applies to all versions of the FT)**

- The FFT assumes the input is periodic, which implies infinity. Infinitely periodic signals don’t change pitch.
- The spacing of frequency analysis bands is linear, while our perception of pitch is exponential.
- The Uncertainty Principle applies to measurements of frequency and time. Larger FFT sizes give better frequency resolution, but worsen the time resolution, and vice versa. The FFT cannot distinguish start times of frequency components within a window.