(compMus1) Time Compression/Expansion – Fourier Transform and Phase Vocoding

Somehow this didn’t make it to the blog after the lecture…

Classic “concrete” techniques

  • With classic tape techniques, the only way to change the duration of a recorded sound is to change the speed of the tape, which also changes the pitch.
  • The reverse is also true: the only way to change the pitch is to change the tape speed, which also changes the duration.
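
The coupling between duration and pitch can be put into numbers. The sketch below is only an illustration: the function name `tape_speed_change` and its arguments are made up for this example, and it assumes an idealized tape machine where a speed factor scales every frequency by the same amount.

```python
import math

def tape_speed_change(duration_s, frequency_hz, speed):
    """Return (new_duration, new_frequency, semitone_shift) for a speed factor.

    speed = 2.0 means the tape runs twice as fast (hypothetical helper).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    new_duration = duration_s / speed      # faster tape -> shorter sound
    new_frequency = frequency_hz * speed   # faster tape -> higher pitch
    semitones = 12 * math.log2(speed)      # 12 semitones per doubling
    return new_duration, new_frequency, semitones

# A 4-second recording of A440 played back at double speed
print(tape_speed_change(4.0, 440.0, 2.0))  # (2.0, 880.0, 12.0)
```

Doubling the speed halves the duration and raises the pitch by an octave; there is no setting that changes one without the other.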

Computer processing techniques

Software offers two different options for changing duration independent of pitch:

  • Granular Synthesis, a process that slices (windows) time domain audio into very small (1 – 100 ms) segments, and
  • Phase Vocoding, a process that converts time domain audio into frequency domain representations.
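
As a rough illustration of the granular approach, the sketch below reads short windowed grains from the input at one spacing and writes them to the output at another. It is a minimal sketch, not a production algorithm: the function name `granular_stretch`, the Hann grain envelope, and the default grain length are all assumptions made for this example.

```python
import math

def granular_stretch(signal, stretch, grain=400, overlaps=4):
    """Change duration by re-spacing windowed grains (hypothetical helper).

    stretch > 1 lengthens the sound, stretch < 1 shortens it.
    """
    if len(signal) < grain:
        raise ValueError("signal is shorter than one grain")
    hop_out = grain // overlaps          # spacing of grains in the output
    hop_in = hop_out / stretch           # spacing of grains in the input
    # Hann envelope so each grain fades in and out
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / grain) for i in range(grain)]

    # Collect the start position of every grain that fits in the input
    starts = []
    t = 0.0
    while int(t) + grain <= len(signal):
        starts.append(int(t))
        t += hop_in

    out = [0.0] * ((len(starts) - 1) * hop_out + grain)
    norm = [0.0] * len(out)
    for m, start in enumerate(starts):
        for i in range(grain):
            out[m * hop_out + i] += signal[start + i] * window[i]
            norm[m * hop_out + i] += window[i]
    # Divide by the summed envelopes so overlapping grains keep a level gain
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]

sr = 8000
tone = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]  # 1 second
longer = granular_stretch(tone, 2.0)
print(len(tone), len(longer))
```

Each grain is copied without any change to its content, so the pitch inside a grain stays put. Overlapping grains are not phase-aligned in this sketch, which is one source of the characteristic granular texture.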

Converting Domains

Any periodic signal can be represented as a sum of simultaneous sine waves, each with its own frequency, amplitude, and phase.
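
A square wave is a common illustration of this idea: it can be approximated by adding odd harmonics whose amplitudes fall off as 1/k. The sketch below is only an example; `square_wave_partial_sum` is a name invented here, and the wave is assumed to swing between -1 and +1.

```python
import math

def square_wave_partial_sum(t, fundamental_hz, num_harmonics):
    """Approximate a +/-1 square wave at time t from its first odd harmonics."""
    total = 0.0
    for i in range(num_harmonics):
        k = 2 * i + 1  # odd harmonic numbers: 1, 3, 5, ...
        total += math.sin(2 * math.pi * k * fundamental_hz * t) / k
    return 4 / math.pi * total

# Sample the middle of the positive half-cycle of a 100 Hz square wave
t = 0.25 / 100
for count in (1, 5, 50):
    print(count, round(square_wave_partial_sum(t, 100, count), 4))
```

With a single sine the value overshoots; as more harmonics are added, the sum settles toward the square wave's level of 1.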

Fourier Transform

  • Converts a time-domain representation into a frequency domain representation

Inverse Fourier Transform

  • Converts a frequency-domain representation into a time domain representation

Fast Fourier Transform (FFT)

  • The FFT takes a slice of time (a window) that is n samples in length, where n = some-power-of-2.
  • The number of samples in an FFT window = the number of frequency bands between 0 Hz and the Sampling Rate.
  • Only half the bands are usable. (why?)
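
One way to explore the band layout is to run a small transform and print what each band contains. The sketch below uses a slow, direct DFT instead of a true FFT so that it stays short; the sample rate, window size, and test frequency are arbitrary choices for this example.

```python
import cmath
import math

def dft(x):
    """Direct (slow) discrete Fourier transform of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

sr = 8000   # sampling rate in Hz (assumed for the example)
n = 16      # window size, a power of 2
# 1500 Hz falls exactly on band 3, because 1500 / (sr / n) = 3
signal = [math.sin(2 * math.pi * 1500 * t / sr) for t in range(n)]
mags = [abs(v) for v in dft(signal)]

for k, m in enumerate(mags):
    # Band k sits at k * sr / n Hz
    print(k, k * sr / n, round(m, 3))
```

The energy shows up in band 3 (1500 Hz) and again in band 13 (6500 Hz). For real-valued audio, the bands above SR/2 mirror the ones below it, so they carry no additional information.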

How Phase Vocoding Works

  • Each FFT window represents a frame, or still picture, of analysis information (frequency domain content)
  • Time compression or expansion involves changing the rate at which the frames are played back, that is, the rate at which they are converted from the frequency domain back to the time domain by an inverse Fast Fourier Transform (iFFT).
  • Like changing the playback rate of film or video.
  • Pitch Shifting is an independent process.
  • To shift pitch, multiply the frequencies of all bands by a factor X (2 = octave up; 0.5 = octave down).
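
The frame idea can be sketched in code for the time-scaling case. This is a minimal sketch under several assumptions, not a reference implementation: the names `fft`, `ifft`, and `time_stretch` are defined here for the example, frames are analyzed at one spacing and resynthesized at another, and each band's phase is advanced so that successive output frames line up.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(spec):
    """Inverse FFT via the conjugate trick."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]

def time_stretch(signal, stretch, n=256, overlaps=4):
    """Change duration without changing pitch (stretch=2.0 is twice as long)."""
    hop_out = n // overlaps      # frame spacing in the output
    hop_in = hop_out / stretch   # frame spacing in the input
    if hop_in < 1 or len(signal) < n:
        raise ValueError("stretch too large or signal too short")
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]  # Hamming
    starts, t = [], 0.0
    while int(t) + n <= len(signal):
        starts.append(int(t))
        t += hop_in
    out = [0.0] * ((len(starts) - 1) * hop_out + n)
    norm = [0.0] * len(out)
    half = n // 2
    prev_phase = acc_phase = None
    for m, start in enumerate(starts):
        spec = fft([signal[start + i] * window[i] for i in range(n)])
        mags = [abs(spec[k]) for k in range(half + 1)]
        phases = [cmath.phase(spec[k]) for k in range(half + 1)]
        if m == 0:
            acc_phase = list(phases)
        else:
            hop = start - starts[m - 1]  # actual input spacing in samples
            for k in range(half + 1):
                centre = 2 * math.pi * k / n  # band centre, radians per sample
                delta = phases[k] - prev_phase[k] - centre * hop
                delta -= 2 * math.pi * round(delta / (2 * math.pi))  # wrap to +/- pi
                # Advance the output phase by the estimated true frequency
                acc_phase[k] += hop_out * (centre + delta / hop)
        prev_phase = phases
        frame_spec = [0j] * n
        for k in range(half + 1):
            frame_spec[k] = cmath.rect(mags[k], acc_phase[k])
        for k in range(1, half):
            frame_spec[n - k] = frame_spec[k].conjugate()  # mirror the upper bands
        frame = ifft(frame_spec)
        for i in range(n):
            out[m * hop_out + i] += frame[i].real * window[i]
            norm[m * hop_out + i] += window[i] ** 2
    return [o / w for o, w in zip(out, norm)]

sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(2048)]
stretched = time_stretch(tone, 2.0)
print(len(tone), len(stretched))
```

The output is close to twice as long while the tone stays at 440 Hz. Pitch scaling is left out of this sketch; it would be a separate step applied to the band frequencies.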

Phase Vocoding parameters

  • FFT size (window size)
    • Determines number of frequency bands
    • Determines length of time per analysis window (FFT_Size / SR = length in seconds)
  • Number of Overlaps
    • Determines how far apart successive windows start (more overlaps means a new window starts more often)
    • Helps with time resolution
  • Window type
    • Describes the amplitude envelope applied to each time window
    • Can affect accuracy of measurements
    • For now, you can stick to a Hamming window
  • Time Scale (constant or function)
  • Pitch Scale (constant or function)
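
The arithmetic behind the first two parameters is easy to tabulate. In the sketch below, `analysis_parameters` is a name invented for this example, and the hop size is assumed to be FFT_Size divided by the number of overlaps, which is a common convention but not the only one.

```python
def analysis_parameters(fft_size, sample_rate, overlaps):
    """Derive band and timing figures from FFT size, SR, and overlaps."""
    if fft_size < 2 or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of 2")
    return {
        "bands_total": fft_size,                      # between 0 Hz and SR
        "bands_usable": fft_size // 2,                # between 0 Hz and SR / 2
        "band_spacing_hz": sample_rate / fft_size,
        "window_seconds": fft_size / sample_rate,     # FFT_Size / SR
        "hop_samples": fft_size // overlaps,          # assumed convention
        "hop_seconds": fft_size / overlaps / sample_rate,
    }

for size in (256, 1024, 4096):
    p = analysis_parameters(size, 44100, 4)
    print(size, round(p["band_spacing_hz"], 2), "Hz per band,",
          round(p["window_seconds"] * 1000, 1), "ms per window")
```

Going from 256 to 4096 samples narrows each band by a factor of 16 and makes each window 16 times longer.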

Problems with the Phase Vocoder

  • Frequency/Time trade-off – the Uncertainty principle
    • The more accurately you measure one parameter, the less accurately you can measure the other.
    • A larger FFT size provides more frequency bands but less information about the start time of events, and vice versa.
  • Frequency bands are linearly spaced, but our perception of pitch is logarithmic.
  • Fourier Transform theory assumes a periodic signal.
    • Periodic signals have no beginning or end (infinity in both directions)
    • Implied in this assumption (as it relates to the FFT) is that the signal begins a period exactly at the start of an analysis window and completes a period exactly at the end of it. This is unlikely to happen in practice, so a window function tapers each analysis window toward zero at its edges to reduce the resulting errors.
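
The effect of the period assumption can be measured on a small example. The sketch below is an illustration only: `leakage` is a made-up measure (the fraction of energy that lands more than 3 bands away from the strongest band), and the transform is a slow direct DFT to keep the code short.

```python
import cmath
import math

def dft_energy(x):
    """Energy per band for bands 0 .. n/2, using a direct (slow) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def leakage(x, guard=3):
    """Fraction of energy more than `guard` bands away from the peak band."""
    energy = dft_energy(x)
    peak = max(range(len(energy)), key=energy.__getitem__)
    inside = sum(e for k, e in enumerate(energy) if abs(k - peak) <= guard)
    return 1 - inside / sum(energy)

n = 64
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]

whole = [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]      # 10 full periods
partial = [math.sin(2 * math.pi * 10.5 * t / n) for t in range(n)]  # 10.5 periods
windowed = [s * w for s, w in zip(partial, hamming)]

print("whole periods, no window:  ", leakage(whole))
print("partial period, no window: ", leakage(partial))
print("partial period, Hamming:   ", leakage(windowed))
```

When the window holds a whole number of periods, essentially all of the energy sits in one band. With an extra half period, energy spreads into distant bands, and applying the Hamming window pulls most of it back toward the peak.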

