Yearly Archives: 2008

assignments_musth1 lectureNotes_musth1 musicTheory1

(musTh1) Final Exam Grades Posted

Final Exam grades are now posted in Gradebook.

I DO NOT HAVE YOUR FINAL AVERAGES COMPUTED. There are still some old homeworks to grade, and I will be dropping your two lowest assignment grades. I have, however, entered zeros for any missing assignments. Some of these zeros will change as I get through the old work I have.

assignments_cm2 computerMusic2 lectureNotes_cm2

(compMus2) The Final Project BOX

The box for turning in your audio CD of your final project is now on the table outside studio 9.

computerMusic2 lectureNotes_cm2

(compMus2) Spectral Quiz Review

For Wednesday’s (12/10) quiz over spectral processing, review the previous posts on intro to spectral processing, the Fourier transform, phase vocoding, and convolution.

Important Concepts

The quiz is not limited to the listed items below, but these concepts will go a long way towards helping you master the important material.

Intro to Spectral Processing

  • Audio domains (time, frequency)
  • Converting between time and frequency domain
  • Missing or unspecified elements in each domain
  • The difference between time domain processes and frequency domain processes, with the ability to name some processes in each domain. (This last concept is drawn from all the posts, along with previous work we’ve done in class.)

The Fourier Transform

  • What the theorem states (i.e., the part about how any periodic signal can be represented…)
  • The different implementations of the FT (FT, DFT, STFT, FFT), and how these implementations relate to each other
  • The FFT, specifically relating to the computational benefits of using an FFT size that is a power of 2
  • The Uncertainty Principle as it applies to the FT
  • FFT parameters (size, window type, bin, frame, overlaps, hop size)
  • The relationship of the FFT size to the number of frequency bands being analyzed
  • Problems with the FFT (FT): periodic, spacing of bands, time/frequency trade-off

Phase Vocoding

  • What audio manipulations/processes can be accomplished with phase vocoding
  • How time compression/expansion works with phase vocoding (also, be able to compare this to granular synthesis)
  • How pitch shifting works with PV.
  • How you overcome (to some degree) the time/frequency trade-off

Convolution

  • Convolution as a fundamental process in digital audio processing
  • The musical uses of convolution
  • Be able to describe in words the basic process of convolution
  • Implementation of convolution (using spectral processing)
  • The Law of Convolution, and its usefulness for implementing convolution
  • Understanding how convolution works to filter signals and to apply reverberation.
computerMusic2 lectureNotes_cm2

(compMus2) Convolution

Convolution is a fundamental process in digital audio processing. Even if you do not specifically know that the process is happening, you know the effects of the process. Filtering, reverberation, and cross synthesis all illustrate convolution. For example, a filter convolves its impulse response (IR) with the input signal to produce filtered output. Sampling reverbs convolve impulse responses of physical spaces with input signals to produce the effect of playing a sound in the physical space.

Musical Uses of Convolution: Reverberation and Filtering/Cross Synthesis

Convolution can be used to simulate an arbitrary signal being played back in a specific physical space by sampling the impulse response of a space and convolving that IR with an arbitrary signal. Sampling a room requires a signal with (preferably) all frequencies. Often, a balloon popping or a starter pistol is used. The best way is to use a quick sine tone sweep. Convolution can be used to filter signals, either for cross synthesis purposes, or to simulate the characteristics of an audio system, such as a microphone or guitar amp.

The Math of Convolution

For fun, the equation:

 

$$y[n] = \sum_{k} a[k]\, b[n-k]$$

For every sample in one signal (the arbitrary signal a), multiply it by every sample in the IR b, and add the products into the output at an offset equal to that sample's position in a. The length of the output will equal the length of a plus the length of b, minus one sample. Convolution is not multiplication. Multiplication in the time domain is amplitude modulation: for each sample position, one sample from a is multiplied by the one sample from b at the same position. Convolution of two audio signals is a series of multiplications and a summation of those results. Each sample in one signal scales the entire set of samples in the second signal, and that scaled copy is offset in the output by the location of the sample doing the scaling.
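
To make the process concrete, here is a minimal Python sketch of direct, time-domain convolution. The function name `convolve` and the sample values are made up for illustration; they are not taken from any audio library.

```python
def convolve(a, b):
    """Direct (time-domain) convolution of two lists of samples."""
    out = [0.0] * (len(a) + len(b) - 1)        # output length is len(a) + len(b) - 1
    for i, a_sample in enumerate(a):           # every sample in signal a ...
        for j, b_sample in enumerate(b):       # ... scales every sample in the IR b
            out[i + j] += a_sample * b_sample  # summed at an offset of i samples
    return out


# Hypothetical sample values, chosen only to keep the arithmetic easy to follow.
signal = [1.0, 0.5, 0.25]
ir = [1.0, 0.0, 0.5]

convolved = convolve(signal, ir)
# Sample-by-sample multiplication (amplitude modulation), for comparison.
modulated = [x * y for x, y in zip(signal, ir)]

print(convolved)  # [1.0, 0.5, 0.75, 0.25, 0.125]
print(modulated)  # [1.0, 0.0, 0.125]
```

The convolved output has 3 + 3 - 1 = 5 samples, while the multiplied (AM) output is no longer than its inputs.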

Implementation of Convolution

Implementing convolution directly in the time domain is computationally expensive, and impractical for long signals. To implement convolution as a digital signal process we rely on the Law of Convolution. The Law of Convolution states that convolution in the time domain is equal to multiplication in the frequency domain, and vice versa. Both signals are converted to the frequency domain via an FFT, their resulting frequency spectra are multiplied, and the product is converted back to the time domain via an inverse FFT.
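
The Law of Convolution can be checked numerically. The sketch below is one possible illustration, not a production implementation: it uses a small recursive radix-2 FFT written in plain Python, and all function names and sample values are hypothetical. Both signals are zero-padded to a power of 2 at least as long as the output, so the multiplied spectra do not wrap around.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled). len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def fft_convolve(a, b):
    """Convolution by multiplying spectra, then inverse transforming."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:          # next power of 2 that fits the output
        size *= 2
    spec_a = fft(list(a) + [0.0] * (size - len(a)))
    spec_b = fft(list(b) + [0.0] * (size - len(b)))
    product = [p * q for p, q in zip(spec_a, spec_b)]
    time_domain = fft(product, inverse=True)
    return [value.real / size for value in time_domain[:out_len]]


def direct_convolve(a, b):
    """Time-domain convolution, used here only as a reference."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_sample in enumerate(a):
        for j, b_sample in enumerate(b):
            out[i + j] += a_sample * b_sample
    return out


signal = [1.0, 0.5, 0.25]   # hypothetical input samples
ir = [1.0, 0.0, 0.5]        # hypothetical impulse response

fast = fft_convolve(signal, ir)
direct = direct_convolve(signal, ir)
largest_difference = max(abs(f - d) for f, d in zip(fast, direct))
print(largest_difference < 1e-9)
```

The two results agree to within floating-point rounding error.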

Understanding convolution as multiplication of the frequency spectra is the easiest way to understand how convolution can be used to filter a signal. Shared frequency content between two signals will be resynthesized, but any frequency not found in both signals will be silenced (multiplying any number by 0 equals 0). Understanding convolution in the time domain is the easiest way to understand how convolution works as reverberation, since each sample in one signal will be scaled and repeated for every sample in the other signal. The result of this operation is time smearing. It should be noted that however you understand convolution, the process of convolution acts as both a filter and a time-smearing operation. Therefore, if your purpose is weighted towards filtering, your impulse should be very short. If you wish to simulate reverb, your impulse should be of a duration that matches typical reverb times (0.8 seconds and above, with a decaying amplitude envelope).
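
A small sketch of both behaviors, using a direct time-domain convolution loop. The two impulse responses are invented for illustration: a very short IR that mostly filters, and a longer, decaying IR that mostly smears in time.

```python
def convolve(a, b):
    """Direct (time-domain) convolution of two lists of samples."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_sample in enumerate(a):
        for j, b_sample in enumerate(b):
            out[i + j] += a_sample * b_sample
    return out


# Very short IR (a two-sample average): behaves mostly as a low-pass filter.
short_ir = [0.5, 0.5]
# A signal alternating every sample, i.e. the highest frequency it can hold.
bright = [1.0, -1.0, 1.0, -1.0]
filtered = convolve(bright, short_ir)
print(filtered)  # [0.5, 0.0, 0.0, 0.0, -0.5]

# Longer IR with decaying echoes: behaves mostly as time smearing.
long_ir = [1.0, 0.0, 0.5, 0.0, 0.25]
click = [1.0]
smeared = convolve(click, long_ir)
print(smeared)   # [1.0, 0.0, 0.5, 0.0, 0.25]
```

The short IR cancels the alternating signal everywhere except at its edges, while the single click is spread out into a copy of the long IR.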

computerMusic2 lectureNotes_cm2

(compMus2) Phase Vocoding

Phase Vocoding allows for independent control of time duration and pitch. 

Time Expansion/Compression with Phase Vocoding

The conversion of an audio signal from the time domain to the frequency domain results in a series of frames containing bins of frequency and amplitude information. If you conceive of the FFT as producing a snapshot, a frozen picture of frequency/amplitude information for a short segment of time, then it is easy to understand time expansion/compression as similar to changing the frame rate of video playback. Individual pictures (the analysis frames) are not changed, only their rate of playback. 

Consider a simple math example. With an FFT size of 512 samples at a sampling rate of 44,100 Hz, each analysis segment lasts for approximately 12 ms (512 / 44,100). In the frequency domain, this 12 ms analysis segment represents one frame of frequency/amplitude bins. If during resynthesis (the inverse FFT) each frame is resynthesized at a rate of 12 ms per frame, then the output signal is the same duration as the input signal. If the rate of frame resynthesis changes to 24 ms per frame (twice the original analysis duration), then the output signal will be twice as long as the original. If the rate changes to 48 ms per frame, then the output signal expands to four times the original length. This method of time expansion/contraction is completely analogous to slow motion (or fast motion) video. You are not adding more frames to the video playback when you slow down/expand time (like you would with granular synthesis); you are simply changing the playback rate of the frames you have already recorded/analyzed. 
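
The arithmetic of this example can be sketched in a few lines of Python. This only models durations, not the resynthesis itself; it assumes a 44,100 Hz sampling rate and non-overlapping frames, and the function names and the frame count are hypothetical.

```python
SAMPLE_RATE = 44100  # assumed sampling rate in Hz
FFT_SIZE = 512


def frame_duration_ms(fft_size, sample_rate):
    """Duration of one analysis segment in milliseconds."""
    return 1000.0 * fft_size / sample_rate


def output_duration_ms(num_frames, resynthesis_ms_per_frame):
    """Total output duration when each frame is resynthesized at the given rate."""
    return num_frames * resynthesis_ms_per_frame


analysis_ms = frame_duration_ms(FFT_SIZE, SAMPLE_RATE)  # about 11.6 ms
num_frames = 100                                        # hypothetical analysis length

original = output_duration_ms(num_frames, analysis_ms)
doubled = output_duration_ms(num_frames, 2 * analysis_ms)
quadrupled = output_duration_ms(num_frames, 4 * analysis_ms)

print(round(analysis_ms, 1))
print(doubled / original, quadrupled / original)
```

The number of frames never changes; only the time allotted to each frame during resynthesis does.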

Pitch Shifting with the Phase Vocoder

Phase vocoding shifts pitch through simple multiplication. You multiply the frequency information in all bins of every frame by the same transposition factor. Multiplying by two transposes the output resynthesis up one octave; multiplying by 0.5 transposes the output down one octave. If you are looking for precise semitone transposition you will need to calculate 2 to the power of x/12, where x equals the number of semitones of transposition (positive to transpose up, negative to transpose down).
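
As a quick sketch of the transposition math, the Python below computes the factor for a given number of semitones and applies it to one frame. Representing a frame as a list of (frequency, amplitude) pairs is an assumption made for illustration, and the frame values are invented.

```python
def transposition_factor(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2 ** (semitones / 12)


def transpose_frame(frame, semitones):
    """Multiply the frequency of every bin in a frame by the same factor."""
    factor = transposition_factor(semitones)
    return [(frequency * factor, amplitude) for frequency, amplitude in frame]


# Hypothetical frame: (frequency in Hz, amplitude) pairs.
frame = [(220.0, 0.8), (440.0, 0.4), (660.0, 0.2)]

print(transposition_factor(12))    # 2.0, up one octave
print(transposition_factor(-12))   # 0.5, down one octave
print(transpose_frame(frame, 12))  # [(440.0, 0.8), (880.0, 0.4), (1320.0, 0.2)]
```

Amplitudes are left untouched; only the frequency values move.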

computerMusic2 lectureNotes_cm2

(compMus2) The Fourier Transform

Background

In 1822, Jean Baptiste Joseph, Baron de Fourier, developed the theorem that any periodic signal can be represented as the sum of individual sine waves. The number of sine waves needed could be infinite, and each sine wave would have its own frequency, amplitude, and initial phase. The process of calculating the frequencies present in a signal is called the Fourier Transform. As mentioned in my previous post, using the Fourier transform converts a time domain audio signal into a frequency domain representation. 

This brief definition of the theorem gives us our first problem with the transform. The transform works on periodic signals. In fact, it assumes that the signal being transformed is periodic. Periodic signals not only repeat at regular intervals, they are infinite, which implies that the signals have no beginning or end. Setting aside the practical considerations of a signal that lasts forever, and has always existed, a regularly repeating signal doesn’t change frequency! 

A related problem is that the FT has no way of knowing when a particular frequency starts within the analysis time segment. If a frequency appears at all during the analysis segment, it is calculated as being present for the whole segment. If you were to apply an FT to the entire Rite of Spring, and then resynthesize the results, you would hear a single (very complex) chord from start to finish.

Means of Calculating the Fourier Transform

The earliest forms of FT calculation were done by hand. Mechanical springs gave way to analog filters, and finally, to computer analysis. Since any computer operation involves a discrete series of values (rather than continuous analog time), computer FT’s are Discrete Fourier Transforms (DFT).
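
For a sense of what a DFT computes, here is a naive, unoptimized sketch in plain Python. The function name and the test signal are illustrative assumptions: a 16-sample sine wave that completes exactly three cycles, so its energy should land in analysis band 3.

```python
import cmath
import math


def dft(samples):
    """Naive Discrete Fourier Transform of a list of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


N = 16
CYCLES = 3  # the sine completes exactly 3 cycles in the 16-sample segment
signal = [math.sin(2 * math.pi * CYCLES * t / N) for t in range(N)]

magnitudes = [abs(value) for value in dft(signal)]
# Only the lower half of the bands (below the Nyquist frequency) is searched.
peak_band = max(range(N // 2), key=lambda k: magnitudes[k])
print(peak_band)  # 3
```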

Since the FT itself cannot distinguish the start time of a given frequency within the analysis segment, FT’s are usually applied to very small time segments, in a series. This process of analysis is called the Short-Time Fourier Transform (STFT). The STFT is not necessarily a digital process. However, in audio practice, DFT’s are nearly always applied as an STFT. 

The time segment used for calculation is taken by applying an amplitude window. This window, a very short amplitude envelope, is the same as what is used for granular synthesis. The window generally has tapered ends to eliminate the discontinuity between the end of the signal and its beginning (since the FT assumes that the signal is periodic). 
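
As one example of a tapered window, the sketch below builds a Hann window (a common bell-shaped choice, picked here as an assumption; the notes do not name a specific window) and applies it to a segment of constant amplitude.

```python
import math


def hann_window(size):
    """Symmetric Hann window: zero at both ends, 1.0 in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)) for n in range(size)]


def apply_window(segment, window):
    """Multiply each sample of the segment by the matching window value."""
    return [sample * weight for sample, weight in zip(segment, window)]


window = hann_window(9)
segment = [1.0] * 9  # hypothetical segment with full amplitude throughout
windowed = apply_window(segment, window)

print([round(value, 3) for value in windowed])
```

The windowed segment starts and ends at zero, so the end of the segment joins its beginning without a jump.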

The Fast Fourier Transform (FFT)

Even using a computer, a DFT requires an enormous amount of computation and is not practical to use. The discovery of a mathematical trick finally made the DFT a usable process. It was discovered that if the number of samples in your STFT window were a power of 2, you could greatly reduce the number of calculations needed to perform the analysis. Hence, the Fast Fourier Transform (FFT) was developed. 
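
To get a feel for the savings, a direct DFT of N samples needs on the order of N × N multiplications, while a power-of-2 FFT needs on the order of N × log2(N). The sketch below compares these rough estimates; they are order-of-magnitude figures, not exact operation counts for any particular implementation, and the function names are hypothetical.

```python
import math


def dft_operations(n):
    """Rough multiplication count for a direct DFT of n samples."""
    return n * n


def fft_operations(n):
    """Rough multiplication count for a radix-2 FFT of n samples."""
    return n * math.log2(n)


for size in (512, 1024, 4096):
    ratio = dft_operations(size) / fft_operations(size)
    print(size, dft_operations(size), int(fft_operations(size)), round(ratio))
```

The advantage grows with the window size, which is why large FFT sizes remain practical.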

In the FFT, the size of the window in samples is the FFT size. The FFT size is equal to the number of analysis frequency bands evenly spaced between 0Hz and the sampling rate. You can calculate the frequency band spacing by taking the SR and dividing it by the FFT size. For example, with a SR of 44,100 Hz, an FFT size of 512 gives you a frequency band spacing of (44,100 / 512) = 86 Hz (approximately). If you used 1024 samples in your FFT, the frequency spacing would be about 43 Hz. 

Given that we perceive pitch in an exponential relationship to frequency, the linear nature of the FFT presents a problem. Generally, this problem is compensated for by using a larger FFT size, which reduces the band spacing.  Using 2048 samples yields a band spacing of about 21.5 Hz; 8192 provides a roughly 5 Hz spacing between analysis bands. 
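
The band spacing figures above come from a single division, sketched here in Python with the 44,100 Hz sampling rate used in the examples. The function name is hypothetical.

```python
def band_spacing_hz(sample_rate, fft_size):
    """Distance in Hz between adjacent analysis bands."""
    return sample_rate / fft_size


SAMPLE_RATE = 44100
spacings = {size: band_spacing_hz(SAMPLE_RATE, size) for size in (512, 1024, 2048, 8192)}

for size, spacing in spacings.items():
    print(size, round(spacing, 1))
```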

The Uncertainty Principle

While it would appear to be preferable to use as large an FFT size as possible for better frequency resolution, such an assumption is not always correct. With the FFT there is a tradeoff between time resolution and frequency resolution, similar to Heisenberg’s Uncertainty Principle. Heisenberg found that the more you looked for the velocity of an object, the less you knew about its position, and vice versa. For the FFT, the more you look for frequency, the less you know about time. This uncertainty arises because the Fourier Transform cannot distinguish between a frequency that appears at the beginning of a transform window, and one that appears halfway into the transform window (or any other time within the window). Any frequency appearing at any time within the window is analyzed as being present for the entire window. Since you add samples to the window to increase frequency resolution, you are also adding a greater period of time that is being analyzed, and consequently lowering the time resolution of the analysis. 

For example, a 512-sample FFT window lasts approximately 12 ms (size/SR). Within that 12 ms window we lack time knowledge of events. If you double the window to 1024 samples, you double the time segment to approximately 24 ms. Each doubling of the window doubles the length of time for analysis, and halves our time resolution. At 4096 samples, our time resolution is reduced to approximately 93 ms, which is quite noticeable. 
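
The trade-off can be tabulated with a short Python sketch, assuming a 44,100 Hz sampling rate. The function names are hypothetical. Each doubling of the FFT size halves the band spacing and doubles the window duration, so their product stays constant.

```python
SAMPLE_RATE = 44100  # assumed sampling rate in Hz


def window_duration_s(fft_size, sample_rate=SAMPLE_RATE):
    """Length of the analysis window in seconds (size / SR)."""
    return fft_size / sample_rate


def band_spacing_hz(fft_size, sample_rate=SAMPLE_RATE):
    """Distance in Hz between adjacent analysis bands (SR / size)."""
    return sample_rate / fft_size


for size in (512, 1024, 2048, 4096):
    duration = window_duration_s(size)
    spacing = band_spacing_hz(size)
    # Columns: FFT size, window duration in ms, band spacing in Hz, product
    print(size, round(1000 * duration, 1), round(spacing, 1), round(duration * spacing, 6))
```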

To work around this uncertainty problem you typically use overlapping analysis windows. However, overlapping windows can add an echo-type effect to the resynthesis, and will thicken the sound.

FFT Parameters

  • FFT Size: The size of the analysis window, in samples. For the FFT, the size must be a power of 2. The size of the FFT will equal the number of frequency analysis bands, evenly spaced from 0 Hz to the Sampling Rate at multiples of SR/FFTsize. Half of the bands (up to the Nyquist Frequency) are usable.
  • Window Type: The short-time amplitude envelope applied to the segment of audio being analyzed by the FFT. In general, bell-shaped envelopes are best for analysis.
  • Bin: For one analysis segment, each frequency band being analyzed and its corresponding amplitude are represented together as a pair of numbers. This pair of numbers is a bin. Since the FFT size equals the number of analysis frequency bands, the FFT size will also equal the number of bins.
  • Frame: The collection of bins for one analysis segment. If the FFT size is 512, then there are 512 frequency bands being analyzed, and consequently, 512 bins in the frame. The frame corresponds to the audio segment being analyzed at any given point in time. For purposes of understanding time manipulation via phase vocoding, you can also think of the frame as the frequency snapshot of an analysis window. 
  • Overlaps: the number of overlapping analysis windows applied to the input signal. More overlaps can provide greater time detail.
  • Hop Size: the distance between the starts of successive overlapping analysis windows. This hop size, or skip, is usually determined by spacing overlapping windows evenly, at a distance of the FFT size divided by the number of overlaps.
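
The relationships among these parameters can be summarized in a short Python sketch. The function name, the dictionary keys, and the default sampling rate are assumptions made for illustration; the usable band count follows the half-of-the-bands rule given in the list above.

```python
def fft_parameters(fft_size, overlaps, sample_rate=44100):
    """Derive related analysis values from an FFT size and overlap count."""
    if fft_size < 1 or fft_size & (fft_size - 1):
        raise ValueError("FFT size must be a power of 2")
    return {
        "bins_per_frame": fft_size,                  # one bin per analysis band
        "usable_bands": fft_size // 2,               # bands below the Nyquist frequency
        "band_spacing_hz": sample_rate / fft_size,   # SR / FFT size
        "hop_size": fft_size // overlaps,            # FFT size / number of overlaps
    }


params = fft_parameters(512, 4)
print(params["bins_per_frame"], params["usable_bands"], params["hop_size"])  # 512 256 128
```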

FFT Problems (applies to all versions of the FT)

  • The FFT assumes the input is periodic, which implies infinity. Infinitely periodic signals don’t change pitch.
  • The spacing of frequency analysis bands is linear, while our perception of pitch is exponential.
  • The Uncertainty Principle applies to measurements of frequency and time. Larger FFT sizes give better frequency resolution, but worsen the time resolution, and vice versa. The FFT cannot distinguish start times of frequency components within a window.
computerMusic2 lectureNotes_cm2

(compMus2) Spectral Processing Intro

Audio Domains

Up until this point, we’ve been talking about audio processing and synthesis in the time domain. Spectral processing takes place in the frequency domain. In the time domain, we represent sound as changing amplitude (y value) over time (x value). In the frequency domain, sound is represented as changing amplitude (y value) over frequency (x value). Two things are worth pointing out at this point. One, the property of the x axis is your domain; and two, neither domain represents both time and frequency. If you’re representing time you know nothing about frequency. Likewise, if you’re representing frequency you don’t have any time information. 

Converting Domains

To convert from a time domain representation of sound to a frequency domain representation, you use a process called the Fourier Transform. To reverse the process and convert from the frequency domain to the time domain you use an Inverse Fourier Transform.
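
A round trip between the two domains can be sketched with a naive transform pair in plain Python. This is an unoptimized illustration with hypothetical function names and sample values, not how audio software actually performs the conversion.

```python
import cmath


def fourier_transform(samples):
    """Time domain to frequency domain (naive discrete transform)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_fourier_transform(spectrum):
    """Frequency domain back to time domain."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# Hypothetical time-domain samples.
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]

spectrum = fourier_transform(original)
recovered = inverse_fourier_transform(spectrum)

largest_error = max(abs(a - b) for a, b in zip(original, recovered))
print(largest_error < 1e-9)
```

The recovered samples match the originals to within floating-point rounding error.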

lectureNotes_musth1 musicTheory1

(musTh1) Phrases, Periods, and more

The book chapter (Ch. 12: Phrase Structure and Grouping) is relatively clear, so I’m not going to rehash everything from it, or class, here. I’ll just post some useful things to remember.

Phrase lengths are typically multiples of two measures, with four measures being the most common. Since pickups balance out at beginning and end to make a full measure, you don’t count the pickups as a measure.

Phrases usually have a strong cadence to mark their end. If you don’t find a strong cadence, it probably isn’t a phrase ending. Remember that you don’t end a half-cadence on V7 — only on V (the triad).

Tempo and strength of cadences help to determine phrase endings. In very slow tempos it isn’t unusual to have two measure phrases. In faster tempos, eight measure phrases often appear.

Two phrases can group together to form a period. Periods have an antecedent and consequent relationship, usually through open and closed harmonic cadences. Additionally, we can specify if a period is parallel or non-parallel based on the thematic content of the two phrases. Double periods can occur when two periods form an antecedent/consequent relationship. Usually the first period will end on a half-cadence, and the second period will end on an authentic cadence. 

Phrases that don’t form period relationships can be referred to as phrase groups.

Miniature formal design refers to the grouping of phrases thematically. We’ve covered binary, song form, and ternary.

Phrases tend to repeat at the same length. We refer to this property as phrase periodicity. 

Extension of phrase length happens either through cadential extension with rhythmic emphasis on the final cadential harmony (most often), or through internal extension (something added or repeated not at the cadence).

Contraction of phrase length is also possible. This most often occurs from simply dropping measures (all or part) from a thematic repetition of a phrase. Elisions are different. Phrases that elide share the same ending measure (first phrase) and beginning measure (second phrase). 

Phrases can subdivide internally. Any subdivision of a phrase is a sub-phrase.

assignments_musth1 musicTheory1

(musTh1) Phrase Assignment

Due Friday, Nov. 14th.

In the workbook, pp. 86 – 88, complete #1: D, E, F, G, and H, but according to the following instructions (not those in the workbook).

Mark phrases, periods, harmonies at cadences, and scale degrees at melodic cadences for each example. Do these diagrammatic markings in the workbook. 

Do not answer the workbook questions for each example. Answer the following questions. Put your answers on a separate sheet and turn in with the workbook pages.

  1. 1D: This excerpt comprises two phrases. Do the two phrases form a period relationship? If so, indicate whether it is parallel or non-parallel.
  2. 1E: Same as 1D. Indicate if period, and what type.
  3. 1F: This excerpt contains more than two phrases. Indicate any period relationships. Are there any relationships beyond that of the period? If so, indicate by name.
  4. 1G: This excerpt is a single phrase. In what two ways does this phrase vary beyond the typical four-measure length?
  5. 1H: There are two, four-measure phrases in this example. How does each phrase divide in measures (within itself)? What is the name for such divisions?
computerMusic2 lectureNotes_cm2

(compMus2) Granular Synthesis Review

Overview

  • Any sound can be thought of as containing discrete particles/time segments (grains)
  • Duration of an individual grain is short – usually 1 ms to 100 ms.
  • Within an individual grain, sound parameters are fixed. Change occurs as you progress from grain to grain.

Parameters of Individual Grains

  • Playback speed
  • Index location (location in soundfile used to create grain)
  • (maximum) amplitude
  • Grain envelope
  • Duration
  • Panning

Parameters of Grain Combinations (Macro Controls)

  • Frequency of grains (grains per second)
  • Fixed or random rate of grain production
  • Density of grains (the number of grains happening at one time)
  • Number of grain streams (can be related to density)

Windows (Grain Envelopes)

  • A window is a short-time amplitude envelope.
  • The window shape can be chosen to emphasize legato connections between grains, discontinuity between grains, or anywhere in between.

Overlaps and Streams

  • A stream is the individual series of grains occurring one after another.
  • Multiple streams involve overlapping envelopes.
  • Overlapping envelopes generally produce a smoother amplitude output.
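
The smoothing effect of overlapping envelopes can be demonstrated with a short Python sketch. It assumes Hann-shaped grain envelopes and made-up grain sizes. One stream of back-to-back grains is compared with grains overlapped by half their length, which is equivalent to two interleaved streams.

```python
import math


def hann(size):
    """Periodic Hann envelope (starts at zero, peaks at 1.0 in the middle)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]


def summed_envelope(grain_size, hop, num_grains):
    """Add up the envelopes of grains that start every `hop` samples."""
    total = [0.0] * (hop * (num_grains - 1) + grain_size)
    envelope = hann(grain_size)
    for grain in range(num_grains):
        start = grain * hop
        for n, value in enumerate(envelope):
            total[start + n] += value
    return total


def steady_state(total, grain_size):
    """Drop the fade-in and fade-out at the very start and end."""
    return total[grain_size:-grain_size]


GRAIN = 64  # hypothetical grain length in samples
one_stream = steady_state(summed_envelope(GRAIN, GRAIN, 4), GRAIN)
overlapped = steady_state(summed_envelope(GRAIN, GRAIN // 2, 8), GRAIN)

# Lowest and highest summed envelope level in each case.
print(round(min(one_stream), 3), round(max(one_stream), 3))
print(round(min(overlapped), 3), round(max(overlapped), 3))
```

With one stream, the summed envelope swings between silence and full level on every grain. With half-overlapped grains, the dips of one envelope are filled by the peaks of the next, and the level stays constant.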

High-Level (Macro) Organization

  • The number of parameters to control, and the number of grains per second, require some type of macro control.
  • Pitch-Synchronous organization analyzes the sound file ahead of time to set parameters so that a specified pitch will result. The parameter settings of individual grain parameters are linked. Kontakt tone machine uses pitch-synchronous organization.
  • Asynchronous organization means that all grain parameters are specified independently of each other. Control functions are usually specified to change parameters over time. 
  • Quasi-synchronous organization indicates that some, but not all, parameters are linked. It is the most common organization offered in the programs we use (Cecilia and Kontakt time machine). Most often, grain duration determines the frequency of grains, as grains are created in succession. This organization leads to a type of AM synthesis.