
Tufts University COMP 175 Computer Graphics Final Project

Music Visualization Application
Author: Gerhard Stoeckel

Abstract
This project aims to unite human auditory and visual sensory inputs by exploring various graphical representations of sound and animating the generated imagery in response to audio input.

Introduction
Hearing and vision are intimately coupled in human perception. The effect that music has on an individual can be enhanced dramatically by visual stimulus, especially when the auditory and visual stimuli are coordinated in a meaningful way. The goal of this project is to generate animated imagery that abstractly represents, and responds dynamically to, changes in sound.

Commercial Applications
There are many commercial applications for music visualization; two examples are Windows Media Player and MilkDrop. The images produced are typically abstract and involve fractals or an oscilloscope-like effect, though they are more often composites of several techniques. Most commercial applications produce animated imagery, but few display a noticeable correspondence to the sounds being played. Synchronization is generally sacrificed for the complexity and fluidity of the images.

Sound Analysis
Various components can be used to characterize sound. In this application, a Fast Fourier Transform (FFT) is used to obtain the spectral distribution of audio samples. The magnitude of the spectral breakdown also yields information about the loudness, or amplitude, of the sound.

Sample data are analyzed, and images are produced and animated, in real time. Because processor speed is limited, compromises had to be made between the amount of sample data being analyzed and the frame rate of the animated images. Inherent in a Fourier transform is the relationship between the amount of sample data and the fidelity of the spectral breakdown. A larger volume of sample data provides finer spectral resolution, but requires more time to process. Conversely, a smaller data set can be analyzed faster, but with a reduction in spectral resolution.
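The trade-off between data set size and spectral resolution can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the application: it assumes a block of N samples at sample rate fs, for which the FFT bins are spaced fs/N apart and the block spans N/fs seconds of audio. The function names, and the 44.1 kHz sample rate and block sizes in the demo, are assumptions chosen for illustration. A naive O(N^2) transform stands in for a real FFT to keep the example self-contained.

```python
import cmath
import math


def bin_width_hz(sample_rate_hz, block_size):
    """Frequency spacing between adjacent bins of a block_size-point transform."""
    return sample_rate_hz / block_size


def block_duration_s(sample_rate_hz, block_size):
    """Seconds of audio covered by one block of block_size samples."""
    return block_size / sample_rate_hz


def dft_magnitudes(samples):
    """Magnitudes of the first N/2 bins of a naive discrete Fourier transform.

    This is O(N^2) and only for illustration; an FFT computes the same
    values far more efficiently.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


if __name__ == "__main__":
    # Assumed values: CD-quality sample rate and two candidate block sizes.
    for block_size in (1024, 4096):
        print(
            block_size,
            round(bin_width_hz(44100, block_size), 2),
            round(block_duration_s(44100, block_size) * 1000, 1),
        )
```

Under these assumptions, a 1024-sample block at 44.1 kHz gives bins about 43 Hz wide and spans about 23 ms of audio, while a 4096-sample block narrows the bins to about 11 Hz but spans about 93 ms.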
Another concern is the desired refresh rate of the animated imagery. For animations to appear fluid, the image must be redrawn frequently and the motion of objects between refreshed frames must be relatively small. Since this software analyzes the data as it is played, the sample rate of the sound imposes a limit on the refresh rate for a given data sample size. Thus, trade-offs must be made between graphical refresh rate and data set size for the FFT analysis. As stated previously, the data set size is related to the spectral resolution, so, transitively, a relationship exists between refresh rate and spectral resolution.

Finally, each point in the grid drawn in the graphical animation corresponds to the magnitude of the FFT at a particular frequency for a particular data sample in time. As such, increased spectral resolution results in additional grid points. A larger number of grid points requires more time to process and draw. If the grid becomes too large, it will require more time to process than is available before the next sound data is ready for processing. On the other hand, a smaller number of grid points limits the spectral resolution or bandwidth that can be displayed. With an increased number of grid points, the spacing between adjacent grid points also becomes an issue. A tighter mesh may produce smoother surfaces, but the image becomes cluttered and chaotic when rendering individual shapes, as are used in some of the visualizations (discussed next).

Features
My sound visualization application animates a time history of the spectral distribution of audio input from a .wav format file or microphone/line input. Input mode can be changed by clicking the input menu items. Each grid point exists in a three-dimensional space representing frequency, time and spectral density. The points can be solidly colored, or a color contour can be derived based on the magnitude of the spectral density.
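A magnitude-based color contour of this kind could be computed along the following lines. This is a minimal sketch, not the application's actual algorithm: it assumes the spectral density is normalized against a known maximum and mapped linearly onto hue, running from blue (shorter wavelength) for small values to red (longer wavelength) for large ones. The function name `contour_color` and the blue-to-red range are assumptions for illustration.

```python
import colorsys


def contour_color(magnitude, max_magnitude):
    """Map a spectral magnitude to an (r, g, b) tuple with components in [0, 1].

    Assumed mapping: magnitude 0 gives blue, max_magnitude gives red,
    with hue interpolated linearly in between (through cyan, green, yellow).
    """
    if max_magnitude <= 0:
        # No usable scale: fall back to the low end of the contour.
        return (0.0, 0.0, 1.0)
    # Clamp the normalized level so out-of-range input stays on the contour.
    level = min(max(magnitude / max_magnitude, 0.0), 1.0)
    # In HSV, hue 2/3 is blue and hue 0 is red.
    hue = (1.0 - level) * (2.0 / 3.0)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


if __name__ == "__main__":
    for value in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(value, contour_color(value, 1.0))
```

Clamping keeps an unexpectedly loud sample from wrapping around the hue wheel, at the cost of saturating at red. A different choice of endpoints or a nonlinear scale would be equally consistent with the description above.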
I designed the contour coloring algorithm to associate longer-wavelength colors with larger spectral density. Color mode can be varied using menu items and a color selection dialog box.

Visualizations can be created using a variety of different representations, shown to the right. Certain graphics allow for clearer visualization of particular types of music, though the choice often comes down to individual preference. I found the surface visualization particularly disappointing and attempted to replace it with a NURBS surface. However, due to the large number of grid points and the extreme non-uniformity of the surface, determining the necessary order and knot locations proved exceedingly difficult. Instead, I applied a smoothing function to the data to generate a more flowing surface.

The visualizations are drawn using a perspective view and can be rotated using the mouse while sample data is analyzed and displayed. Additionally, an auto-rotate feature is available to spin the animation about the vertical axis of the screen at varying rates.

To achieve concurrent music playback, sound analysis and user interaction, and to address the challenging processing demands discussed in the previous section, I designed the application to use multiple threads. This application lent itself nicely to parallelization, and hardware is trending in a direction that accommodates this. Multi-threading allows the application to animate the imagery smoothly while maintaining uninterrupted playback. Without it, the application is subject to graphical and musical stutter.

Conclusion
The synchronization of auditory and visual input can be done in a variety of ways. This project took one approach, which I believe worked particularly well with piano music: the relative clarity of individual notes and phrases produced visually stimulating graphics. However, the abstract representation of sound is an art. As such, it is subjective, and one size does not fit all.

Screenshot of MilkDrop 1.04d
Screenshot of Windows Media Player 10
Application using point visualization
Application using wireframe visualization
Application using surface visualization
Application using cylindrical visualization
Application using spike visualization
Application using ball visualization