Understanding Audio Spectrograms

Spectrograms are a fundamental tool in audio signal processing, offering a visual representation of the spectrum of frequencies in a sound signal as they vary with time. This powerful visual aid is used extensively in music analysis, speech processing, and various other fields of audio processing. Spectrograms can be generated in several forms, each providing unique insights into the structure and characteristics of an audio signal. In this post, we'll explore the different types of spectrograms and demonstrate how to generate them using the librosa library in Python.

Types of Spectrograms

Linear-Frequency Spectrogram: This is the most basic form of a spectrogram, displaying the frequency spectrum on a linear scale. It is particularly useful for analyzing harmonic content and other detailed aspects of an audio signal.

Log-Frequency Spectrogram: Unlike the linear-frequency spectrogram, this type plots frequencies on a logarithmic scale. This is more aligned with how humans perceive sound, making it useful for music and speech analysis, where distinguishing between lower frequencies is more important.

Mel-Spectrogram: The Mel scale is another perceptually motivated frequency scale. It is roughly linear at low frequencies and logarithmic at higher ones, so that equal steps on the scale correspond to approximately equal perceived steps in pitch. A Mel-spectrogram uses the Mel scale to represent frequency, offering a highly effective representation for speech and music recognition tasks.

Constant-Q Spectrogram: Like the log-frequency spectrogram, this type spaces frequencies logarithmically, but it is computed with a bank of filters whose center frequencies are geometrically spaced and whose ratio of center frequency to bandwidth (the quality factor, Q) is constant. It's particularly useful for music analysis because the bins can be aligned with the notes of the musical scale.
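
To make the difference between these frequency axes concrete, the following sketch computes where each scale places its bins using plain NumPy. The sample rate, FFT size, and starting note are illustrative assumptions, and the Mel formula shown is the common HTK-style variant (librosa's default uses a slightly different Slaney-style formula), so treat the numbers as indicative rather than as what any particular library produces.

```python
import numpy as np

# Illustrative assumptions, not values taken from any particular recording
sr = 22050        # sample rate in Hz
n_fft = 2048      # FFT size

# Linear-frequency axis: STFT bins are evenly spaced, sr / n_fft Hz apart
linear_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft

def hz_to_mel(f):
    """HTK-style Mel formula (one common variant of the Mel scale)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Mel axis: points evenly spaced in Mel, hence increasingly far apart in Hz
mel_freqs = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), 128))

# Constant-Q axis: 12 bins per octave, each a fixed ratio above the last
fmin = 32.70      # roughly the note C1, an assumed starting point
cq_freqs = fmin * 2.0 ** (np.arange(84) / 12)

print("linear spacing (Hz):", linear_freqs[1] - linear_freqs[0])
print("mel spacing, low vs high (Hz):",
      mel_freqs[1] - mel_freqs[0], mel_freqs[-1] - mel_freqs[-2])
print("constant-Q ratio between neighbours:", cq_freqs[1] / cq_freqs[0])
```

The linear axis spends most of its bins on high frequencies, while the Mel and constant-Q axes devote proportionally more resolution to the low end.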

Generating Spectrograms with Librosa

To illustrate how to generate these spectrograms, we'll first need to install librosa (for example, with pip install librosa) and then load an audio file for analysis. Below are code snippets for generating several of the spectrogram types described above:

Prerequisites

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load an audio file
audio_path = 'path/to/your/audio/file.wav'
y, sr = librosa.load(audio_path)
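
If you don't have an audio file to hand, a synthetic signal works just as well for trying out the snippets in this post. This is a minimal sketch using only NumPy; the sample rate, duration, and frequencies are arbitrary choices made for illustration.

```python
import numpy as np

sr = 22050                      # assumed sample rate in Hz (librosa's default)
duration = 3.0                  # length of the signal in seconds
t = np.arange(int(sr * duration)) / sr

# A steady 440 Hz tone plus a sweep rising linearly from 200 Hz to 4000 Hz
f0, f1 = 200.0, 4000.0
sweep_phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t ** 2)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(sweep_phase)
y = y.astype(np.float32)        # floating-point audio, as librosa expects
```

The resulting y and sr can be used in place of the values returned by librosa.load. The steady tone should appear as a horizontal line and the sweep as a rising one.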

Linear-Frequency Spectrogram

# Magnitude of the short-time Fourier transform, in decibels relative to the peak
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
plt.figure(figsize=(10, 4))
librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='linear')
plt.colorbar(format='%+2.0f dB')
plt.title('Linear-frequency power spectrogram')
plt.show()
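
The decibel conversion in this snippet is what makes quiet components visible in the plot. As a rough sketch of the idea (simplified, and not a copy of librosa's implementation), each amplitude is expressed as 20 times the base-10 logarithm of its ratio to a reference, here the maximum, and the result is then floored at a fixed range below the peak. The function name, the floor value, and the range below are assumptions chosen for illustration.

```python
import numpy as np

def amplitude_to_db_sketch(S, amin=1e-5, top_db=80.0):
    """Simplified decibel conversion relative to the largest amplitude."""
    S = np.abs(np.asarray(S, dtype=float))
    ref = max(S.max(), amin)
    # Clamp tiny values so the logarithm stays finite
    db = 20.0 * np.log10(np.maximum(S, amin) / ref)
    # Limit the dynamic range to top_db below the peak
    return np.maximum(db, db.max() - top_db)

example = amplitude_to_db_sketch([1.0, 0.1, 1e-6])
print(example)
```

With the maximum as the reference, the loudest point maps to 0 dB and everything else is negative, which is why the colorbar shows values at or below zero.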

Log-Frequency Spectrogram

# Same STFT in decibels; only the y-axis scale passed to specshow changes
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
plt.figure(figsize=(10, 4))
librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-frequency power spectrogram')
plt.show()

Mel-Spectrogram

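# Mel-scaled power spectrogram (recent librosa versions require y as a keyword argument)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
# melspectrogram returns power by default, so convert with power_to_db
S_DB = librosa.power_to_db(S, ref=np.max)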
plt.figure(figsize=(10, 4))
librosa.display.specshow(S_DB, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel-frequency spectrogram')
plt.show()

These snippets will generate spectrograms from an audio file, providing a visual representation of its frequency content over time. Experimenting with these spectrograms can offer deep insights into the characteristics of audio signals, aiding in various analysis and processing tasks.

Spectrograms are indispensable in the world of audio analysis, serving as a bridge between raw audio data and our understanding of its content. Whether it's for music analysis, speech recognition, or environmental sound classification, mastering the generation and interpretation of spectrograms is a valuable skill in digital signal processing.