SoundFrequencyMapperFFT Tutorial: From Waveform to Frequency Bins
This tutorial walks through building a SoundFrequencyMapperFFT — a straightforward pipeline that converts an audio waveform into frequency bins using the Fast Fourier Transform (FFT). It covers concepts, windowing, FFT implementation, mapping bins to frequencies, and a simple C++ example using the FFTW library. Assumptions: mono PCM audio (floating-point), sample rate 44.1 kHz, and buffer sizes that are powers of two.
1. Overview — pipeline steps
- Capture or load PCM waveform samples.
- Apply a window function to reduce spectral leakage.
- Compute FFT on the windowed samples.
- Convert complex FFT output to magnitude (frequency spectrum).
- Map FFT bins to frequency values and group into desired bins (linear or logarithmic).
- Post-process (smoothing, peak detection, normalization) and visualize or use for analysis.
2. Key concepts
- Sample rate (Fs): number of samples per second (Hz).
- Frame / window size (N): number of samples per FFT; determines frequency resolution Δf = Fs / N.
- Nyquist frequency: Fs/2, the highest representable frequency. FFT outputs N complex bins; positive frequencies are in bins 0..N/2.
- Windowing: common windows — Hann, Hamming, Blackman — reduce leakage at the cost of widening peaks.
- Magnitude spectrum: sqrt(re^2 + im^2) or use hypot(re, im); often converted to dB via 20log10(mag).
- Bin frequency: bin k corresponds to k(Fs/N). For real input, use k = 0..N/2.
3. Choosing N and hop size
- Frequency resolution Δf = Fs/N. For 1 Hz resolution at 44.1 kHz, N=44100 (large, high latency).
- Time resolution trades off with frequency resolution. Choose hop size (overlap) typically 50%–75% of N for smoother spectrograms.
4. Window functions
Use a Hann window for general-purpose analysis: w[n] = 0.5(1 – cos(2πn/(N-1))), n=0..N-1. Apply sample-wise: xw[n] = x[n] * w[n].
5. FFT implementation notes
- Use an efficient FFT library (FFTW, KissFFT, Intel MKL).
- For real-valued input, use real-to-complex transforms to save computation and memory.
- Normalize FFT output depending on library: some scale by N, others do not.
6. Mapping bins to frequency bins (linear and logarithmic)
- Linear mapping: each FFT bin is its own frequency. For grouping, sum or average magnitudes across target bin ranges.
- Logarithmic (octave or mel-like): map FFT bins to log-spaced bins — useful for audio perception and visualization. Compute target bin edges in Hz, then include FFT bins whose center frequencies fall within each edge range.
7. Post-processing
- Convert magnitudes to dB: magdb = 20*log10(mag + ε).
- Smoothing: exponential moving average per bin: s[t] = α*s[t-1] + (1-α)*current.
- Peak detection: find local maxima across neighboring bins and/or frames.
8. C++ example using FFTW
- Requirements: FFTW library installed. Link with -lfftw3 -lm.
- Example code (mono float input, N power of two, Hann window, real-to-complex FFT, linear mapping to N/2+1 bins):
cpp
#include#include #include #include constexpr int N = 4096; constexpr double Fs = 44100.0; std::vector<double> make_hann(int N){ std::vector<double> w(N); for(int n=0;n<N;n++) w[n]=0.5(1.0 - cos(2.0M_PIn/(N-1))); return w; } int main(){ // Example: fill ‘in’ with N samples from audio source std::vector<double> in(N); // TODO: replace with real audio read for(int i=0;i<N;i++) in[i]=0.0; auto window = make_hann(N); std::vector<double> inw(N); for(int i=0;i<N;i++) inw[i]=in[i]window[i]; int outSize = N/2 + 1; std::vector<fftw_complex> out(outSize); fftw_plan plan = fftw_plan_dft_r2c_1d(N, inw.data(), out.data(), FFTW_ESTIMATE); fftw_execute(plan); std::vector<double> magn(outSize); for(int k=0;k<outSize;k++){ double re = out[k][0]; double im = out[k][1]; magn[k] = sqrt(rere + imim) / N; // normalize by N } // Map bin k to frequency f = k * Fs / N for(int k=0;k<outSize;k++){ double freq = k * Fs / N; printf(“Bin %d: f=%.2f Hz, mag=%.6f “, k, freq, magn[k]); } fftw_destroy_plan(plan); return 0; }
9. Practical tips
- Use overlap-add or circular buffers for continuous streaming.
- Apply pre-emphasis (high-pass) if focusing on high-frequency content.
- Choose window and N based on trade-offs (speech vs music vs instrumentation).
- For latency-sensitive apps, prefer smaller N and efficient libraries; consider hardware FFTs.
10. Further extensions
- Compute mel-spectrograms by mapping FFT magnitudes to mel bands.
- Use peak-picking and harmonic grouping for pitch detection.
- Visualize with a log-frequency spectrogram or real-time waterfall display.
This provides a compact, actionable path from raw waveform to frequency bins using FFT. Use the C++ example with an actual audio input source and adjust N, window, and mapping to fit your application.
Leave a Reply