Fun With Spectrograms/Spectrographs
A sound spectrogram is a plot of the frequencies in a sound over
time. The x axis is time, the y axis is frequency (going from low to
high), and the brightness of the pixel at (x, y) represents the loudness of
that frequency at that time.
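To make that definition concrete, here is a minimal Java sketch of computing a spectrogram with a short-time Fourier transform. It is my illustration, not the page's source code. The sound is sliced into overlapping frames, each frame is tapered with a Hann window, and the magnitude of its discrete Fourier transform is taken. Each frame becomes one column (x), each frequency bin becomes one row (y), and the magnitude is mapped to a brightness on a decibel scale. The class and method names, the frame size and hop parameters, and the 60 dB display range are all assumptions chosen for illustration. The DFT is the naive O(n²) version, which is clear but slow.

```java
/**
 * Minimal spectrogram sketch (hypothetical names): short-time Fourier transform
 * with a Hann window and a naive O(n^2) DFT per frame, for clarity rather than speed.
 */
public final class Spectrogram {

    /**
     * Returns magnitudes[frame][bin]. The frame index is the time (x) axis and the
     * bin index is the frequency (y) axis; bin k is k * sampleRate / frameSize Hz.
     */
    public static double[][] magnitudes(double[] samples, int frameSize, int hop) {
        if (frameSize < 2 || hop < 1) {
            throw new IllegalArgumentException("need frameSize >= 2 and hop >= 1");
        }
        if (samples.length < frameSize) {
            return new double[0][];
        }
        int frames = 1 + (samples.length - frameSize) / hop;
        int bins = frameSize / 2 + 1; // real input: only non-negative frequencies matter
        double[][] out = new double[frames][bins];
        for (int f = 0; f < frames; f++) {
            int start = f * hop;
            for (int k = 0; k < bins; k++) {
                double re = 0, im = 0;
                for (int n = 0; n < frameSize; n++) {
                    // Hann window tapers the frame edges to reduce spectral leakage
                    double w = 0.5 - 0.5 * Math.cos(2 * Math.PI * n / (frameSize - 1));
                    double v = samples[start + n] * w;
                    double angle = -2 * Math.PI * k * n / frameSize;
                    re += v * Math.cos(angle);
                    im += v * Math.sin(angle);
                }
                out[f][k] = Math.hypot(re, im);
            }
        }
        return out;
    }

    /** Maps a magnitude to a 0..255 brightness; maxMagnitude is 0 dB, -rangeDb and below is black. */
    public static int brightness(double magnitude, double maxMagnitude, double rangeDb) {
        if (magnitude <= 0 || maxMagnitude <= 0) {
            return 0;
        }
        double db = 20 * Math.log10(magnitude / maxMagnitude);
        double level = 1 + db / rangeDb; // [-rangeDb, 0] dB maps to [0, 1]
        return (int) Math.round(255 * Math.max(0, Math.min(1, level)));
    }

    public static void main(String[] args) {
        int sampleRate = 8000, frameSize = 256, hop = 128;
        double[] tone = new double[sampleRate]; // one second of a 1 kHz sine wave
        for (int i = 0; i < tone.length; i++) {
            tone[i] = Math.sin(2 * Math.PI * 1000 * i / sampleRate);
        }
        double[][] s = magnitudes(tone, frameSize, hop);
        int loudest = 0;
        for (int k = 1; k < s[0].length; k++) {
            if (s[0][k] > s[0][loudest]) {
                loudest = k;
            }
        }
        System.out.println("loudest bin " + loudest + " = " + (loudest * sampleRate / frameSize) + " Hz");
    }
}
```

With an 8 kHz sample rate and 256-sample frames, each bin is 8000 / 256 = 31.25 Hz wide, so a pure 1 kHz tone should light up bin 32 in every column. That shows up as a single bright horizontal line, which is the simplest thing a spectrogram can show.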
I claim that The Speech Recognition Problem is fundamentally The
Vision Problem, and both are really just The Squiggly Line Problem
(going from raw sensor data ("squiggly lines", in Paul Cohen's
terminology) to higher-level perception and conceptual knowledge). My
hypothesis is that with a few weeks' training, I'll be able to
translate spectrograms made without background noise, like the one below,
into the English phrases they represent. (I bet my advisor $10 that I can do this.)
In learning to read spectrograms, I hope to gain insight into how one
might solve The Squiggly Line Problem. For example, I've come to
realize that recognition is both bottom-up (discerning individual
phonemes) and top-down (looking at the overall structure of a word).
I suspect that a real-time spectrogram might be useful for deaf
or hearing-impaired people. (I know that if I were to suddenly lose my
hearing but not my sight, I'd want to have one of these handy.)
Here's a sample spectrogram of me saying "Chicky check,
microphone check, chicky check-a, sibilance sibilance."
You can also go the other way. Here is a sound,
and here is the spectrogram it produces:
Here is the Java source code. It gives
you a real-time spectrogram from your computer's microphone (assuming
it has one). For better performance, I also made a version that uses native code for the FFT.
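The FFT is the inner loop of a real-time spectrogram, which is why it is the part worth moving to native code. As a rough illustration, here is an in-place iterative radix-2 Cooley-Tukey FFT in plain Java, plus a helper that turns 16-bit signed little-endian mono PCM bytes into samples. This is my own sketch and not the page's source code. I assume the bytes come from a `javax.sound.sampled.TargetDataLine` opened with `new AudioFormat(sampleRate, 16, 1, true, false)` and filled by `line.read(buffer, 0, buffer.length)`. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch: in-place iterative radix-2 Cooley-Tukey FFT and a
 * PCM conversion helper for feeding microphone bytes into it.
 */
public final class Fft {

    /** Transforms (re, im) in place; both arrays must share one power-of-two length. */
    public static void transform(double[] re, double[] im) {
        int n = re.length;
        if (n != im.length || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Bit-reversal permutation, so the butterflies below can work in place.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Merge pairs of half-size transforms into transforms of size len.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k); // twiddle factor
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Converts 16-bit signed little-endian mono PCM bytes to samples in [-1, 1). */
    public static double[] pcm16le(byte[] buf, int length) {
        double[] out = new double[length / 2];
        for (int i = 0; i < out.length; i++) {
            int lo = buf[2 * i] & 0xff; // low byte, unsigned
            int hi = buf[2 * i + 1];    // high byte, sign-extended
            out[i] = ((hi << 8) | lo) / 32768.0;
        }
        return out;
    }
}
```

The O(n log n) cost is what makes real time practical. A 1024-point frame needs about 5,000 butterflies here, compared with about a million multiply-adds for a naive DFT. A native version speeds up the same arithmetic further. To get one spectrogram column, window a frame of the PCM samples, transform it, and take `Math.hypot(re[k], im[k])` for bins 0 through n/2.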