
An example

Let us look at the result obtained by applying the STFT to a speech signal. The signal contains the word 'GABOR', recorded on 338 points at a sampling frequency of 1 kHz, which satisfies the Shannon criterion (see fig. 3.3).
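     >> load gabor                            % load the speech signal from the MAT-file
     >> time=0:337; subplot(211); plot(time,gabor);   % at 1 kHz, the sample index is the time in ms
     >> dsp=fftshift(abs(fft(gabor)).^2);     % energy spectral density, zero frequency centered
     >> freq=(-169:168)/338*1000; subplot(212); plot(freq,dsp);   % frequency axis in Hz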
Figure 3.3: Speech signal corresponding to the word 'GABOR'. Time signal (first plot) and its energy spectral density (second plot)
From this representation alone, we cannot tell which part of the word is responsible for the peak around 140 Hz.
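To make this limitation concrete, here is a minimal sketch that locates the dominant spectral peak numerically and shows that moving the sound in time leaves the energy spectral density unchanged. It does not use the recorded word: the signal `x` is a synthetic stand-in (an assumption), a 140 Hz burst present only between 150 ms and 250 ms. The names `x`, `x2`, `esd`, `esd2` and `fpeak` are illustrative.

```matlab
% Synthetic stand-in for the speech signal (assumption: NOT the real 'gabor' data)
N  = 338;                        % number of samples, as in the text
fs = 1000;                       % sampling frequency in Hz
n  = (0:N-1)';                   % sample index, also the time in ms at 1 kHz
x  = zeros(N,1);
burst = (n >= 150) & (n < 250);  % tone present only between 150 ms and 250 ms
x(burst) = cos(2*pi*140*n(burst)/fs);

% Energy spectral density, centered with fftshift as in the example above
esd  = fftshift(abs(fft(x)).^2);
freq = (-N/2:N/2-1)'/N*fs;       % same axis as (-169:168)/338*1000

% Locate the strongest peak among the non-negative frequencies
pos  = freq >= 0;
fpos = freq(pos);
epos = esd(pos);
[~, k] = max(epos);
fpeak  = fpos(k);
fprintf('Peak near %.1f Hz\n', fpeak);

% Same burst moved 100 ms earlier: the energy spectral density is unchanged,
% so the spectrum alone cannot tell when the 140 Hz component occurred
x2   = circshift(x, -100);
esd2 = fftshift(abs(fft(x2)).^2);
fprintf('Largest difference between the two spectra: %g\n', max(abs(esd - esd2)));
```

The two spectra agree up to rounding error even though the bursts occupy different time intervals, which is why a joint time-frequency representation is needed.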

Now if we look at the squared modulus of the STFT of this signal, using a Hamming analysis window of 85 points, we can see some interesting features (the time-frequency matrix is loaded from the MAT-file because it takes a long time to compute; only the frequency band where the signal is present is displayed) (see fig. 3.4):

     >> contour(time,(0:127)/256*1000,tfr); grid;
     >> xlabel('Time [ms]'); ylabel('Frequency [Hz]'); 
     >> title('Squared modulus of the STFT of the word GABOR');
Figure 3.4: Speech signal analyzed in the time-frequency plane
The first pattern in the time-frequency plane, located between 30 ms and 60 ms and centered around 150 Hz, corresponds to the first syllable 'GA'. The second pattern, located between 150 ms and 250 ms, corresponds to the last syllable 'BOR'; its mean frequency decreases from 140 Hz to 110 Hz with time. Harmonics of these two fundamental components are also present at higher frequencies, but with a lower amplitude.
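The time-frequency matrix used above is loaded precomputed. As a rough sketch of how such a matrix might be obtained, and of how a decreasing frequency can be read from it, the code below computes the squared modulus of an STFT with an explicit loop and then tracks the frequency of maximum energy at each instant. Several things here are assumptions and not taken from the toolbox: the signal is a synthetic tone gliding from 140 Hz to 110 Hz between 150 ms and 250 ms (not the recorded word), the Hamming window uses the common 0.54/0.46 definition, the edges are zero-padded, and all variable names are illustrative.

```matlab
% Synthetic stand-in (assumption): tone gliding from 140 Hz to 110 Hz
% between 150 ms and 250 ms, silence elsewhere
N = 338; fs = 1000;              % record length and sampling frequency (Hz)
Nfft = 256;                      % FFT length, giving the axis (0:127)/256*1000
L = 85;                          % window length in points, as in the text
x = zeros(N,1);
idx   = (150:249)';              % times in ms (0-based sample indices)
finst = 140 - 30*(idx-150)/100;  % instantaneous frequency in Hz
phase = 2*pi*cumsum(finst)/fs;   % integrate the frequency to get the phase
x(idx+1) = cos(phase);

% Hamming window written out explicitly, so no toolbox is required
h    = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));
half = (L-1)/2;                  % 42 points on each side of the center

% Squared modulus of the STFT: one windowed FFT per time instant
xpad = [zeros(half,1); x; zeros(half,1)];   % zero-pad so every instant has a full window
tfr  = zeros(Nfft/2, N);
for t = 1:N
    seg = xpad(t:t+L-1) .* h;    % segment centered on sample t of x
    S   = fft(seg, Nfft);
    tfr(:,t) = abs(S(1:Nfft/2)).^2;   % keep frequencies from 0 up to fs/2 only
end
freq = (0:Nfft/2-1)'/Nfft*fs;    % frequency axis in Hz

% Ridge: frequency of maximum energy at each instant
[~, k] = max(tfr, [], 1);
ridge  = freq(k);
fprintf('Ridge at 180 ms: %.1f Hz, at 220 ms: %.1f Hz\n', ridge(181), ridge(221));
```

On this synthetic signal the ridge moves to lower frequencies as time advances, mirroring the behaviour described for the second pattern. The direct loop costs one FFT per sample, which gives a sense of why a precomputed matrix is convenient.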

Eric Chassande-Mottin 2005-10-26

© GdR ISIS