Comparing the Effectiveness/Robustness of Gammatone and LP Methods with the Direct Use of FFT

  • 1 Polytechnic University of Tirana, Faculty of Mathematical and Physical Engineering, Tirana, Albania


In this paper we evaluate the performance of Automatic Speech Recognition (ASR) systems with respect to the different forms of spectral analysis they use. We compare Linear Prediction (LP) and Gammatone filter banks used for spectral analysis against the direct use of FFT spectral values. The analysis is based on the effectiveness of existing ASR systems built on LP and Gammatone filter banks relative to systems that use FFT spectral values. We find that warping the FFT spectrum directly, instead of using filter bank averaging, provides a more precise approximation to the perceptual frequency scales. Direct use of FFT spectral values is just as effective as using either Gammatone or LP filter banks, as long as the features extracted from the FFT spectral values take into account a Gammatone-like or LP-like frequency scale. Processing speech signals using FFT or filter bank spectral features, combined with a method based on a sliding block of spectral features, is shown to be more effective in terms of ASR accuracy.
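As an illustration of the two ideas named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the Glasberg-Moore ERB-rate formula as the Gammatone-like frequency scale, NumPy for the FFT, and linear interpolation as the warping step. The function names (hz_to_erb_rate, warp_fft_spectrum, sliding_blocks) and the parameters n_points, block_len and step are hypothetical choices made for this example only.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """Map Hz to the ERB-rate scale (assumed here as a Gammatone-like scale)."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def warp_fft_spectrum(frame, sample_rate, n_points=32):
    """Sample the FFT magnitude spectrum at points equally spaced on the
    warped scale, instead of averaging FFT bins with a filter bank."""
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs_hz = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Equally spaced positions on the warped axis, mapped back to Hz.
    warped_axis = np.linspace(hz_to_erb_rate(0.0),
                              hz_to_erb_rate(sample_rate / 2.0),
                              n_points)
    target_hz = erb_rate_to_hz(warped_axis)
    # Linear interpolation between neighbouring FFT bins (an assumption).
    return target_hz, np.interp(target_hz, freqs_hz, magnitude)

def sliding_blocks(features, block_len, step):
    """Group per-frame feature vectors (n_frames, n_feats) into overlapping
    blocks of shape (n_blocks, block_len, n_feats).
    Requires n_frames >= block_len."""
    starts = range(0, features.shape[0] - block_len + 1, step)
    return np.stack([features[s:s + block_len] for s in starts])
```

Because the sample points are equally spaced on the warped axis, they are denser at low frequencies and sparser at high frequencies. Plain interpolation can miss narrow spectral peaks that fall between sample points, so a practical front end would likely smooth the spectrum first or use more points.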



