On frequency averaging for spectral analysis in speech recognition
Document type: Conference report
Publisher: Robert H. Mannel and Jordi Robert-Ribes
Rights access: Open Access
Many speech recognition systems represent the speech signal with logarithmic filter-bank energies or a linear transformation of them. Each of those energies is usually computed as a weighted average of the periodogram samples that lie in the corresponding frequency band. In this work, we attempt to gain insight into the statistical properties of the frequency-averaged periodogram (FAP), of which those energies are samples. We show that the FAP is statistically and asymptotically equivalent to a multiwindow estimator that arises from Thomson's optimization approach and uses orthogonal sinusoids as windows. The FAP and other multiwindow estimators are tested in a speech recognition application, and the influence of several design factors is observed. In particular, a technique that is as computationally simple as the FAP, and which is equivalent to using multiple cosine windows, appears to be an alternative worth considering.
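As a rough illustration of the frequency averaging described in the abstract, the following minimal sketch computes log band energies as weighted averages of periodogram samples. It is not taken from the paper: the triangular weights, the band edges, the toy sinusoidal frame, the small flooring constant, and all function names (`periodogram`, `triangular_weights`, `log_band_energies`) are assumptions chosen for the example.

```python
import cmath
import math


def periodogram(frame):
    """Return |DFT|^2 / N at bins 0..N//2, using a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def triangular_weights(lo, centre, hi):
    """Hypothetical triangular weights over bins lo..hi, normalised to sum to 1."""
    raw = {}
    for k in range(lo, hi + 1):
        if k <= centre:
            raw[k] = (k - lo + 1) / (centre - lo + 1)
        else:
            raw[k] = (hi - k + 1) / (hi - centre + 1)
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def log_band_energies(frame, bands):
    """Log of the weighted average of the periodogram samples in each band."""
    p = periodogram(frame)
    energies = []
    for lo, centre, hi in bands:
        weights = triangular_weights(lo, centre, hi)
        avg = sum(w * p[k] for k, w in weights.items())
        # Small floor (an assumption) so that log() is defined for empty bands.
        energies.append(math.log(avg + 1e-12))
    return energies


# Toy frame: a sinusoid centred on bin 8 of a 64-point frame.
N = 64
frame = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
bands = [(1, 4, 7), (5, 8, 11), (9, 12, 15)]  # assumed (lo, centre, hi) bins
energies = log_band_energies(frame, bands)
```

In this toy setting the second band, which is centred on the sinusoid's bin, receives by far the largest energy. A real front end would typically use mel-spaced bands and an FFT rather than the naive DFT shown here.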
Citation: Nadeu, C., Galindo, F., Padrell, J. On frequency averaging for spectral analysis in speech recognition. A: International Conference on Spoken Language Processing. "ICSLP 98: the 5th International Conference on Spoken Language Processing; incorporating the 7th Australian International Speech Science and Technology Conference; Sydney Convention Centre, Sydney, Australia, 30th November-4th December 1998". Sydney: Robert H. Mannel and Jordi Robert-Ribes, 1998, p. 1071-1074.
ISBN: 1 876346 17 5