Filtering the time sequences of spectral parameters for speech recognition
Document type: Article
Access conditions: Restricted access per publisher policy
In automatic speech recognition, the signal is usually represented by a set of time sequences of spectral parameters (TSSPs) that model the temporal evolution of the spectral envelope frame-to-frame. Those sequences are then filtered either to make them more robust to environmental conditions or to compute differential parameters (dynamic features) which enhance discrimination. In this paper, we apply frequency analysis to TSSPs in order to provide an interpretation framework for the various types of parameter filters used so far. Thus, the analysis of the average long-term spectrum of the successfully filtered sequences reveals a combined effect of equalization and band selection that provides insights into TSSP filtering. Also, we show in the paper that, when supplementary differential parameters are not used, the recognition rate can be improved even for clean speech, just by properly filtering the TSSPs. To support this claim, a number of experimental results are presented, both using whole-word and subword based models. The empirically optimum filters attenuate the low-pass band and emphasize a higher band so that the peak of the average long-term spectrum of the output of these filters lies at around the average syllable rate of the employed database (~3 Hz).
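As an illustration of what filtering a TSSP means in practice, the following minimal sketch applies a short FIR filter along the time axis of a frame-by-parameter matrix and evaluates the filter's magnitude response over modulation frequency. It is not the filter design from the paper: the kernel is the common regression ("delta") filter used for dynamic features, and the function names, the window half-length `K`, and the frame rate of 100 frames per second are all assumptions made for the example.

```python
import numpy as np


def delta_kernel(K=2):
    """Regression (delta) filter: y[t] = sum_k k * x[t+k] / sum_k k^2, k = -K..K.

    The taps are stored in reverse order so that np.convolve, which flips
    the kernel, yields the regression slope above.
    """
    k = np.arange(K, -K - 1, -1, dtype=float)
    return k / np.sum(k ** 2)


def filter_tssp(features, kernel):
    """Filter each parameter trajectory (column) along time.

    features: array of shape (num_frames, num_params).
    Frames within half a kernel length of either end are affected by
    the implicit zero padding of mode="same".
    """
    columns = [
        np.convolve(features[:, i], kernel, mode="same")
        for i in range(features.shape[1])
    ]
    return np.stack(columns, axis=1)


def modulation_response(kernel, frame_rate_hz=100.0, n_points=512):
    """Magnitude response of the filter versus modulation frequency in Hz.

    frame_rate_hz is an assumed value (10 ms frame shift).
    """
    freqs = np.linspace(0.0, frame_rate_hz / 2.0, n_points)
    taps = np.arange(len(kernel)) - (len(kernel) - 1) // 2
    phase = np.exp(-2j * np.pi * np.outer(freqs, taps) / frame_rate_hz)
    return freqs, np.abs(phase @ kernel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tssp = rng.standard_normal((200, 12))  # synthetic stand-in for real features
    filtered = filter_tssp(tssp, delta_kernel(K=2))
    freqs, mag = modulation_response(delta_kernel(K=2))
    print(filtered.shape, freqs[np.argmax(mag)])
```

Because the taps of this kernel sum to zero, the response at 0 Hz is zero, so any constant offset in a trajectory is removed while a band of higher modulation frequencies is passed. Changing `K` moves the passband, which is one simple way to explore the combined equalization and band-selection effect the abstract describes.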
Citation: Nadeu, C., Paches, P., Juang, B.-H. Filtering the time sequences of spectral parameters for speech recognition. "Speech Communication", September 1997, vol. 22, no. 4, pp. 315-332.
|Filtering the t ... for speech recognition.pdf||262.2Kb||Restricted access|