Filtering the time sequences of spectral parameters for speech recognition
Filtering the time sequences of spectral parameters for speech recognition.pdf (262.2 KB) (Restricted access)
DOI: 10.1016/S0167-6393(97)00030-7
Cite as: hdl:2117/97902
Document type: Article
Publication date: 1997-09
Access conditions: Restricted access per publisher policy
Except where otherwise noted, the contents of this work are licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
In automatic speech recognition, the signal is usually represented by a set of time sequences of spectral parameters (TSSPs) that model the temporal evolution of the spectral envelope frame-to-frame. Those sequences are then filtered either to make them more robust to environmental conditions or to compute differential parameters (dynamic features) which enhance discrimination. In this paper, we apply frequency analysis to TSSPs in order to provide an interpretation framework for the various types of parameter filters used so far. Thus, the analysis of the average long-term spectrum of the successfully filtered sequences reveals a combined effect of equalization and band selection that provides insights into TSSP filtering. Also, we show in the paper that, when supplementary differential parameters are not used, the recognition rate can be improved even for clean speech, just by properly filtering the TSSPs. To support this claim, a number of experimental results are presented, both using whole-word and subword-based models. The empirically optimum filters attenuate the low-pass band and emphasize a higher band so that the peak of the average long-term spectrum of the output of these filters lies at around the average syllable rate of the employed database (≈3 Hz).
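As an illustration of what filtering a TSSP means in practice, the sketch below applies a standard regression ("delta") filter along the time axis of a parameter matrix and inspects the filter's magnitude response over modulation frequency. It is not the paper's empirically optimum filter. The delta filter is used only as a familiar example of a differential-parameter filter, and the frame rate of 100 Hz, the half-length of 2 frames, and all function names are assumptions made for this example.

```python
import numpy as np


def delta_kernel(half_len: int) -> np.ndarray:
    """Convolution kernel of the regression (delta) filter.

    Implements d[t] = sum_k k * c[t + k] / (2 * sum_k k^2), k = 1..half_len
    taken symmetrically. The taps are listed in reversed order because
    np.convolve flips the kernel.
    """
    norm = 2.0 * sum(k * k for k in range(1, half_len + 1))
    return np.arange(half_len, -half_len - 1, -1) / norm


def filter_tssp(params: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Filter each parameter trajectory (one column per parameter) over time.

    params has shape (num_frames, num_params). Frames near the edges are
    affected by the implicit zero padding of mode="same".
    """
    return np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 0, params
    )


def magnitude_response(kernel: np.ndarray, frame_rate: float, n_fft: int = 1024):
    """Magnitude response of the filter versus modulation frequency in Hz."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / frame_rate)
    mag = np.abs(np.fft.rfft(kernel, n=n_fft))
    return freqs, mag


if __name__ == "__main__":
    frame_rate = 100.0  # assumed: one frame every 10 ms
    kernel = delta_kernel(2)  # assumed half-length of 2 frames

    # Toy TSSP matrix: 200 frames of 12 parameters (random, for shape only).
    rng = np.random.default_rng(0)
    tssp = rng.standard_normal((200, 12))
    filtered = filter_tssp(tssp, kernel)

    freqs, mag = magnitude_response(kernel, frame_rate)
    print("filtered shape:", filtered.shape)
    print("gain at 0 Hz:", mag[0])
    print("peak of filter response (Hz):", freqs[np.argmax(mag)])
```

The zero gain at 0 Hz shows the attenuation of slowly varying components that the abstract describes. The peak of this filter's response is a property of the filter alone, whereas the abstract refers to the peak of the average long-term spectrum of the filtered sequences, which also depends on the speech data.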
Citation: Nadeu, C., Paches, P., Biing-Hwang, J. Filtering the time sequences of spectral parameters for speech recognition. "Speech communication", September 1997, vol. 22, no. 4, p. 315-332.
ISSN: 0167-6393
Files | Description | Size | Format | View |
---|---|---|---|---|
Filtering the t ... for speech recognition.pdf | | 262.2 KB | PDF | Restricted access |