Use this identifier to cite or link to this item:
http://hdl.handle.net/2117/15678
Item not available in open access due to publisher policy
| File | Description | Size | Format |
| The demiphone An efficient contextual subword unit for continuous speech recognition.pdf | | 125.28 kB | Adobe PDF |
| Citation: | Mariño, J. [et al.]. The demiphone: An efficient contextual subword unit for continuous speech recognition. "Speech communication", September 2000, vol. 32, no. 3, p. 187-197. |
| Title: | The demiphone: An efficient contextual subword unit for continuous speech recognition |
| Author: | Mariño Acebal, José Bernardo; Nogueiras Rodríguez, Albino; Pachés-Leal, Pau; Bonafonte Cávez, Antonio |
| Date: | September 2000 |
| Document type: | Article |
| Abstract: | In this paper, we introduce the demiphone as a context-dependent phonetic unit for continuous speech recognition. A phoneme is divided into two parts: a left demiphone that accounts for the left coarticulation and a right demiphone that copes with the right-hand side context. This unit discards the dependence between the effects of both side contexts, but it models the transition between phonemes as the triphone does. By concatenating a left demiphone and a right demiphone a triphone can be built, although the left and the right-context coarticulations are modeled independently. The main appeal of this unit stems from its reduced number (with respect to the number of triphones) and its capability to model left and right contexts unseen together in the training material. Thus, the demiphone shares in a simple way the advantages of a smoothed parameter estimation with the ability of generalization. In the present work, the demiphone is motivated and experimentally supported. Furthermore, demiphones are compared with triphones smoothed and generalized by decision-tree state-tying, accepted as the most powerful tool for coarticulation modeling at the present state of the art. The main conclusion of our work is that the demiphone simplifies the recognition system and yields a better performance than the triphone, at least for small or moderate size databases. This result may be explained by the ability of the demiphone to provide an excellent trade-off between a detailed coarticulation modeling and a proper parameter estimation. |
| ISSN: | 0167-6393 |
| URI: | http://hdl.handle.net/2117/15678 |
| Publisher version: | 10.1016/S0167-6393(00)00010-8 |
| Publisher version: | http://www.sciencedirect.com/science/article/pii/S0167639300000108 |
| Appears in collections: | Altres. Enviament des de DRAC; Departament de Teoria del Senyal i Comunicacions. Articles de revista; VEU - Grup de Tractament de la Parla. Articles de revista |
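
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unit labels (`l-p` for a left demiphone, `p+r` for a right demiphone, `l-p+r` for a triphone), the `sil` boundary symbol, and the toy phoneme sequences are all assumptions made for illustration. The sketch shows how a triphone whose two contexts never occurred together in training can still be covered by demiphones that were each seen separately.

```python
from typing import Callable, List, Set

BOUNDARY = "sil"  # assumed utterance-boundary symbol (not taken from the paper)


def _pad(phones: List[str]) -> List[str]:
    """Surround a phoneme sequence with boundary symbols."""
    return [BOUNDARY, *phones, BOUNDARY]


def to_triphones(phones: List[str]) -> List[str]:
    """One unit per phoneme, conditioned jointly on both neighbours."""
    p = _pad(phones)
    return [f"{p[i - 1]}-{p[i]}+{p[i + 1]}" for i in range(1, len(p) - 1)]


def to_demiphones(phones: List[str]) -> List[str]:
    """Two units per phoneme: a left demiphone conditioned on the
    preceding phoneme and a right demiphone conditioned on the next one."""
    p = _pad(phones)
    units: List[str] = []
    for i in range(1, len(p) - 1):
        units.append(f"{p[i - 1]}-{p[i]}")  # left demiphone
        units.append(f"{p[i]}+{p[i + 1]}")  # right demiphone
    return units


def unseen_units(
    train: List[List[str]],
    test: List[str],
    expand: Callable[[List[str]], List[str]],
) -> Set[str]:
    """Units needed for `test` that never occur in the training sequences."""
    seen = {unit for seq in train for unit in expand(seq)}
    return {unit for unit in expand(test) if unit not in seen}


# Toy data (hypothetical): "b" is seen after "a" and before "e",
# but never with both contexts at once.
train = [["a", "b", "c"], ["d", "b", "e"]]
test = ["a", "b", "e"]

print(unseen_units(train, test, to_triphones))   # {'a-b+e'}
print(unseen_units(train, test, to_demiphones))  # set()
```

In this toy case the triphone `a-b+e` has no training examples, whereas the left demiphone `a-b` and the right demiphone `b+e` were both observed, so their concatenation covers the unseen context pair.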
The reproduction, transformation, distribution, and public communication of this work are prohibited. Reproduction for private use is permitted in any case, provided that the copy made is not put to collective or profit-making use (art. 31.2 of Royal Legislative Decree 1/1996 of 12 April, approving the Consolidated Text of the Intellectual Property Law, http://bibliotecnica.upc.es/sepi/legislacio.asp).
For any use other than those permitted, contact: sepi@upc.edu