The demiphone: An efficient contextual subword unit for continuous speech recognition

Mariño Acebal, José Bernardo; Nogueiras Rodríguez, Albino; Pachès Leal, Pau; Bonafonte Cávez, Antonio

doi:10.1016/S0167-6393(00)00010-8


Document type: Article

Publication date: 2000-09

Access conditions: Restricted access per publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

In this paper, we introduce the demiphone as a context-dependent phonetic unit for continuous speech recognition. A phoneme is divided into two parts: a left demiphone that accounts for the left coarticulation and a right demiphone that copes with the right-hand side context. This unit discards the dependence between the effects of both side contexts, but it models the transition between phonemes as the triphone does. By concatenating a left demiphone and a right demiphone, a triphone can be built, although the left-context and right-context coarticulations are modeled independently. The main appeal of this unit stems from its reduced number (relative to the number of triphones) and its capability to model left and right contexts unseen together in the training material. Thus, the demiphone combines, in a simple way, the advantages of smoothed parameter estimation with the ability to generalize. In the present work, the demiphone is motivated and experimentally supported. Furthermore, demiphones are compared with triphones smoothed and generalized by decision-tree state-tying, accepted as the most powerful tool for coarticulation modeling at the present state of the art. The main conclusion of our work is that the demiphone simplifies the recognition system and yields better performance than the triphone, at least for small or moderately sized databases. This result may be explained by the ability of the demiphone to provide an excellent trade-off between detailed coarticulation modeling and proper parameter estimation.
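The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `a-b` / `b+c` label notation, the `sil` boundary symbol, the function names, and the toy phone sequences are all assumptions made for illustration. The sketch only shows how each phone maps to a left and a right demiphone, and how a triphone unseen in training can still be covered by demiphones that were seen separately.

```python
from typing import List, Set, Tuple

BOUNDARY = "sil"  # hypothetical boundary/silence context, assumed for this sketch


def contexts(phones: List[str], i: int) -> Tuple[str, str]:
    """Return the (left, right) neighbours of phones[i], padding with BOUNDARY."""
    left = phones[i - 1] if i > 0 else BOUNDARY
    right = phones[i + 1] if i + 1 < len(phones) else BOUNDARY
    return left, right


def to_triphones(phones: List[str]) -> List[str]:
    """One unit per phone, conditioned jointly on both neighbours."""
    units = []
    for i, p in enumerate(phones):
        left, right = contexts(phones, i)
        units.append(f"{left}-{p}+{right}")
    return units


def to_demiphones(phones: List[str]) -> List[Tuple[str, str]]:
    """Two units per phone: a left demiphone and a right demiphone,
    each conditioned on one neighbour only."""
    units = []
    for i, p in enumerate(phones):
        left, right = contexts(phones, i)
        units.append((f"{left}-{p}", f"{p}+{right}"))
    return units


def unit_set(sequences: List[List[str]], use_demiphones: bool) -> Set[str]:
    """Collect the distinct units occurring in a list of phone sequences."""
    seen: Set[str] = set()
    for seq in sequences:
        if use_demiphones:
            for left_demi, right_demi in to_demiphones(seq):
                seen.update((left_demi, right_demi))
        else:
            seen.update(to_triphones(seq))
    return seen


# Toy data, invented for illustration only.
training = [["k", "a", "s", "a"], ["m", "a", "l"]]
test = [["k", "a", "l"]]

unseen_triphones = unit_set(test, False) - unit_set(training, False)
unseen_demiphones = unit_set(test, True) - unit_set(training, True)

print("unseen triphones:", sorted(unseen_triphones))
print("unseen demiphones:", sorted(unseen_demiphones))
```

In this toy example the test triphone `k-a+l` never occurs in training, yet its two halves `k-a` and `a+l` each occur in a different training sequence, so no test demiphone is unseen. The same independence explains the smaller inventory: for N phones, ignoring boundary contexts, there are at most N^3 distinct triphones but at most 2N^2 distinct demiphones.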

Citation: Mariño, J. [et al.]. The demiphone: An efficient contextual subword unit for continuous speech recognition. "Speech communication", September 2000, vol. 32, no. 3, p. 187-197.

URI: http://hdl.handle.net/2117/15678

DOI: 10.1016/S0167-6393(00)00010-8

ISSN: 0167-6393

Publisher version: http://www.sciencedirect.com/science/article/pii/S0167639300000108
