Modelling the effects of spontaneous speech in speech recognition

Shulz, Henrik; Rodríguez Fonollosa, José Adrián

dc.contributor.author	Shulz, Henrik
dc.contributor.author	Rodríguez Fonollosa, José Adrián
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2013-09-25T13:06:43Z
dc.date.available	2013-09-25T13:06:43Z
dc.date.created	2013
dc.date.issued	2013
dc.identifier.citation	Shulz, H.; Fonollosa, José A. R. Modelling the effects of spontaneous speech in speech recognition. In: Speech Processing Conference. "2013 Speech Processing Conference: conference proceedings: July 1-2, 2013: AFEKA, Tel-Aviv Academic College of Engineering". Tel-Aviv: 2013.
dc.identifier.uri	http://hdl.handle.net/2117/20204
dc.description.abstract	Intrinsic speaker variability in spontaneous speech remains a challenge for state-of-the-art automatic speech recognition (ASR). While planned speech exhibits moderate variability, the significant variability of spontaneous speech is caused by situation, context, intention, emotion and listeners. This conditioning of speech is observable in terms of speaking rate and in feature space. We analysed broadcast news (BN) and broadcast conversational (BC) speech in terms of phoneme rate (PR) and feature space reduction (FSR), and contrasted both with the planned speech data. Strong, statistically significant differences were revealed. We cluster the speech segments with respect to their degree of PR and FSR, forming a set of variability classes, and introduce the variability classes into the hidden Markov model (HMM) based acoustic model (AM). In recognition we follow two approaches: the first considers the variability class as a context variable; the second relies on prior estimation of the variability class after the first pass of a multi-pass recognition system. Besides explicitly modelling the intrinsic speech variability of the speaker, we furthermore segregate the general speaker-specific characteristics by means of speaker adaptive training (SAT) into feature space transforms using Constrained Maximum Likelihood Linear Regression (CMLLR), and apply the adaptive approach in third-pass recognition. By modelling both within-speaker and between-speaker variation in spontaneous speech, we address two fundamental sources of speech variability that determine the performance of ASR systems.
dc.language.iso	eng
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Automatic speech recognition
dc.title	Modelling the effects of spontaneous speech in speech recognition
dc.type	Conference report
dc.subject.lemac	Reconeixement automàtic de la parla
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://events.eventact.com/afeka/aclp2012/Modelling%20the%20Effects%20of%20Spontaneous%20Speech%20in%20Speech%20Recognition_Schulz%20et%20al.pdf
dc.rights.access	Open Access
local.identifier.drac	12674265
dc.description.version	Postprint (published version)
local.citation.author	Shulz, H.; Fonollosa, José A. R.
local.citation.contributor	Speech Processing Conference
local.citation.pubplace	Tel-Aviv
local.citation.publicationName	2013 Speech Processing Conference: conference proceedings: July 1-2, 2013: AFEKA, Tel-Aviv Academic College of Engineering

Files in this item

Name: Modelling the Effects of Spont ...
Size: 171.4 KB
Format: PDF
Description: article

This item appears in the following collections

Conference papers/presentations [437]
Conference papers/presentations [3,332]

UPCommons. UPC open knowledge portal