Robust feature extraction for multimodal speaker ID system – The experts’ room

Hernanz Nogueras, Sergi

dc.contributor	Narayanan, Shrikanth
dc.contributor.author	Hernanz Nogueras, Sergi
dc.date.accessioned	2010-01-18T10:33:02Z
dc.date.available	2010-01-18T10:33:02Z
dc.date.issued	2005-05
dc.identifier.uri	http://hdl.handle.net/2099.1/8362
dc.description	Final degree project carried out in collaboration with the University of Southern California
dc.description.abstract	This project reviews speaker recognition. The first simulations use current state-of-the-art algorithms; later ones introduce new approaches and many modifications. Multimodality is the main idea for achieving better results. The new multimodal data supplied to the speaker recognition system are articulatory features and combined video and voice source localization in the meeting room scenario. Some articulatory features have not been widely used for speech analysis, so suitable extraction methods have not yet been developed. Voice source and video spatial localization algorithms, on the other hand, are well known, and only the integration methods remain to be defined. A theoretical review and a study of integration precede the final selection of an algorithm. Machine learning techniques are applied to extract articulatory features, and they achieve surprisingly accurate classification. The usefulness of the extractor outputs for speaker recognition is less clear, but important conclusions are drawn about how the extraction process affects their later use and how other extraction methods could be approached. In this work, articulatory features prove to be less affected by noise than the baseline MFCC+GMM approach, although suitable extraction methods are still not available. Even with the baseline MLP-based extraction methods, classification using the articulatory features is possible, and their complementarity with the baseline methods is demonstrated. Adding articulatory features improves the whole system only slightly, but the improvement demonstrates their usability. The whole articulatory feature integration process could be revisited, with successful results expected in the future. An extended analysis of how noise corrupts the speech features leads to concrete conclusions about noise rejection and the effects of noise. Plotting system performance against different SNR conditions explains the behavior of some methods. In low SNR conditions, very simple changes to the algorithms improve overall performance and reveal that the baseline was not designed with noise in mind. Most of the methods explored in this work were finally applied to the meeting room scenario at USC. An encouraging but small performance increase was achieved, and so the aim of the work was considered fulfilled. The trade-off between the effort spent and the small improvement is to be reviewed in further work.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Speech processing systems
dc.title	Robust feature extraction for multimodal speaker ID system – The experts’ room
dc.type	Master thesis (pre-Bologna period)
dc.subject.lemac	Speech processing
dc.rights.access	Open Access
dc.audience.educationlevel	First/second cycle studies
dc.audience.mediator	Escola Tècnica Superior d'Enginyeria de Telecomunicació de Barcelona
dc.audience.degree	ENGINYERIA DE TELECOMUNICACIÓ (Pla 1992)

Files in this item

Name: MEMORIA-25-08-09-FINAL.pdf
Size: 1.356 MB
Format: PDF

This item appears in the following collections

Enginyeria de Telecomunicació (Pla 1992) [1.590]

UPCommons. The UPC open knowledge portal