Show simple item record

dc.contributorNarayanan, Shrikanth
dc.contributor.authorHernanz Nogueras, Sergi
dc.descriptionProjecte final de carrera fet en col.laboració amb l'University of Southern California
dc.description.abstractAll along the current project, the speaker recognition is being reviewed. First simulations in this work use the latest ‘state of the art’ algorithms, and later new approaches and lots of modifications are used. Multimodality is the main idea to achieve better results. The new multimodal data supplied to the speaker recognition system will be articulatory features and video+voice source localization in the meeting room scenario. Some articulatory features have not been widely used for speech analysis so the correct extraction methods are still not developed. On the other hand, voice source and video spatial localization algorithms are known and only the integration methods have to be defined. Theoretical review and a study about integration will follow before finally selecting an algorithm. Machine learning techniques are applied to extract articulatory features, which perform a surprisingly right classification. The usability of those feature extractor outputs for the speaker recognition issue is not that clear, but very important conclusions are set about how the extraction process can affect the posterior usage and how other extraction methods could be approached. During the work, articulatory features demonstrate to be less affected by noise than the baseline MFCC+GMM approach, but the correct extraction methods are still not available. Even using the baseline extraction methods based on MLP, a classification is possible using the articulatory features, and complementarities with baseline methods are demonstrated. The improvement of the whole system adding articulatory features is very small, but demonstrates their usability. The whole process of the articulatory feature integration can surely be reviewed expecting successful results in the future. Due to an extended analysis of how noise poisons the speech features, very concrete conclusions are set about noise rejection and affection. By plotting how the system works against different SNR conditions, behaviors of some methods are explained. In low SNR conditions, very simple changes in the algorithms improve the overall performance, and reveal the lack of noiseoriented design of the baseline. The most of the methods approached in the current work were finally applied to the meeting room scenario at USC. An encouraging but small performance increase was achieved, and so the aim of the current work was considered realized. The trade-off between the spent effort and the small improvement is to be reviewed with further approaches and work.
dc.publisherUniversitat Politècnica de Catalunya
dc.rightsAttribution-NonCommercial-NoDerivs 3.0 Spain
dc.subjectÀrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcshSpeech processing systems
dc.titleRobust feature extraction for multimodal speaker ID system – The experts’ room
dc.typeMaster thesis (pre-Bologna period)
dc.subject.lemacProcessament de la parla
dc.rights.accessOpen Access
dc.audience.educationlevelEstudis de primer/segon cicle
dc.audience.mediatorEscola Tècnica Superior d'Enginyeria de Telecomunicació de Barcelona
dc.audience.degreeENGINYERIA DE TELECOMUNICACIÓ (Pla 1992)

Files in this item


This item appears in the following Collection(s)

Show simple item record

Attribution-NonCommercial-NoDerivs 3.0 Spain
Except where otherwise noted, content on this work is licensed under a Creative Commons license : Attribution-NonCommercial-NoDerivs 3.0 Spain