Deep learning backend for single and multisession i-vector speaker recognition

dc.contributor.author: Ghahabi Esfahani, Omid
dc.contributor.author: Hernando Pericás, Francisco Javier
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2017-05-10T14:28:54Z
dc.date.available: 2017-05-10T14:28:54Z
dc.date.issued: 2017-04-01
dc.description.abstract: The lack of labeled background data creates a large performance gap between the cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring baselines for i-vectors in speaker recognition. Although unsupervised clustering techniques can be used to estimate the labels, they cannot predict the true labels accurately, and they also assume that the background data contain several samples per speaker, which may not hold in practice. In this paper, the authors use Deep Learning (DL) to close this performance gap when only unlabeled background data are available. To this end, they propose an impostor selection algorithm and a universal model adaptation process in a hybrid system based on deep belief networks and deep neural networks that discriminatively models each target speaker. To gain more insight into the behavior of DL techniques, experiments are carried out in both single- and multisession speaker enrollment scenarios. Experiments on the National Institute of Standards and Technology (NIST) 2014 i-vector challenge show that the proposed DL-based system fills 46% of this performance gap in terms of the minimum of the decision cost function (minDCF). Furthermore, the score combination of the proposed DL-based system and PLDA with estimated labels covers 79% of the gap.
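The abstract measures the proposed system against two baselines, cosine scoring and PLDA, and reports how much of the minDCF gap between them is filled. As an illustration only, the sketch below shows cosine scoring of i-vectors for single- and multisession enrollment, plus one plausible reading of the "percentage of the gap filled" figure. Several assumptions are not taken from this record: the multisession speaker model is the average of the length-normalized enrollment i-vectors, and the gap is taken as the minDCF difference between the cosine baseline and PLDA trained with labeled background data. The function names `cosine_score` and `gap_filled` are hypothetical. This is not the authors' DBN/DNN backend.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def length_normalize(v: Vector) -> List[float]:
    """Scale an i-vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_score(enrollment: Sequence[Vector], test: Vector) -> float:
    """Cosine score between a speaker model and a test i-vector.

    Assumption (hypothetical convention): the speaker model is the mean of
    the length-normalized enrollment i-vectors, so a single vector gives
    single-session enrollment and several vectors give multisession.
    """
    if not enrollment:
        raise ValueError("need at least one enrollment i-vector")
    dim = len(test)
    for v in enrollment:
        if len(v) != dim:
            raise ValueError("enrollment and test i-vectors must have the same dimension")
    normed = [length_normalize(v) for v in enrollment]
    # Average the normalized enrollment vectors component-wise.
    model = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    m = length_normalize(model)
    t = length_normalize(test)
    return sum(a * b for a, b in zip(m, t))


def gap_filled(min_dcf_cosine: float, min_dcf_plda: float, min_dcf_system: float) -> float:
    """Fraction of the cosine-to-PLDA minDCF gap closed by a system.

    Assumption: 0 means no better than cosine, 1 means as good as PLDA.
    """
    gap = min_dcf_cosine - min_dcf_plda
    if gap <= 0:
        raise ValueError("expected PLDA to have a lower minDCF than cosine")
    return (min_dcf_cosine - min_dcf_system) / gap
```

For example, `gap_filled(c, p, s)` returns 0 when a system's minDCF `s` equals the cosine baseline `c` and 1 when it equals the PLDA value `p`. Because cosine scoring is scale invariant, the division by the number of enrollment vectors does not change the score; it is kept only so that `model` is a true average.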
dc.description.peerreviewed: Peer Reviewed
dc.description.version: Postprint (published version)
dc.format.extent: 11 p.
dc.identifier.citation: Ghahabi, O., Hernando, J. Deep learning backend for single and multisession i-vector speaker recognition. "IEEE-ACM Transactions on Audio Speech and Language Processing", 1 April 2017, vol. 25, no. 4, pp. 807-817.
dc.identifier.doi: 10.1109/TASLP.2017.2661705
dc.identifier.issn: 2329-9290
dc.identifier.uri: https://hdl.handle.net/2117/104282
dc.language.iso: eng
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/7847321/?reload=true
dc.rights.access: Open Access
dc.subject: UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh: Automatic speech recognition
dc.subject.lemac: Automatic speech recognition
dc.subject.other: Deep learning
dc.subject.other: Deep neural network
dc.subject.other: Deep belief network
dc.subject.other: I-vector
dc.subject.other: Speaker recognition
dc.title: Deep learning backend for single and multisession i-vector speaker recognition
dc.type: Article
dspace.entity.type: Publication
local.citation.author: Ghahabi, O.; Hernando, J.
local.citation.endingPage: 817
local.citation.number: 4
local.citation.publicationName: IEEE-ACM Transactions on Audio Speech and Language Processing
local.citation.startingPage: 807
local.citation.volume: 25
local.identifier.drac: 20329220

Files

Original bundle

Name: 07847321.pdf
Size: 1.12 MB
Format: Adobe Portable Document Format
Description: Version published by the publisher. Open access at IEEE.