  • Auto-encoding nearest neighbor i-vectors for speaker verification 

    Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
    Conference lecture
    Open Access
    In the last years, i-vectors followed by cosine or PLDA scoring techniques were the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap ...
  • I-vector transformation using k-nearest neighbors for speaker verification 

    Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2020)
    Conference report
    Restricted access - publisher's policy
    Probabilistic Linear Discriminant Analysis (PLDA) is the most efficient backend for i-vectors. However, it requires labeled background data which can be difficult to access in practice. Unlike PLDA, cosine scoring avoids ...
  • LSTM neural network-based speaker segmentation using acoustic and language modelling 

    India Massana, Miquel Àngel; Rodríguez Fonollosa, José Adrián; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2017)
    Conference lecture
    Open Access
    This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different ...
  • Self multi-head attention for speaker recognition 

    India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
    Conference lecture
    Open Access
    Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments and those are ...
  • Towards large scale multimedia indexing: a case study on person discovery in broadcast news 

    Le, Nam; Bredin, Herve; Sergent, Gabriel; India Massana, Miquel Àngel; López-Otero, Paula; Barras, Claude; Guinaudeau, Camille; Gravier, Guillaume; Barbosa da Fonseca, Gabriel; Lyon Freire, Izabela; Patrocinio Jr., Zenilton; Jamil F. Guimarães, Silvio; Martí Juan, Gerard; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier; Docio-Fernández, Laura; García-Mateo, Carmen; Meignier, Sylvain; Odobez, Jean-Marc (Association for Computing Machinery (ACM), 2017)
    Conference report
    Restricted access - publisher's policy
    The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery ...
  • UPC multimodal speaker diarization system for the 2018 Albayzin challenge 

    India Massana, Miquel Àngel; Sagastiberri, Itziar; Palau Puigdevall, Ponç; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. This approach works by processing individually the speech and the image signal. In the speech domain, ...
  • UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV Task 

    India Massana, Miquel Àngel (Universitat Politècnica de Catalunya, 2015-12-03)
    Master thesis (pre-Bologna period)
    Open Access
    This project describes the system that UPC developed to participate in the Multimodal Person Discovery in Broadcast TV task in MediaEval 2015. The main objective of this task is to answer the two questions: Who speaks ...
  • UPC system for the 2016 MediaEval multimodal person discovery in broadcast TV task 

    India Massana, Miquel Àngel; Martí Juan, Gerard; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier; Cortillas, Carla; Bouritsas, Giorgos (CEUR-WS.org, 2016)
    Conference lecture
    Open Access
    The UPC system works by extracting monomodal signal segments (face tracks, speech segments) that overlap with the person names overlaid in the video signal. These segments are assigned directly with the name of the person ...