Now showing items 1-10 of 10

    • Auto-encoding nearest neighbor i-vectors for speaker verification 

      Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
      Conference lecture
      Open Access
      In the last years, i-vectors followed by cosine or PLDA scoringtechniques were the state-of-the-art approach in speaker veri-fication. PLDA requires labeled background data, and thereexists a significant performance gap ...
    • Double multi-head attention for speaker verification 

      India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Conference report
      Open Access
      Most state-of-the-art Deep Learning systems for text-independent speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a ...
    • I-vector transformation using k-nearest neighbors for speaker verification 

      Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Conference report
      Restricted access - publisher's policy
      Probabilistic Linear Discriminant Analysis (PLDA) is the most efficient backend for i-vectors. However, it requires labeled background data which can be difficult to access in practice. Unlike PLDA, cosine scoring avoids ...
    • LSTM neural network-based speaker segmentation using acoustic and language modelling 

      India Massana, Miquel Àngel; Rodríguez Fonollosa, José Adrián; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2017)
      Conference lecture
      Open Access
      This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different ...
    • Self multi-head attention for speaker recognition 

      India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
      Conference lecture
      Open Access
      Most state-of-the-art Deep Learning (DL) approaches forspeaker recognition work on a short utterance level. Given thespeech signal, these algorithms extract a sequence of speakerembeddings from short segments and those are ...
    • Self-attention encoding and pooling for speaker recognition 

      Safari, Pooyan; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2020)
      Conference report
      Open Access
      The computing power of mobile devices limits the end-user applications in terms of storage size, processing, memory and energy consumption. These limitations motivate researchers for the design of more efficient deep models. ...
    • Towards large scale multimedia indexing: a case study on person discovery in broadcast news 

      Le, Nam; Bredin, Herve; Sergent, Gabriel; India Massana, Miquel Àngel; López-Otero, Paula; Barras, Claude; Guinaudeau, Camille; Gravier, Guillaume; Barbosa da Fonseca, Gabriel; Lyon Freire, Izabela; Patrocinio Jr., Zenilton; Jamil F. Guimarães, Silvio; Martí Juan, Gerard; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier; Docio-Fernández, Laura; García-Mateo, Carmen; Meignier, Sylvain; Odobez, Jean-Marc (Association for Computing Machinery (ACM), 2017)
      Conference report
      Restricted access - publisher's policy
      The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery ...
    • UPC multimodal speaker diarization system for the 2018 Albayzin challenge 

      India Massana, Miquel Àngel; Sagastiberri, Itziar; Palau Puigdevall, Ponç; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
      Conference report
      Open Access
      This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. This approach works by processing individually the speech and the image signal. In the speech domain, ...
    • UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV Task 

      India Massana, Miquel Àngel (Universitat Politècnica de Catalunya, 2015-12-03)
      Master thesis (pre-Bologna period)
      Open Access
      This project verses about the system that UPC developed to participate in the Multimodal Person Discovery in Broadcast TV task in MediaEval 2015. The main objective of this task is to answer the two questions: Who speaks ...
    • UPC system for the 2016 MediaEval multimodal person discovery in broadcast TV task 

      India Massana, Miquel Àngel; Martí Juan, Gerard; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier; Cortillas, Carla; Bouritsas, Giorgos (CEUR-WS.org, 2016)
      Conference lecture
      Open Access
      The UPC system works by extracting monomodal signal segments (face tracks, speech segments) that overlap with the person names overlaid in the video signal. These segments are assigned directly with the name of the person ...