Now showing items 1-13 of 13

    • Auto-encoding nearest neighbor i-vectors for speaker verification 

      Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
      Conference presentation
      Open access
      In the last years, i-vectors followed by cosine or PLDA scoring techniques were the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap ...
    • Double multi-head attention for speaker verification 

      India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2021)
      Text in conference proceedings
      Open access
      Most state-of-the-art Deep Learning systems for text-independent speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a ...
    • I-vector transformation using k-nearest neighbors for speaker verification 

      Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2020)
      Text in conference proceedings
      Access restricted by publisher policy
      Probabilistic Linear Discriminant Analysis (PLDA) is the most efficient backend for i-vectors. However, it requires labeled background data which can be difficult to access in practice. Unlike PLDA, cosine scoring avoids ...
    • Language modelling for speaker diarization in telephonic interviews 

      India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier; Rodríguez Fonollosa, José Adrián (Elsevier, 2023-03)
      Article
      Open access
      The aim of this paper is to investigate the benefit of combining both language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data ...
    • LSTM neural network-based speaker segmentation using acoustic and language modelling 

      India Massana, Miquel Àngel; Rodríguez Fonollosa, José Adrián; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2017)
      Conference presentation
      Open access
      This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different ...
    • Self attention networks in speaker recognition 

      Safari, Pooyan; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (Multidisciplinary Digital Publishing Institute, 2023-05-24)
      Article
      Open access
      Recently, there has been a significant surge of interest in Self-Attention Networks (SANs) based on the Transformer architecture. This can be attributed to their notable ability for parallelization and their impressive ...
    • Self multi-head attention for speaker recognition 

      India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
      Conference presentation
      Open access
      Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments and those are ...
    • Self-attention encoding and pooling for speaker recognition 

      Safari, Pooyan; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2020)
      Text in conference proceedings
      Open access
      The computing power of mobile devices limits the end-user applications in terms of storage size, processing, memory and energy consumption. These limitations motivate researchers for the design of more efficient deep models. ...
    • Speaker characterization by means of attention pooling 

      Costa, Federico; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2022)
      Conference presentation
      Open access
      State-of-the-art Deep Learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer ...
    • Towards large scale multimedia indexing: a case study on person discovery in broadcast news 

      Le, Nam; Bredin, Herve; Sergent, Gabriel; India Massana, Miquel Àngel; López-Otero, Paula; Barras, Claude; Guinaudeau, Camille; Gravier, Guillaume; Barbosa da Fonseca, Gabriel; Lyon Freire, Izabela; Patrocinio Jr., Zenilton; Jamil F. Guimarães, Silvio; Martí Juan, Gerard; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier; Docio-Fernández, Laura; García-Mateo, Carmen; Meignier, Sylvain; Odobez, Jean-Marc (Association for Computing Machinery (ACM), 2017)
      Text in conference proceedings
      Access restricted by publisher policy
      The rapid growth of multimedia databases and the human interest in their peers make indices representing the location and identity of people in audio-visual documents essential for searching archives. Person discovery ...
    • UPC multimodal speaker diarization system for the 2018 Albayzin challenge 

      India Massana, Miquel Àngel; Sagastiberri, Itziar; Palau Puigdevall, Ponç; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
      Text in conference proceedings
      Open access
      This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. This approach works by processing individually the speech and the image signal. In the speech domain, ...
    • UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV Task 

      India Massana, Miquel Àngel (Universitat Politècnica de Catalunya, 2015-12-03)
      Final degree project
      Open access
      This project deals with the system that UPC developed to participate in the Multimodal Person Discovery in Broadcast TV task in MediaEval 2015. The main objective of this task is to answer the two questions: Who speaks ...
    • UPC system for the 2016 MediaEval multimodal person discovery in broadcast TV task 

      India Massana, Miquel Àngel; Martí Juan, Gerard; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier; Cortillas, Carla; Bouritsas, Giorgos (CEUR-WS.org, 2016)
      Conference presentation
      Open access
      The UPC system works by extracting monomodal signal segments (face tracks, speech segments) that overlap with the person names overlaid in the video signal. These segments are assigned directly with the name of the person ...