The research area of the 'VEU' group is speech processing. We investigate technologies that extract the information contained in the voice: recognition of what is said, the language or dialect, characteristics of the speaker (identity, age, sex, emotional state), and the direction of the sound. We also work on the general characterization of audio, to determine when speech is present and when there are other acoustic events such as music or various noises. Speech technologies make it possible to generate speech (speech synthesis) or to modify its ...

Recent Submissions

  • From bilingual to multilingual neural machine translation by incremental training 

    Escolano Peinado, Carlos; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián (Association for Computational Linguistics, 2019)
    Conference lecture
    Open Access
    Multilingual Neural Machine Translation approaches are based on the use of task specific models and the addition of one more language can only be done by retraining the whole system. In this work, we propose a new training ...
  • Multilingual, multi-scale and multi-layer visualization of sequence-based intermediate representations 

    Escolano Peinado, Carlos; Ruiz Costa-Jussà, Marta; Lacroux, Elora; Vázquez Alcocer, Pere Pau (Association for Computational Linguistics, 2019)
    Conference report
    Restricted access - publisher's policy
    The main alternatives nowadays to deal with sequences are Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN) architectures and the Transformer. In this context, RNN's, CNN's and Transformer have most commonly ...
  • Time-domain speech enhancement using generative adversarial networks 

    Pascual de la Puente, Santiago; Serra, Joan; Bonafonte Cávez, Antonio (2019-11-01)
    Article
    Restricted access - publisher's policy
    Speech enhancement improves recorded voice utterances to eliminate noise that might be impeding their intelligibility or compromising their quality. Typical speech enhancement systems are based on regression approaches ...
  • Exploring efficient neural architectures for linguistic-acoustic mapping in text-to-speech 

    Pascual de la Puente, Santiago; Serra, Joan; Bonafonte Cávez, Antonio (Multidisciplinary Digital Publishing Institute, 2019-08-17)
    Article
    Open Access
    Conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models such as recurrent neural networks. Despite the ...
  • Restricted Boltzmann machine vectors for speaker clustering and tracking tasks in TV broadcast shows 

    Khan, Umair; Safari, Pooyan; Hernando Pericás, Francisco Javier (Multidisciplinary Digital Publishing Institute, 2019-07-09)
    Article
    Open Access
    Restricted Boltzmann Machines (RBMs) have shown success in both the front-end and backend of speaker verification systems. In this paper, we propose applying RBMs to the front-end for the tasks of speaker clustering and ...
  • Self multi-head attention for speaker recognition 

    India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
    Conference lecture
    Open Access
    Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments and those are ...
  • Auto-encoding nearest neighbor i-vectors for speaker verification 

    Khan, Umair; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2019)
    Conference lecture
    Open Access
    In the last years, i-vectors followed by cosine or PLDA scoring techniques were the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap ...
  • The AXIOM project: IoT on heterogeneous embedded platforms 

    Filgueras Izquierdo, Antonio; Vidal, Miquel; Mateu, Marc; Jiménez González, Daniel; Álvarez Martínez, Carlos; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Theodoropoulos, Dimitris; Pnevmatikatos, Dionisis; Gai, Paolo; Garzarella, Stefano; Oro de Herta, David; Hernando Pericás, Francisco Javier; Bettin, Nicola; Pomella, Alberto; Giorgi, Roberto (Institute of Electrical and Electronics Engineers (IEEE), 2019-11-11)
    Article
    Open Access
    The AXIOM project aims at providing an environment for Cyber-Physical Systems. Smart Video Surveillance targets public environments, involving real-time face detection in crowds. Smart Home Living targets home environments ...
  • Extraction of the underlying structure of systematic risk from non-Gaussian multivariate financial time series using independent component analysis: Evidence from the Mexican stock exchange 

    Ladrón de Guevara Cortés, Rogelio; Torra Porras, Salvador; Monte Moreno, Enrique (2018-01-01)
    Article
    Open Access
    Regarding the problems related to multivariate non-Gaussianity of financial time series, i.e., unreliable results in extraction of underlying risk factors -via Principal Component Analysis or Factor Analysis-, we use ...
  • DNN speaker embeddings using autoencoder pre-training 

    Khan, Umair; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    Conference lecture
    Restricted access - publisher's policy
    Over the last years, i-vectors have been the state-of-the-art approach in speaker recognition. Recent improvements in deep learning have increased the discriminative quality of i-vectors. However, deep learning architectures ...
  • Conditional distribution variability measures for causality detection 

    Rodríguez Fonollosa, José Adrián (Springer, 2019)
    Part of book or chapter of book
    Restricted access - publisher's policy
    In this paper we derive variability measures for the conditional probability distributions of a pair of random variables, and we study its application in the inference of causal-effect relationships. We also study the ...
  • Electron density retrieval from truncated Radio Occultation GNSS data 

    Lyu, Haixia; Hernández Pajares, Manuel; Monte Moreno, Enrique; Cardellach Galí, Estel (2019-06-01)
    Article
    Open Access
    This paper summarizes the definition and validation of two complementary new strategies, to invert incomplete Global Navigation Satellite System Radio-Occultation (RO) ionospheric measurements, such as the ones to be ...
