The research area of the 'VEU' group is speech processing. We investigate technologies that extract the information carried by the voice: recognition of what is said, the language or dialect, characteristics of the speaker (who they are, their age, sex, and emotional state), and the direction of the sound. We also work on general audio characterization, to determine when speech is present and when there are other acoustic events such as music or various noises. Speech technologies make it possible to generate speech (speech synthesis) or to modify its ...

Recent Submissions

  • Corpus for cyberbullying prevention 

    Moreno Bilbao, M. Asunción; Bonafonte Cávez, Antonio; Jauk, Igor; Tarrés, Laia; Pereira, Victor (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    Cyberbullying is the use of digital media to harass a person or group of people, through personal attacks, disclosure of confidential or false information, among other means. That is to say, it ...
  • Incorporation of acoustic sensors in the regulation of a mobile robot 

    Luna Aguilar, Christian Alejandro; Morales Diaz, América; Castelán, Mario; Nadeu Camprubí, Climent (2019-01-17)
    Article
    Restricted access - publisher's policy
    This article introduces the incorporation of acoustic sensors for the localization of a mobile robot. The robot is considered as a sound source and its position is located applying a Time Delay of Arrival (TDOA) method. ...
  • UPC multimodal speaker diarization system for the 2018 Albayzin challenge 

    India Massana, Miquel Àngel; Sagastiberri, Itziar; Palau Puigdevall, Ponç; Sayrol Clols, Elisa; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. This approach works by processing the speech and image signals individually. In the speech domain, ...
  • Restricted Boltzmann Machine vectors for speaker clustering 

    Khan, Umair; Safari, Pooyan; Hernando Pericás, Francisco Javier (International Speech Communication Association (ISCA), 2018)
    Conference lecture
    Open Access
    Restricted Boltzmann Machines (RBMs) have been used both in the front-end and back-end of speaker verification systems. In this work, we apply RBMs as a front-end in the context of speaker clustering. Speakers' utterances ...
  • Knowledge sharing in the health scenario 

    LLuch Ariet, Magi; Brugues de la Torre, Albert; Vallverdú Bayés, Sisco; Pegueroles Vallés, Josep R. (2014-11-28)
    Article
    Open Access
    The understanding of certain data often requires the collection of similar data from different places to be analysed and interpreted. Interoperability standards and ontologies are facilitating data interchange around the ...
  • End-to-end speech translation with the transformer 

    Cross Vila, Laura; Escolano Peinado, Carlos; Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (Antonio Bonafonte, Jordi Luque and Francesc Alías Pujol, 2018)
    Conference lecture
    Restricted access - publisher's policy
    Speech Translation has been traditionally addressed with the concatenation of two tasks: Speech Recognition and Machine Translation. This approach has the main drawback that errors are concatenated. Recently, neural ...
  • The TALP-UPC machine translation systems for WMT18 news translation shared task 

    Casas, Noe; Escolano Peinado, Carlos; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián (Association for Computational Linguistics, 2018)
    Conference lecture
    Restricted access - publisher's policy
    In this article we describe the TALP-UPC research group participation in the WMT18 news shared translation task for Finnish-English and Estonian-English within the multi-lingual subtrack. All of our primary submissions ...
  • Neural machine translation with the transformer and multi-source romance languages for the biomedical WMT 2018 task 

    Tubay, Brian; Ruiz Costa-Jussà, Marta (2018)
    Conference lecture
    Restricted access - publisher's policy
  • A neural approach to language variety translation 

    Ruiz Costa-Jussà, Marta; Zampieri, Marcos; Pal, Santanu (Association for Computational Linguistics, 2018)
    Conference lecture
    Restricted access - publisher's policy
    In this paper we present the first neural-based machine translation system trained to translate between standard national varieties of the same language. We take the pair Brazilian - European Portuguese as an example and ...
  • From feature to paradigm: Deep learning in machine translation (Extended Abstract) 

    Ruiz Costa-Jussà, Marta (2018)
    Conference lecture
    Restricted access - publisher's policy
    In the last years, deep learning algorithms have highly revolutionized several areas including speech, image and natural language processing. The specific field of Machine Translation (MT) has not remained invariant. ...
  • Synthesis using speaker adaptation from speech recognition DB 

    Oller Moreno, Sergio; Moreno Bilbao, M. Asunción; Bonafonte Cávez, Antonio (Universidad de Vigo, 2010)
    Conference lecture
    Open Access
    This paper deals with the creation of multiple voices from a Hidden Markov Model based speech synthesis system (HTS). More than 150 Catalan synthetic voices were built using Hidden Markov Models (HMM) and speaker adaptation ...
  • Visualizing punctuation restoration in speech transcripts with prosograph 

    Oktem, A.; Farrús, M.; Bonafonte Cávez, Antonio (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    We have developed a neural architecture that tests the effect of lexical, morphosyntactic and prosodic features in restoring punctuation in speech transcriptions. Having outperformed a baseline model in terms of precision ...
