The research area of the 'VEU' group is speech processing. We investigate technologies for extracting the information that the voice carries: recognition of what is said, the language or dialect, speaker characteristics (identity, age, sex, emotional state), and the direction of the sound. We also work on general audio characterization, to determine when speech is present and when there are other acoustic events such as music or various noises. Speech technologies make it possible to generate speech (speech synthesis) or to modify its ...

http://futur.upc.edu/VEU

Recent submissions

  • Machine and deep learning approaches to localization and range estimation of underwater acoustic sources 

    Houégnigan, Ludwig; Safari, Pooyan; Nadeu Camprubí, Climent; André, Michel; Van der Schaar, Mike Connor Roger Malcolm (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference report
    Restricted access (publisher's policy)
    This paper introduces ongoing experiments and early results for the underwater localization and range estimation of acoustic sources. Beyond classical results obtained for direction of arrival estimation, results concerning ...
  • Search engine for multilingual audiovisual contents 

    Pérez, José David; Bonafonte Cávez, Antonio; Ruiz Costa-Jussà, Marta; Cardenal, Antonio; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asunción; Navas, Eva; Rodríguez Banga, Eduardo (2012)
    Conference report
    Open access
    This paper describes the BUCEADOR search engine, a web server that allows retrieving multimedia documents (text, audio, video) in different languages. All the documents are translated into the user language and are ...
  • N-gram-based machine translation 

    Mariño Acebal, José Bernardo; Banchs, Rafael E.; Crego, Josep Maria; de Gispert Ramis, Adrià; Lambert, Patrik; Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (2006-12-01)
    Article
    Open access
    This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred ...
  • Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation 

    Pascual, Santiago; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Text in conference proceedings
    Restricted access (publisher's policy)
    Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a ...
  • A bilingual Spanish-Catalan database of units for concatenative synthesis 

    Esquerra Llucià, Ignasi; Bonafonte Cávez, Antonio; Vallverdú Bayés, Sisco; Febrer Godayol, Albert (1998)
    Text in conference proceedings
    Open access
    Different databases of phonetic units are required in multilingual Text-to-Speech systems based on concatenative synthesis. We are currently developing a TTS system able to convert text either in Catalan and Spanish, with ...
  • Phoneme recognition with statistical modeling of the prediction error of neural networks 

    Freitag, Fèlix; Monte Moreno, Enrique (International Speech Communication Association (ISCA), 1998)
    Text in conference proceedings
    Open access
    This paper presents a speech recognition system which incorporates predictive neural networks. The neural networks are used to predict observation vectors of speech. The prediction error vectors are modeled on the state ...
  • Feature decorrelation methods in speech recognition. A comparative study 

    Batlle Mont, Eloi; Nadeu Camprubí, Climent; Rodríguez Fonollosa, José Adrián (International Speech Communication Association (ISCA), 1998)
    Text in conference proceedings
    Open access
    In this paper we study various decorrelation methods for the features used in speech recognition and we compare the performance of each one by running several tests with a speech database. First of all we study the ...
  • Acoustic feature prediction from semantic features for expressive speech using deep neural networks 

    Jauk, Igor; Bonafonte Cávez, Antonio; Pascual, Santiago (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Text in conference proceedings
    Restricted access (publisher's policy)
    The goal of the study is to predict acoustic features of expressive speech from semantic vector space representations. Though a lot of successful work was invested in expressiveness analysis and prediction, the results ...
  • A differentiable BLEU loss. Analysis and first results 

    Casas, Noe; Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (2018)
    Text in conference proceedings
    Open access
    In natural language generation tasks, like neural machine translation and image captioning, there is usually a mismatch between the optimized loss and the de facto evaluation criterion, namely token-level maximum likelihood ...
  • From feature to paradigm: deep learning in machine translation 

    Ruiz Costa-Jussà, Marta (2018-04-01)
    Article
    Open access
    In the last years, deep learning algorithms have highly revolutionized several areas including speech, image and natural language processing. The specific field of Machine Translation (MT) has not remained invariant. ...
  • English-Catalan neural machine translation in the biomedical domain through the cascade approach 

    Ruiz Costa-Jussà, Marta; Casas, Noe; Melero, Maite (2018)
    Conference report
    Restricted access (publisher's policy)
    This paper describes the methodology followed to build a neural machine translation system in the biomedical domain for the English-Catalan language pair. This task can be considered a low-resourced task from the point of ...
  • Recent activities of IAG working group “Ionosphere Prediction” 

    Erdogan, Eren; Hoque, Mainul; García Rigo, Alberto; Cueto, M.; Schmidt, Michael; Jakowski, Norbert; Berdermann, Jens; Monte Moreno, Enrique; Hernández Pajares, Manuel (2018)
    Conference report
    Open access
    Ionospheric disturbances pose, for instance, an increasing risk on economy, national security, satellite and airline operations, communications networks and the navigation systems. Constructing ...
