Recent submissions

  • Visualizing punctuation restoration in speech transcripts with prosograph 

    Oktem, A.; Farrús, M.; Bonafonte Cávez, Antonio (International Speech Communication Association (ISCA), 2018)
    Conference proceedings paper
    Open access
    We have developed a neural architecture that tests the effect of lexical, morphosyntactic and prosodic features in restoring punctuation in speech transcriptions. Having outperformed a baseline model in terms of precision ...
  • Expressive speech synthesis using sentiment embeddings 

    Jauk, Igor; Lorenzo Trueba, J.; Yamagishi, J.; Bonafonte Cávez, Antonio (International Speech Communication Association (ISCA), 2018)
    Conference proceedings paper
    Open access
    In this paper we present a DNN based speech synthesis system trained on an audiobook including sentiment features predicted by the Stanford sentiment parser. The baseline system uses DNN to predict acoustic parameters based ...
  • Spanish statistical parametric speech synthesis using a neural vocoder 

    Bonafonte Cávez, Antonio; Pascual de la Puente, Santiago; Dorca, G. (International Speech Communication Association (ISCA), 2018)
    Conference proceedings paper
    Open access
    During the 2000s, unit-selection based text-to-speech was the dominant commercial technology. Meanwhile, the TTS research community has made a big effort to push statistical-parametric speech synthesis to get similar ...
  • A conversation analysis framework using speech recognition and naïve bayes classification for construction process monitoring 

    Zhang, T.; Lee, Y. C.; Zhu, Y.; Hernando Pericás, Francisco Javier (American Society of Civil Engineers (ASCE), 2018)
    Conference proceedings paper
    Restricted access (publisher's policy)
    At a dynamic construction site, conversations convey vital information including construction activities, operation status, and task performance. Even though because of information security, recording the entire conversations ...
  • Language and noise transfer in speech enhancement generative adversarial network 

    Pascual de la Puente, Santiago; Park, Maruchan; Serra, Joan; Bonafonte Cávez, Antonio; Ahn, Kang-hun (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference proceedings paper
    Restricted access (publisher's policy)
    Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems into new, low resource environments ...
  • A geometric proxy of economic uncertainty based on the disagreement in survey expectations 

    Claveria González, Oscar; Monte Moreno, Enrique; Torra Porras, Salvador (2018)
    Conference presentation
    Open access
    In this study we present a geometric approach to proxy economic uncertainty. We design a positional indicator of disagreement among survey-based agents' expectations about the state of the economy. Previous dispersion-based ...
  • Machine and deep learning approaches to localization and range estimation of underwater acoustic sources 

    Houégnigan, Ludwig; Safari, Pooyan; Nadeu Camprubí, Climent; André, Michel; Van der Schaar, Mike Connor Roger Malcolm (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference presentation
    Restricted access (publisher's policy)
    This paper introduces ongoing experiments and early results for the underwater localization and range estimation of acoustic sources. Beyond classical results obtained for direction of arrival estimation, results concerning ...
  • Search engine for multilingual audiovisual contents 

    Pérez, José David; Bonafonte Cávez, Antonio; Ruiz Costa-Jussà, Marta; Cardenal, Antonio; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asunción; Navas, Eva; Rodríguez Banga, Eduardo (2012)
    Conference presentation
    Open access
    This paper describes the BUCEADOR search engine, a web server that allows retrieving multimedia documents (text, audio, video) in different languages. All the documents are translated into the user language and are ...
  • Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation 

    Pascual, Santiago; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference proceedings paper
    Restricted access (publisher's policy)
    Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a ...
  • A bilingual Spanish-Catalan database of units for concatenative synthesis 

    Esquerra Llucià, Ignasi; Bonafonte Cávez, Antonio; Vallverdú Bayés, Sisco; Febrer Godayol, Albert (1998)
    Conference proceedings paper
    Open access
    Different databases of phonetic units are required in multilingual Text-to-Speech systems based on concatenative synthesis. We are currently developing a TTS system able to convert text in either Catalan or Spanish, with ...
  • Phoneme recognition with statistical modeling of the prediction error of neural networks 

    Freitag, Fèlix; Monte Moreno, Enrique (International Speech Communication Association (ISCA), 1998)
    Conference proceedings paper
    Open access
    This paper presents a speech recognition system which incorporates predictive neural networks. The neural networks are used to predict observation vectors of speech. The prediction error vectors are modeled on the state ...
  • Feature decorrelation methods in speech recognition. A comparative study 

    Batlle Mont, Eloi; Nadeu Camprubí, Climent; Rodríguez Fonollosa, José Adrián (International Speech Communication Association (ISCA), 1998)
    Conference proceedings paper
    Open access
    In this paper we study various decorrelation methods for the features used in speech recognition and we compare the performance of each one by running several tests with a speech database. First of all we study the ...
