  • Introducing nativization to Spanish TTS systems 

    Polyakova, Tatyana; Bonafonte Cávez, Antonio (2011-06)
    Article
    Restricted access - publisher's policy
    In the modern world, speech technologies must be flexible and adaptable to any framework. Mass media globalization introduces multilingualism as a challenge for the most popular speech applications such as text-to-speech ...
  • Language and noise transfer in speech enhancement generative adversarial network 

    Pascual de la Puente, Santiago; Park, Maruchan; Serra, Joan; Bonafonte Cávez, Antonio; Ahn, Kang-hun (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Restricted access - publisher's policy
    Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems into new, low resource environments ...
  • Language modeling using X-grams 

    Bonafonte Cávez, Antonio; Mariño Acebal, José Bernardo (H. TIMOTHY BRUMMELL, WILLIAM IDSARDI CITATION DELAWARE, NEW CASTLE, DELAWARE, 1996)
    Conference report
    Open Access
    In this paper, an extension of n-grams, called x-grams, is proposed. In this extension, the memory of the model (n) is not fixed a priori. Instead, large memories are accepted first, and merging criteria are then applied ...
  • Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation 

    Pascual, Santiago; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Restricted access - publisher's policy
    Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a ...
  • Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model

    Pascual, Santiago; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Open Access
    Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a ...
  • Nativization of English words in Spanish using analogy 

    Polyakova, Tatyana; Bonafonte Cávez, Antonio (2010)
    Conference report
    Open Access
    Nowadays modern speech technologies need to be flexible and adaptable to any framework. Mass media globalization introduces the challenge of multilingualism into most popular speech applications such as text-to-speech ...
  • Out-of-vocabulary word modelling and rejection for keyword spotting 

    Lleida Solano, Eduardo; Mariño, José B.; Salavedra Molí, Josep; Bonafonte Cávez, Antonio; Monte Moreno, Enrique (International Speech Communication Association (ISCA), 1993)
    Conference report
    Restricted access - publisher's policy
    This paper presents a combination of out-of-vocabulary (OOV) word modeling and rejection techniques in an attempt to accept utterances embedding a keyword and reject utterances with nonkeywords. The goal of this research ...
  • Parametric modeling of PDF using a convolution of one-sided exponentials: application to HMM 

    Vidal Manzano, José; Bonafonte Cávez, Antonio; Rodríguez Fonollosa, José Adrián (European Association for Signal Processing (EURASIP), 1994)
    Conference report
    Open Access
  • Prosodic and spectral iVectors for expressive speech synthesis 

    Jauk, Igor; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference lecture
    Open Access
    This work presents a study on the suitability of prosodic and acoustic features, with a special focus on i-vectors, in expressive speech analysis and synthesis. For each utterance of two different databases, a laboratory ...
  • Prosodic break prediction with RNNs 

    Pascual de la Puente, Santiago; Bonafonte Cávez, Antonio (Springer, 2016)
    Conference report
    Restricted access - publisher's policy
    Prosodic breaks prediction from text is a fundamental task to obtain naturalness in text to speech applications. In this work we build a data-driven break predictor out of linguistic features like the Part of Speech (POS) ...
  • Rational characteristic functions and Markov chains

    Vidal Manzano, José; Bonafonte Cávez, Antonio; Losada, N; Rodríguez Fonollosa, José Adrián; Rodríguez Fonollosa, Javier (S.N., 1995)
    Conference report
    Open Access
    We investigate in this paper how to estimate the density function of a random variable using a parametric ARMA model for its characteristic function. The choice of this model is motivated by the fact that this ...
  • Recent work on the FESTCAT database for speech synthesis 

    Bonafonte Cávez, Antonio; Esquerra Llucià, Ignasi; Aguilar, Lourdes; Oller Moreno, Sergio; Moreno Bilbao, M. Asunción (2009)
    Conference report
    Open Access
    This paper presents our work around the FESTCAT project, whose main goal was the development of voices for the Festival suite in Catalan. In the first year, we produced the corpus and the speech data needed for build ...
  • Recognition of numbers by using demisyllables and hidden Markov models 

    Mariño Acebal, José Bernardo; Bonafonte Cávez, Antonio; Moreno Bilbao, M. Asunción; Lleida Solano, Eduardo; Nadeu Camprubí, Climent; Monte Moreno, Enrique (Elsevier, 1990)
    Conference report
    Open Access
  • Reconocimiento del habla continua mediante modelos ocultos de Markov utilizando la técnica de búsqueda en haz 

    Lleida Solano, Eduardo; Mariño Acebal, José Bernardo; Bonafonte Cávez, Antonio (Universidad de Málaga, 1992)
    Conference report
    Open Access
  • Search engine for multilingual audiovisual contents 

    Pérez, José David; Bonafonte Cávez, Antonio; Ruiz Costa-Jussà, Marta; Cardenal, Antonio; Rodríguez Fonollosa, José Adrián; Moreno Bilbao, M. Asunción; Navas, Eva; Rodríguez Banga, Eduardo (2012)
    Conference lecture
    Open Access
    This paper describes the BUCEADOR search engine, a web server that allows retrieving multimedia documents (text, audio, video) in different languages. All the documents are translated into the user language and are ...
  • SETHOS: the UPC speech understanding system 

    Bonafonte Cávez, Antonio; Mariño Acebal, José Bernardo; Nogueiras Rodríguez, Albino (H. TIMOTHY BRUMMELL, WILLIAM IDSARDI CITATION DELAWARE, NEW CASTLE, DELAWARE, 1996)
    Conference lecture
    Restricted access - publisher's policy
    In EuroSpeech'95, the authors presented the first version of Sethos, the speech understanding system which has been developed at the UPC. In this paper some improvements are incorporated at different levels of Sethos: ...
  • Spanish statistical parametric speech synthesis using a neural vocoder 

    Bonafonte Cávez, Antonio; Pascual de la Puente, Santiago; Dorca, G. (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    During the 2000s decade, unit-selection based text-to-speech was the dominant commercial technology. Meanwhile, the TTS research community has made a big effort to push statistical-parametric speech synthesis to get similar ...
  • Speech emotion recognition using hidden Markov models 

    Nogueiras Rodríguez, Albino; Mariño Acebal, José Bernardo; Bonafonte Cávez, Antonio; Moreno Bilbao, M. Asunción (2001)
    Conference lecture
    Restricted access - publisher's policy
    This paper introduces a first approach to emotion recognition using RAMSES, the UPC’s speech recognition system. The approach is based on standard speech recognition technology using hidden semi-continuous Markov models. ...
  • Study of subword units for Spanish speech recognition

    Bonafonte Cávez, Antonio; Estany, Rafael; Vives, Eugenio (ESCA - J.M. PARDO, E. ENRIQUEZ, J. ORTEGA, J. FERREIROS GTM-UPM, 1995)
    Conference report
    Open Access
    This paper studies different sets of subword speech units to be used for recognizing Spanish. In particular it compares context dependent phones, syllables and demisyllables. It shows how context dependent units can ...
  • Synthesis of filled pauses based on a disfluent speech model 

    Adell Roig, Jordi; Bonafonte Cávez, Antonio; Escudero Mancebo, David (2010)
    Conference lecture
    Restricted access - publisher's policy