Now showing items 1-4 of 4

  • Acoustic feature prediction from semantic features for expressive speech using deep neural networks 

    Jauk, Igor; Bonafonte Cávez, Antonio; Pascual, Santiago (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Restricted access - publisher's policy
    The goal of the study is to predict acoustic features of expressive speech from semantic vector space representations. Although much successful work has been devoted to expressiveness analysis and prediction, the results ... (a hedged sketch of this kind of semantic-to-acoustic DNN appears after this list)
  • Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation 

    Pascual, Santiago; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Restricted access - publisher's policy
    Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a ... (a sketch of this shared-plus-per-speaker layout appears after this list)
  • Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model 

    Pascual, Santiago; Bonafonte Cávez, Antonio (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    Conference report
    Open Access
    Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a ...
  • Temporal activity detection in untrimmed videos with recurrent neural networks 

    Montes, Alberto; Salvador Aguilera, Amaia; Pascual, Santiago; Giró Nieto, Xavier (2016)
    Conference lecture
    Open Access
    This work proposes a simple pipeline to classify and temporally localize activities in untrimmed videos. Our system uses features from a 3D Convolutional Neural Network (C3D) as input to train a recurrent neural network ... (a sketch of this C3D-to-RNN pipeline appears after this list)
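
The first item above describes a deep neural network that maps semantic vector-space representations of text to acoustic features of expressive speech. The following is a minimal sketch of that kind of regressor, not the authors' implementation: the framework (PyTorch), the 300-dimensional semantic embedding, the hidden size, and the 6-dimensional acoustic target are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class SemanticToAcoustic(nn.Module):
        """Hypothetical sketch: a feed-forward DNN regressing acoustic
        features (e.g. prosodic statistics) from a semantic embedding."""
        def __init__(self, sem_dim=300, hidden=512, ac_dim=6):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(sem_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, ac_dim),   # predicted acoustic feature vector
            )

        def forward(self, sem_vec):
            return self.net(sem_vec)

    # Example: predict acoustic targets for a batch of 8 sentence embeddings.
    model = SemanticToAcoustic()
    acoustic = model(torch.randn(8, 300))    # -> (8, 6)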
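The two multi-output RNN-LSTM items describe an architecture in which some hidden layers are shared by all speakers while each speaker has its own output branch. The sketch below illustrates that shared-plus-branch layout under assumed details: PyTorch, two shared LSTM layers, one linear branch per speaker, and the 55/43-dimensional input/output sizes are placeholders, not values taken from the papers.

    import torch
    import torch.nn as nn

    class MultiSpeakerLSTM(nn.Module):
        """Hypothetical sketch: shared LSTM layers feed one output branch
        per speaker, so most parameters are reused across speakers."""
        def __init__(self, in_dim=55, hidden=256, out_dim=43, num_speakers=4):
            super().__init__()
            # Shared recurrent layers: every speaker passes through this stack.
            self.shared = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
            # Speaker-specific output branches mapping to acoustic frames.
            self.branches = nn.ModuleList(
                [nn.Linear(hidden, out_dim) for _ in range(num_speakers)]
            )

        def forward(self, x, speaker_id):
            h, _ = self.shared(x)                 # (batch, frames, hidden)
            return self.branches[speaker_id](h)   # speaker-specific outputs

    # Example: synthesize acoustic frames for speaker 1 from two utterances.
    model = MultiSpeakerLSTM()
    frames = model(torch.randn(2, 100, 55), speaker_id=1)   # -> (2, 100, 43)

Adapting to a new speaker in this layout would typically mean adding (or fine-tuning) only one branch while keeping the shared layers fixed; the exact adaptation and interpolation schemes are described in the papers themselves.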
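The last item describes a pipeline in which per-clip C3D features are fed to a recurrent network that classifies and temporally localizes activities. The sketch below shows one way such a C3D-to-RNN stage could look; PyTorch, the 4096-dimensional clip features, the hidden size, and the 201-class output (assumed to be 200 activities plus background) are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class ClipActivityRNN(nn.Module):
        """Hypothetical sketch: an LSTM over per-clip C3D features emits an
        activity (or background) score vector for every clip, giving both
        classification and a coarse temporal localization."""
        def __init__(self, feat_dim=4096, hidden=512, num_classes=201):
            super().__init__()
            self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.cls = nn.Linear(hidden, num_classes)   # per-clip class scores

        def forward(self, clip_feats):       # (batch, clips, feat_dim)
            h, _ = self.rnn(clip_feats)
            return self.cls(h)               # (batch, clips, num_classes)

    # Example: one video represented as 30 consecutive clip features.
    model = ClipActivityRNN()
    scores = model(torch.randn(1, 30, 4096))   # -> (1, 30, 201)
    # Runs of clips sharing the same predicted label form coarse temporal segments.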