
  • Building synthetic voices in the METANET framework 

    Garcia Casademont, Emília; Bonafonte Cávez, Antonio; Moreno Bilbao, M. Asunción (2012)
    Conference lecture
    Open Access
    METANET4U is a European project aiming at supporting language technology for European languages and multilingualism. It is a project in the META-NET Network of Excellence, a cluster of projects aiming at fostering the ...
  • Casc per a personal de serveis d'emergència controlat per veu 

    Boada Torre-Marín, Jordi (Universitat Politècnica de Catalunya, 2017-07)
    Bachelor thesis
    Open Access
    This project aimed to create a prototype of personal protective equipment that allows interaction with different actuators through voice control, making the wearer's actions easier and safer in ...
  • Channel selection measures for multi-microphone speech recognition 

    Nadeu Camprubí, Climent; Wolf, Martin (2014-02-01)
    Article
    Restricted access - publisher's policy
    Automatic speech recognition in a room with distant microphones is strongly affected by noise and reverberation. In scenarios where the speech signal is captured by several arbitrarily located microphones the degree of ...
  • Channel selection using N-best hypothesis for multi-microphone ASR 

    Wolf, Martin; Nadeu Camprubí, Climent (2013)
    Conference report
    Restricted access - publisher's policy
    If speech is captured by several arbitrarily-located microphones in a room, the degree of distortion by noise and reverberation may vary strongly from one channel to another. Channel selection for automatic speech recognition ...
  • Characterization of Speech Recognition Systems on GPU Architectures 

    Segura Salvador, Albert (Universitat Politècnica de Catalunya, 2016-07-04)
    Master thesis
    Open Access
    This master thesis characterizes the performance and energy bottlenecks of speech recognition systems when running on modern GPUs, with the aim of providing useful information for designing future GPU architectures, as well ...
  • Collaborative voting of 3D features for robust gesture estimation 

    van Sabben Alsina, Daniel; Ruiz Hidalgo, Javier; Suau Cuadros, Xavier; Casas Pla, Josep Ramon (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Conference lecture
    Open Access
    Human body analysis raises special interest because it enables a wide range of interactive applications. In this paper we present a gesture estimator that discriminates body poses in depth images. A novel collaborative ...
  • Combining phrase and neural-based machine translation: what worked and did not 

    Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián (2017)
    Article
    Restricted access - publisher's policy
    Phrase-based machine translation assumes that all words are at the same distance and translates them using feature functions that approximate the probability at different levels. On the other hand, neural machine translation ...
  • Computation reuse in DNNs by exploiting input similarity 

    Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Conference report
    Restricted access - publisher's policy
    In recent years, Deep Neural Networks (DNNs) have achieved tremendous success for diverse problems such as classification and decision making. Efficient support for DNNs on CPUs, GPUs and accelerators has become a prolific ...
  • Controlador de dispositivos por reconocimiento de voz (CDRV) 

    Roca Nonell, Aleix (Universitat Politècnica de Catalunya, 2014-12-11)
    Master thesis (pre-Bologna period)
    Open Access
    CDRV (Controlador de dispositivos por reconocimiento de voz) is a device capable of controlling other devices by voice. Specifically, for this project, it has been adapted to control a reclining armchair.
  • Controlling 3D holographic contents by personal devices 

    Barroso Laguna, Axel (Universitat Politècnica de Catalunya, 2014-12)
    Master thesis (pre-Bologna period)
    Open Access
    Covenantee:  Politecnico di Torino
    First, this project explains the different sensors of a personal device (e.g. smartphone). After that, this study shows how to interact with holographic scenes. These scenes were created with Blender. Blender ...
  • Corpus for cyberbullying prevention 

    Moreno Bilbao, M. Asunción; Bonafonte Cávez, Antonio; Jauk, Igor; Tarrés, Laia; Pereira, Victor (International Speech Communication Association (ISCA), 2018)
    Conference report
    Open Access
    Cyberbullying is the use of digital media to harass a person or group of people, through personal attacks, disclosure of confidential or false information, among other means. That is to say, it ...
  • Corpus selection 

    Adda, Gilles; Barras, Claude; Kernal Ekenel, Hazim; Morros Rubió, Josep Ramon; Hernando Pericás, Francisco Javier (2013-03-31)
    External research report
    Open Access
    Deliverable of the project Collaborative Annotation of multi-MOdal, MultI-Lingual and multi-mEdia documents. This document describes the different corpora that will be used during the Camomile project.
  • Creating expressive synthetic voices by unsupervised clustering of audiobooks 

    Jauk, Igor; Bonafonte Cávez, Antonio; López Otero, Paula; Docio Fernández, Laura (International Speech Communication Association (ISCA), 2015)
    Conference lecture
    Restricted access - publisher's policy
    In this work we design an approach for automatic feature selection and voice creation for expressive synthesis. Our approach is guided by two main goals: (1) increasing the flexibility of expressive voice creation and (2) ...
  • Deep learning backend for single and multisession i-vector speaker recognition 

    Ghahabi, Omid; Hernando Pericás, Francisco Javier (2017-04-01)
    Article
    Open Access
    The lack of labeled background data creates a large performance gap between cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring baseline techniques for i-vectors in speaker recognition. Although there are some ...
  • Deep Neural Networks for Channel Compensated i-Vectors in Speaker Recognition 

    Jiménez Sanfiz, Albert (Universitat Politècnica de Catalunya, 2014-06)
    Bachelor thesis
    Open Access
    This thesis explores the application of channel-compensation techniques in speaker verification and the posterior combination with deep learning technologies. The idea is to reduce the degradation of the performance due ...
  • Deep neural networks for i-vector language identification of short utterances in cars 

    Ghahabi Esfahani, Omid; Bonafonte Cávez, Antonio; Hernando Pericás, Francisco Javier; Moreno Bilbao, M. Asunción (International Speech Communication Association (ISCA), 2016)
    Conference report
    Restricted access - publisher's policy
    This paper is focused on the application of the Language Identification (LID) technology for intelligent vehicles. We cope with short sentences or words spoken in moving cars in four languages: English, Spanish, German, ...
  • DeepVoice: tecnologías de aprendizaje profundo aplicadas al procesado de voz y audio 

    Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián (2017-09-01)
    Article
    Open Access
    This project proposes the development of new deep learning methods for speech and audio processing, exploring new applications and continuing the initial work of the research team and the international community. Research ...
  • DeepVoice: tecnologías de aprendizaje profundo aplicadas al procesado de voz y audio 

    Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián (2017-09-22)
    Article
    Open Access
    This project proposes the development of new architectures for speech and audio processing using deep learning methods, also exploring new applications and continuing the initial work of the ...
  • Demisyllable based Spanish Number Recognition Experiments 

    Mariño Acebal, José Bernardo; Nadeu Camprubí, Climent; Lleida Solano, Eduardo (1987)
    Conference report
    Open Access
    The main features of our demisyllable based continuous speech recognition system (RAMSES) are shown. Special attention is paid to demisyllable definition and the syntactic constraints used with the dynamic programming ...
  • Design and implementation of SIP VoIP Adapter 

    Guixà Ibàñez, Adrià (Universitat Politècnica de Catalunya, 2009-12-15)
    Master thesis (pre-Bologna period)
    Open Access
    The SIP VoIP Adapter is a Java application that is able to establish a SIP communication acting as a User Agent, which uses an external device as a sound device, to play and acquire the audio from the call established ...