VEU - Grup de Tractament de la Parla
http://hdl.handle.net/2117/3746
2017-08-23T14:00:55Z
http://hdl.handle.net/2117/106909
Using x-gram for efficient speech recognition
Bonafonte Cávez, Antonio; Mariño Acebal, José Bernardo
X-grams are a generalization of n-grams in which the number of previous conditioning words varies from case to case and is decided from the training data. X-grams reduce perplexity with respect to trigrams and require fewer parameters. In this paper, the representation of x-grams using finite state automata is considered. This representation leads to a new model, the non-deterministic x-gram, an approximation that is much more efficient while suffering only a small degradation in modeling capability. Empirical experiments on a continuous speech recognition task show that, for each ending word, the number of transitions is reduced from 1222 (the size of the lexicon) to around 66.
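The core idea of the abstract, conditioning each prediction on a history whose length is chosen from the training data rather than fixed, can be sketched as a toy model. This is a hypothetical illustration (class name, counting scheme, and suffix-selection rule are assumptions), not the authors' finite-state implementation:

```python
from collections import defaultdict

class XGram:
    """Toy variable-context ('x-gram') language model: for each history,
    condition on the longest suffix of it that was observed in training,
    instead of a fixed n-1 words as in an ordinary n-gram."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # counts[context_tuple][word] = occurrence count
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sentence):
        words = ["<s>"] + sentence + ["</s>"]
        for i, w in enumerate(words):
            # record w under every context length up to max_order - 1
            for k in range(min(i, self.max_order - 1) + 1):
                self.counts[tuple(words[i - k:i])][w] += 1

    def prob(self, history, word):
        # the "x" in x-gram: pick the longest history suffix seen in training
        for k in range(min(len(history), self.max_order - 1), -1, -1):
            ctx = tuple(history[len(history) - k:])
            if ctx in self.counts:
                total = sum(self.counts[ctx].values())
                return self.counts[ctx][word] / total
        return 0.0

lm = XGram(max_order=3)
lm.train(["a", "b"])
lm.train(["a", "c"])
p = lm.prob(["a"], "b")  # context ("a",) was seen, so no backoff is needed
```

Representing these conditional distributions as arcs of a finite state automaton, as the paper does, is what allows the reported pruning of outgoing transitions per word.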
2017-07-27T08:50:55Z
http://hdl.handle.net/2117/106743
Restricted Boltzmann machines for vector representation of speech in speaker recognition
Ghahabi, Omid; Hernando Pericás, Francisco Javier
Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternative vector representation of speech that keeps the computational cost as low as possible and avoids phonetic labels, which are not always accessible. The proposed vectors are based on both Gaussian Mixture Models (GMM) and Restricted Boltzmann Machines (RBM) and are referred to as GMM-RBM vectors. The role of the RBM is to learn the total speaker and session variability among background GMM supervectors. This RBM, referred to as the Universal RBM (URBM), is then used to transform unseen supervectors into the proposed low-dimensional vectors. The use of different activation functions for training the URBM and different transformation functions for extracting the proposed vectors is investigated. Finally, a variant of Rectified Linear Units (ReLU), referred to as variable ReLU (VReLU), is proposed. Experiments on the core test condition 5 of NIST SRE 2010 show that results comparable to conventional i-vectors are achieved with a clearly lower computational load in the vector extraction process.
2017-07-24T11:10:26Z
http://hdl.handle.net/2117/106661
Codificación no paramétrica de voz basada en redes neuronales sobredimensionadas [Non-parametric speech coding based on overdimensioned neural networks]
Hernández Abrego, Gustavo Adolfo; Batlle Mont, Eloi; Antón Haro, Carles; Monte Moreno, Enrique
2017-07-20T12:10:55Z
http://hdl.handle.net/2117/106659
Implementación sobre TMS320C31 del codificador de voz "Half rate" descrito en la recomendación GSM 6.20 [TMS320C31 implementation of the "Half rate" speech coder described in GSM recommendation 6.20]
Antón Haro, Carles; González, Jaime; Rodríguez Fonollosa, José Adrián
2017-07-20T12:07:54Z
http://hdl.handle.net/2117/106642
Reconocimiento del habla con cuantificadores vectoriales óptimos diseñados mediante algoritmos genéticos [Speech recognition with optimal vector quantizers designed by genetic algorithms]
Fuster González, Jesús; Monte Moreno, Enrique
2017-07-20T10:00:16Z
http://hdl.handle.net/2117/106445
Optimización de los parámetros de un HMM mediante entrenamiento discriminativo [Optimization of HMM parameters via discriminative training]
Esteban, F J; Monte Moreno, Enrique
2017-07-14T10:58:09Z
http://hdl.handle.net/2117/106403
Reconocimiento de fonemas mediante redes neuronales predictivas [Phoneme recognition using predictive neural networks]
Freitag, Fèlix; Navarro Moldes, Leandro; Monte Moreno, Enrique
2017-07-14T08:13:46Z
http://hdl.handle.net/2117/106166
Maximum likelihood based discriminative training of acoustic models
Nogueiras Rodríguez, Albino; Mariño Acebal, José Bernardo
In this paper, a framework for discriminative training of acoustic models based on the Generalised Probabilistic Descent (GPD) method is presented. The key feature of our proposal, Maximum Likelihood based Discriminative Training of Acoustic Models (MLDT), is the use of maximum-likelihood-trained HMMs instead of the original speech signal. We focus on applying discriminative training to a discrete hidden Markov model continuous-speech recogniser, achieving a 4.6% error-rate reduction on a Spanish speaker-independent phoneme recognition task.
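The GPD method referenced here is typically built around a smoothed misclassification measure: the correct class's discriminant score is compared against a soft maximum over competitors, and the result is passed through a sigmoid so it can be minimised by gradient descent. The sketch below is the generic textbook form of that measure (function name and parameters are assumptions), not the paper's MLDT procedure:

```python
import numpy as np

def mce_loss(scores, correct, eta=1.0, xi=1.0):
    """Smoothed misclassification loss used in GPD/MCE training.
    scores  -- per-class discriminant values (e.g. HMM log-likelihoods)
    correct -- index of the true class
    eta     -- sharpness of the soft-max over competing classes
    xi      -- slope of the sigmoid smoothing the 0/1 error"""
    others = np.delete(scores, correct)
    # d > 0 means a competitor (softly) beats the correct class
    d = -scores[correct] + np.log(np.mean(np.exp(eta * others))) / eta
    return 1.0 / (1.0 + np.exp(-xi * d))
```

In a full recogniser, the gradient of this loss with respect to the acoustic model parameters would drive the probabilistic-descent update; here the loss is simply below 0.5 when the correct class scores highest and above 0.5 when it does not.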
2017-07-05T08:58:58Z
http://hdl.handle.net/2117/106109
Estadísticas de orden superior en reconocimiento de voz: estudio comparativo [Higher-order statistics in speech recognition: a comparative study]
Tortola, S; Moreno Bilbao, M. Asunción; Vidal Manzano, José
2017-07-03T13:44:04Z
http://hdl.handle.net/2117/106037
Codificador de voz multipulso a 9,6 kbps en tiempo real sobre el TMS320C30 [Real-time 9.6 kbps multipulse speech coder on the TMS320C30]
Moreno Bilbao, M. Asunción; Vallverdú Bayés, Sisco
2017-06-30T11:35:52Z