Age prediction by voice using deep learning

Cite as:
hdl:2117/386585
Document type: Master thesis
Date: 2023-01-30
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
Speech characterization is one of the main topics in artificial intelligence, yet it has received little attention for the Catalan language. In this project, we perform age classification by decade, first on the Catalan Common Voice dataset and then adding the Spanish and English datasets to obtain more data. The classifier is implemented with deep learning techniques, using common backbones such as ResNet and VGG. In addition, an attention encoder is used to encode the mel-spectrogram features. Instead of statistical pooling methods such as average pooling, attention pooling layers and various attention mechanisms are used in all backbones to pool the features and reduce the dimensionality of the feature vector produced by the front-end architecture. Finally, we compare two models: one with an AM-Softmax final layer and one that combines AM-Softmax with ordinal regression.
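The abstract names several techniques without showing how they fit together. The following is a minimal, pure-Python sketch of three of them, assuming details the abstract does not specify: attention pooling with a single learned scoring vector (only one of many possible attention mechanisms), the AM-Softmax loss with scale s = 30 and margin m = 0.35 (values often used in the AM-Softmax literature, not taken from the thesis), and ordinal regression encoded as K - 1 cumulative binary targets ("is the speaker's decade above j?"). The decade mapping, with class 0 for speakers under 20, and all function names are hypothetical; this is an illustration, not the thesis's implementation.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax_at(xs, i):
    # log(softmax(xs)[i]) computed with the log-sum-exp trick.
    m = max(xs)
    return xs[i] - m - math.log(sum(math.exp(x - m) for x in xs))


def attention_pool(frames, w):
    """Pool T frame-level vectors (each of length D) into one vector.

    Hypothetical scoring: e_t = w . h_t, alpha = softmax(e), output = sum_t alpha_t * h_t.
    With w = 0 all scores are equal, so this reduces to average pooling.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    return [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]


def l2_normalize(v):
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def am_softmax_loss(embedding, class_weights, target, s=30.0, m=0.35):
    """Additive-margin softmax loss for a single example.

    Logits are s * cos(theta_j); for the target class the margin is
    subtracted first: s * (cos(theta_y) - m). s and m are assumed values.
    """
    x = l2_normalize(embedding)
    cosines = [sum(a * b for a, b in zip(x, l2_normalize(wj))) for wj in class_weights]
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    return -log_softmax_at(logits, target)


def ordinal_targets(k, num_classes):
    # Class k -> num_classes - 1 binary targets; target j answers "is k > j?".
    return [1 if k > j else 0 for j in range(num_classes - 1)]


def ordinal_decode(threshold_probs):
    # Predicted class = number of thresholds whose P(class > j) exceeds 0.5.
    return sum(1 for p in threshold_probs if p > 0.5)


def age_to_decade_class(age, num_classes=9):
    # Hypothetical mapping: 0 = under 20, 1 = 20-29, 2 = 30-39, ...;
    # ages beyond the last decade are clipped into the final class.
    return min(max(age // 10 - 1, 0), num_classes - 1)


if __name__ == "__main__":
    print(ordinal_targets(age_to_decade_class(34), 9))  # [1, 1, 0, 0, 0, 0, 0, 0]
    print(attention_pool([[0.2, 1.0], [0.8, -0.5]], [1.0, 0.0]))
```

In a real model these operations would act on batched tensors inside a deep learning framework, and the attention vector, class weights and threshold probabilities would be learned. The cumulative encoding is one common way to add ordinal structure: labelling a speaker in their thirties as being in their sixties flips three binary targets, while confusing them with their forties flips only one, so larger age errors cost more.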
Subjects: Deep learning, Artificial intelligence, Automatic speech recognition
Degree: Master's Degree in Advanced Telecommunication Technologies (2019 curriculum)
Files | Description | Size | Format |
---|---|---|---|
memoria_tfm_david linde.pdf | | 3.064 MB | PDF |