Feature extraction and automatic classification of audio signals
Tutor / director / evaluator: Tarrés Ruiz, Francisco
Document type: Bachelor thesis
Rights access: Open Access
The purpose of this work is to develop an algorithm that extracts features from audio segments so that they can be classified as speech, music, or music and speech. To this end, we analysed the classification algorithms most frequently cited in the literature and evaluated their characteristics, performance and computational complexity during the training and recognition phases. Based on this study, we decided to implement a system built on low-level features and a statistical classifier. MATLAB was chosen as the development tool because the target applications do not require training in real time. The features used to characterize the different signal types are the Mel-Frequency Cepstral Coefficients (MFCC). For training, a Gaussian Mixture Model (GMM) is fitted to each audio class, and the number of Gaussian components is varied to find the best configuration. In addition to the MFCC, we also compute the MFCC deltas and delta-deltas, which describe the dynamic properties of the audio: the MFCC alone only describe a single analysis window, within which the signal is assumed to be stationary. As discussed below, the results are satisfactory enough for the algorithm to be used in a real application. Charts and graphs of the average error rate are reported for the different input configurations tested, and further results are given in the annex.
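The abstract does not list the MFCC parameters used in the thesis. As an illustration only, the following plain-Python sketch follows the usual MFCC pipeline: framing, Hamming window, power spectrum, triangular mel filterbank, logarithm and DCT. The HTK-style mel formula, the default frame length and hop (25 ms and 10 ms at 16 kHz), the FFT size, the 26 filters and the 13 coefficients are all assumptions, and a naive DFT stands in for an FFT to keep the example dependency-free. This is not the author's MATLAB implementation.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (one of several common definitions; an assumption here).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in edges]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, center):   # rising slope
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, peak of 1 at center
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters

def power_spectrum(frame, n_fft):
    """Power of the first n_fft // 2 + 1 DFT bins (assumes len(frame) <= n_fft)."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * math.pi * k / n_fft
        re = sum(x * math.cos(w * n) for n, x in enumerate(padded))
        im = -sum(x * math.sin(w * n) for n, x in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return one list of n_coeffs MFCC per frame (all defaults are assumptions)."""
    filters = mel_filterbank(n_filters, n_fft, sample_rate)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(f, spec)) for f in filters]
        log_energies = [math.log(e + 1e-10) for e in energies]  # avoid log(0)
        features.append(dct_ii(log_energies, n_coeffs))
    return features
```

For example, `mfcc(signal, 8000, frame_len=200, hop=80, n_fft=256, n_filters=20)` on a 1000-sample signal yields 11 frames of 13 coefficients each. In practice an FFT routine would replace `power_spectrum`, since the naive DFT is quadratic in `n_fft`.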
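The MFCC deltas and delta-deltas mentioned in the abstract are commonly computed with a regression over neighbouring frames, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), applied once to the MFCC to get the deltas and again to the deltas to get the delta-deltas. The sketch below assumes this formula, a window of N = 2 and repetition of the first and last frame at the edges; the thesis may use different choices, and the function names are hypothetical.

```python
def deltas(frames, N=2):
    """Regression deltas of a T x D feature matrix given as a list of frames.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frame repeated when t - n or t + n falls outside.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][d]  # clamp at the last frame
                prv = frames[max(t - n, 0)][d]      # clamp at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

def stack_features(mfccs, N=2):
    """Concatenate static MFCC, deltas and delta-deltas frame by frame."""
    d1 = deltas(mfccs, N)  # first-order dynamics
    d2 = deltas(d1, N)     # second-order dynamics
    return [c + a + b for c, a, b in zip(mfccs, d1, d2)]
```

With 13 MFCC per frame, `stack_features` produces 39-dimensional vectors. On a coefficient that grows linearly over time, the interior deltas equal its slope and the interior delta-deltas are zero, which is a quick sanity check of the regression.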
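The per-class GMM scheme can be sketched as follows: one GMM is trained on the feature frames of each class, and a segment is assigned to the class whose model gives the highest average log-likelihood over its frames. Diagonal covariances, EM with a deterministic initialisation (means taken from evenly spaced frames, a shared global variance), the variance floor and this decision rule are all assumptions made for the example; `train_gmm` and `classify_segment` are hypothetical names, and this plain Python is not the author's MATLAB code. The parameter `k` plays the role of the number of Gaussians that the abstract describes varying.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """log p(x) for a GMM given as a list of (weight, mean, var), via log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def train_gmm(frames, k, iters=20, var_floor=1e-3):
    """Fit a k-component diagonal GMM with EM (deterministic init, an assumption)."""
    n, dim = len(frames), len(frames[0])
    mu = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)]
    gmm = [(1.0 / k, list(frames[i * n // k]), list(gvar)) for i in range(k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            s = sum(ps)
            resp.append([p / s for p in ps])
        # M-step: re-estimate weights, means and floored variances.
        new = []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-10:  # empty component: keep it with a negligible weight
                new.append((1e-10, gmm[j][1], gmm[j][2]))
                continue
            mean = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nj, var_floor)
                   for d in range(dim)]
            new.append((nj / n, mean, var))
        gmm = new
    return gmm

def classify_segment(frames, models):
    """Pick the class whose GMM has the highest mean frame log-likelihood."""
    scores = {label: sum(gmm_log_likelihood(x, g) for x in frames) / len(frames)
              for label, g in models.items()}
    return max(scores, key=scores.get), scores
```

A usage sketch: `models = {"speech": train_gmm(speech_frames, 8), "music": train_gmm(music_frames, 8), "music+speech": train_gmm(mixed_frames, 8)}`, then `classify_segment(segment_frames, models)`; repeating this for several values of `k` and comparing error rates mirrors the configuration search described above. Diagonal covariances keep training cheap and are a common choice when the features are MFCC.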