Deep learning at Dublin City University
Covenantee: Dublin City University
Document type: Master thesis
Rights access: Restricted access - confidentiality agreement
Speech recognition involves generating sequences of words that match what is said in recordings of speech. In recent years, machine learning techniques have been increasingly used in speech recognition, mainly due to the widespread availability of training data and the falling cost of large-scale computing resources. These two factors have made it feasible to use a powerful machine learning technique, deep learning, to create end-to-end speech recognition systems. Unlike the classical methods used in this field, this approach does not require extensive knowledge of phonetics. When listening to any kind of speech, humans use prior knowledge about its topic (politics, medicine, sports, etc.) to understand it better. In contrast, speech recognition systems do not usually use this prior knowledge. This thesis explores the use of contextual information to improve an automatic speech recognition system. The output of this thesis will be used by the company Vilynx to transcribe speech from videos that contain, among other content, general, sports, and entertainment news. Contextual information is extracted from the video title and video description.