Robust Part of Speech Tagging
Tutor / director / evaluator: Màrquez Villodre, Lluís
Document type: Master thesis
Rights access: Open Access
Generally, NLP tools learn patterns from well-formed, annotated data using machine learning techniques. In this work, however, we focus on the language used in an online machine translation platform. A typical setting in this area is a web page that offers a translation service between pairs of languages. The problem is that casual users submit any type of text to the service (cut-and-paste fragments, single words, badly formatted text, snippets, informal language, pre-translations, etc.). In this situation the input very often contains words with mistakes, and the system produces a bad translation because it is unable to understand the input. Having identified the problem of dealing with non-standard input, the main goal of our work is to develop a robust PoS tagger starting from the SVMTagger.