Dealing with input noise in statistical machine translation
Document type: Conference lecture
Rights access: Open Access
Misspelled words directly degrade the output quality of Statistical Machine Translation (SMT) systems, because they make the input noisy and unpredictable. This paper presents strategies for improving the translation of real-life noisy input. The strategies rely on a preprocessing step: a character-based machine translation (MT) system that translates noisy text into clean text. The character-level translator lets us pass several spelling alternatives, encoded as a lattice, to the final bilingual translator, so the final MT system decides which path to translate. The alternative hypotheses are generated under a noisy channel model of the task. We report experiments with real-life noisy input and a standard phrase-based SMT system translating from English into Spanish.
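The noisy channel idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the paper uses a character-based translator, whereas this example swaps in a simple single-edit candidate generator and a toy unigram model. The corpus, the probabilities `EDIT_PROB` and `KEEP_PROB`, the smoothing constant `ALPHA`, and the function names are all assumptions made for illustration. Each clean candidate c for a noisy token n is scored by log P(c) + log P(n | c), and the per-token alternatives are collected into a confusion-network-style list, a simplified stand-in for the lattice handed to the final translator.

```python
import math
from collections import Counter

# Toy clean-text corpus standing in for a language model (hypothetical data).
CORPUS = "the cat sat on the mat the cat ate the fish".split()
UNIGRAMS = Counter(CORPUS)
TOTAL = sum(UNIGRAMS.values())
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
ALPHA = 0.01       # assumed smoothing constant for unseen words
EDIT_PROB = 0.05   # assumed channel probability P(n | c) for one edit
KEEP_PROB = 0.9    # assumed channel probability that a token is left unchanged


def log_prior(word):
    """Smoothed unigram log P(c); unseen words get a small, non-zero mass."""
    return math.log((UNIGRAMS[word] + ALPHA) / (TOTAL + ALPHA * (len(UNIGRAMS) + 1)))


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def alternatives(noisy, top_n=3):
    """Return up to top_n (candidate, log score) pairs, best first."""
    # Always keep the unchanged token so the downstream decoder may still choose it.
    scored = {noisy: log_prior(noisy) + math.log(KEEP_PROB)}
    for cand in edits1(noisy):
        if cand in UNIGRAMS and cand != noisy:
            scored[cand] = log_prior(cand) + math.log(EDIT_PROB)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def confusion_network(tokens, top_n=3):
    """One slot per input token, each holding its scored spelling alternatives."""
    return [alternatives(tok, top_n) for tok in tokens]


if __name__ == "__main__":
    for slot in confusion_network("teh cat sat".split()):
        print(slot)
```

In this sketch the final choice is deliberately not made at preprocessing time: every slot keeps several weighted alternatives, which mirrors the abstract's point that the bilingual translator selects the best path.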
Citation: Formiga, L.; Fonollosa, José A. R. Dealing with input noise in statistical machine translation. In: International Conference on Computational Linguistics. "Proceedings of COLING 2012: Technical Papers: 8-15 December 2012, Mumbai, India". Mumbai: 2012, p. 319-328.