Dealing with input noise in statistical machine translation
Cite as: hdl:2117/18279
Document type: Conference paper
Publication date: 2012
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
Misspelled words directly degrade the output quality of Statistical Machine
Translation (SMT) systems because they make the input noisy and unpredictable. This paper presents
strategies for improving the translation of real-life noisy input. The proposed strategies
rely on a preprocessing step: a character-based machine translation (MT) system that translates noisy
text into clean text. Working at the character level lets us pass several spelling
alternatives, in lattice format, to the final bilingual translator, so the final MT system
decides which path to translate. The hypotheses are generated under
a noisy channel model. The paper reports experiments
with real-life noisy input and a standard phrase-based SMT system translating from English into Spanish.
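
As a rough illustration of the noisy channel idea mentioned in the abstract, the sketch below ranks clean candidates c for a noisy token n by log P(c) + log P(n | c) and keeps several alternatives per token, in the spirit of handing multiple spellings to a downstream translator. It is not the authors' system: the paper uses a character-based translator, whereas this sketch swaps in a plain edit-distance penalty as the channel model. The vocabulary counts, the `error_penalty` value, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy "language model": counts of clean words (invented for illustration).
CLEAN_COUNTS = {"the": 50, "then": 10, "than": 8, "there": 12}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rank_alternatives(noisy, counts=CLEAN_COUNTS, error_penalty=2.0, n_best=3):
    """Return the n_best clean candidates for a noisy token.

    Score = log P(candidate) + log P(noisy | candidate), where the channel
    term is approximated (an assumption) by -error_penalty * edit distance.
    """
    total = sum(counts.values())
    scored = []
    for cand, count in counts.items():
        log_prior = math.log(count / total)
        log_channel = -error_penalty * edit_distance(noisy, cand)
        scored.append((log_prior + log_channel, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:n_best]]


def spelling_alternatives(sentence):
    """One list of alternatives per token: a simplified stand-in for a lattice."""
    return [rank_alternatives(tok) for tok in sentence.lower().split()]
```

Calling `rank_alternatives("teh")` puts `"the"` first because it combines a high prior with a small edit distance. A real pipeline would serialize the per-token alternatives in whatever lattice or confusion-network format the chosen decoder accepts, and let the decoder pick the path.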
Citation: Formiga, L.; Fonollosa, José A. R. Dealing with input noise in statistical machine translation. In: International Conference on Computational Linguistics. "Proceedings of COLING 2012: Technical Papers : 8-15 December 2012, Mumbai, India". Mumbai: 2012, p. 319-328.
Publisher version: http://aclweb.org/anthology-new/C/C12/C12-2032.pdf
Files | Description | Size | Format | View |
---|---|---|---|---|
POSTERS032.pdf | Article | 120.2 Kb | | View/Open |