Leveraging online user feedback to improve statistical machine translation

Formiga, Lluís; Barrón-Cedeño, Alberto; Marquez, Lluis; Henriquez, Carlos A; Mariño Acebal, José Bernardo

doi:10.1613/jair.4716

Document type: Article

Publication date: 2015-09-01

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.

Abstract

In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again yields a significant improvement in the translation quality of a general-purpose baseline SMT system.
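
The third step, combining a new translation model linearly with the original one, can be illustrated with a minimal sketch. The abstract does not give the table format, the interpolation weight, or how the weight is tuned, so everything below is an assumption made for illustration: the toy phrase tables, the function name `interpolate_phrase_tables`, and the parameter `weight_new` are hypothetical and are not taken from the article.

```python
def interpolate_phrase_tables(base, new, weight_new=0.5):
    """Linearly combine two toy phrase tables (illustrative only).

    Each table maps (source_phrase, target_phrase) -> probability.
    A pair missing from one table contributes probability 0 from it.
    weight_new is a hypothetical interpolation weight in [0, 1].
    """
    if not 0.0 <= weight_new <= 1.0:
        raise ValueError("weight_new must be in [0, 1]")
    combined = {}
    for pair in set(base) | set(new):
        combined[pair] = (
            (1.0 - weight_new) * base.get(pair, 0.0)
            + weight_new * new.get(pair, 0.0)
        )
    return combined


# Toy data (invented): a baseline table and one built from user edits.
base_table = {("casa", "house"): 0.7, ("casa", "home"): 0.3}
feedback_table = {("casa", "home"): 0.6, ("casa", "household"): 0.4}

combined = interpolate_phrase_tables(base_table, feedback_table, weight_new=0.25)
for pair, prob in sorted(combined.items()):
    print(pair, round(prob, 3))
```

With these toy numbers, a target phrase that appears only in the feedback table ("household") receives nonzero probability, and a phrase supported by both tables ("home") gains mass relative to the baseline. A real SMT system would interpolate several feature scores per phrase pair rather than a single probability.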

Citation: Formiga, L., Barrón-Cedeño, A., Marquez, L., Henriquez, C., Mariño, J.B. Leveraging online user feedback to improve statistical machine translation. "Journal of artificial intelligence research", 1 September 2015, vol. 54, p. 159-192.

URI: http://hdl.handle.net/2117/86200

DOI: 10.1613/jair.4716

ISSN: 1076-9757

Publisher's version: http://jair.org/media/4716/live-4716-8890-jair.pdf

Files	Description	Size	Format	View
JAIR.vol54.2015.jma-preprint.pdf		503.5 KB	PDF	View/Open

UPCommons. UPC open knowledge portal
