Discriminative learning within Arabic statistical machine translation

España Bonet, Cristina; Giménez, Jesús; Màrquez Villodre, Lluís

Document type: Research report

Publication date: 2009-01

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorisation of the rights holder.

Abstract

Written Arabic is especially ambiguous due to the lack of diacritisation in texts, and this makes translation harder for automatic systems that do not take the context of phrases into account. Here, we use a standard Phrase-Based Statistical Machine Translation architecture to build an Arabic-to-English translation system, and we extend it with a local discriminative phrase selection model that addresses this semantic ambiguity. Local classifiers are trained to translate a phrase using both linguistic information and context, and this significantly increases phrase selection accuracy with respect to the most frequent translation, the choice traditionally considered. These classifiers are integrated into the translation system so that the global task benefits from the discriminative learning. As a result, we obtain improvements in the full translation of Arabic documents at the lexical, syntactic and semantic levels, as measured by a heterogeneous set of automatic metrics.
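The contrast the abstract draws between the most frequent translation and a context-aware local classifier can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which learning algorithm or features the report uses, so a simple multiclass perceptron over bag-of-context-word features stands in for it here. The training pairs are invented, the context tokens are English glosses standing in for source-side context, and the function names (`most_frequent_translation`, `predict`, `train_perceptron`) are hypothetical. The source phrase is imagined as the undiacritised Arabic "ktb", which can be read as "kataba" (he wrote) or "kutub" (books).

```python
from collections import Counter

# Invented toy data: (context tokens around the ambiguous phrase, chosen translation).
TRAIN = [
    (["the", "student", "a", "letter"], "wrote"),
    (["he", "a", "report"], "wrote"),
    (["she", "the", "letter"], "wrote"),
    (["the", "many", "on", "shelf"], "books"),
    (["library", "many"], "books"),
]


def most_frequent_translation(examples):
    """Context-blind baseline: always pick the most common translation."""
    counts = Counter(label for _, label in examples)
    return counts.most_common(1)[0][0]


def predict(weights, context, default):
    """Pick the highest-scoring translation; ties fall back to `default`."""
    def score(label):
        return sum(weights[label].get(word, 0.0) for word in context)
    return max(weights, key=lambda label: (score(label), label == default))


def train_perceptron(examples, default, epochs=10):
    """Multiclass perceptron (a stand-in learner, not the report's method)."""
    weights = {label: {} for _, label in examples}
    for _ in range(epochs):
        for context, gold in examples:
            guess = predict(weights, context, default)
            if guess != gold:
                # Reward the gold translation and penalise the wrong guess.
                for word in context:
                    weights[gold][word] = weights[gold].get(word, 0.0) + 1.0
                    weights[guess][word] = weights[guess].get(word, 0.0) - 1.0
    return weights


if __name__ == "__main__":
    baseline = most_frequent_translation(TRAIN)
    weights = train_perceptron(TRAIN, default=baseline)
    for context, gold in TRAIN:
        print(context, "gold:", gold,
              "baseline:", baseline,
              "classifier:", predict(weights, context, baseline))
```

On this toy set the baseline always answers with the single most common translation, while the classifier uses the surrounding words to separate the two readings and backs off to the baseline when the context carries no evidence. How such classifier outputs are integrated into the translation system is not described in the abstract and is not modelled here.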

Citation: España-Bonet, C., Giménez, J., Màrquez, L. "Discriminative learning within Arabic statistical machine translation". 2009.

Part of: LSI-09-3-R

URI: http://hdl.handle.net/2117/86942

Files	Description	Size	Format
R09-3.ps		509.2 KB	PostScript
