Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair

Mariño Acebal, José Bernardo; Farrús Cabeceran, Mireia; Ruiz Costa-Jussà, Marta; Poch, Marc; Hernández Huerta, Adolfo; Henríquez, Carlos; Rodríguez Fonollosa, José Adrián

doi:10.1007/s10579-011-9137-0

File: Mireia2011_LRE.pdf (PDF, 387.1 KB)

Document type: Article

Publication date: 2011-02-20

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Abstract

This work aims to improve an N-gram-based statistical machine translation system between Catalan and Spanish, trained on an aligned Spanish–Catalan parallel corpus of 1.7 million sentences taken from the newspaper El Periódico. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are addressed using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on grammatical categories, as well as lexical categorization. Both human and automatic evaluations show a clear improvement over the baseline. The gain is about 1.1 BLEU points in the Spanish-to-Catalan direction and about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource.
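
The BLEU gains reported in the abstract are differences in a corpus-level score. The minimal Python sketch below illustrates what such a score measures, using the standard BLEU definition: clipped n-gram precisions up to 4-grams, their geometric mean, and a brevity penalty. It makes several assumptions that the record does not confirm: whitespace tokenization, a single reference per sentence, no smoothing, and a 0-100 scale. It is not the evaluation tool used by the authors. The function names (`ngrams`, `corpus_bleu`) and the Catalan example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (sketch, not the paper's tool).

    Assumptions: whitespace tokenization, exactly one reference per
    hypothesis, uniform weights over 1..max_n grams, no smoothing.
    Clipped n-gram matches are summed over the whole corpus before
    computing precisions, as in standard corpus-level BLEU.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical outputs of two systems for the same reference sentence.
    refs = ["el govern aprova el pressupost"]
    baseline = ["el govern aprova el pla"]
    improved = ["el govern aprova el pressupost"]
    gain = corpus_bleu(improved, refs) - corpus_bleu(baseline, refs)
    print(f"BLEU gain: {gain:.1f} points")
```

Under these assumptions, a gain of "about 1.1 points" means that the improved system's corpus score is about 1.1 higher than the baseline's on the 0-100 scale. Scores computed with different tokenization or reference sets are not directly comparable.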

Citation: Mariño, J. [et al.]. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair. "Language resources and evaluation", 20 February 2011, vol. 45, no. 2, p. 181-208.

URI: http://hdl.handle.net/2117/12676

DOI: 10.1007/s10579-011-9137-0

ISSN: 1574-020X

Publisher's version: http://dx.doi.org/10.1007/s10579-011-9137-0

UPCommons. Open knowledge portal of the UPC
