
dc.contributor.author: Mariño Acebal, José Bernardo
dc.contributor.author: Farrús Cabeceran, Mireia
dc.contributor.author: Ruiz Costa-Jussà, Marta
dc.contributor.author: Poch, Marc
dc.contributor.author: Hernández Huerta, Adolfo
dc.contributor.author: Herníquez, Carlos
dc.contributor.author: Rodríguez Fonollosa, José Adrián
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2011-05-30T16:07:41Z
dc.date.available: 2011-05-30T16:07:41Z
dc.date.created: 2011-02-20
dc.date.issued: 2011-02-20
dc.identifier.citation: Mariño, J. [et al.]. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair. "Language resources and evaluation", 20 February 2011, vol. 45, no. 2, p. 181-208.
dc.identifier.issn: 1574-020X
dc.identifier.uri: http://hdl.handle.net/2117/12676
dc.description.abstract: This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish–Catalan parallel corpus consisting of 1.7 million sentences taken from the El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are approached using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on the use of grammatical categories, as well as lexical categorization. The performance of the improved system is clearly increased, as shown in both human and automatic evaluations of the system, with a gain of about 1.1 BLEU points observed in the Spanish-to-Catalan direction of translation, and a gain of about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource.
dc.format.extent: 28 p.
dc.language.iso: eng
dc.subject: UPC subject areas::Telecommunications engineering
dc.subject.lcsh: Statistical machine translation
dc.subject.lcsh: N-gram-based translation
dc.subject.lcsh: Linguistic knowledge
dc.subject.lcsh: Grammatical categories
dc.subject.lcsh: Signal theory (Telecommunication)
dc.subject.lcsh: Language and Speech Technologies
dc.title: Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair
dc.type: Article
dc.subject.lemac: Speech
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi: 10.1007/s10579-011-9137-0
dc.relation.publisherversion: http://dx.doi.org/10.1007/s10579-011-9137-0
dc.rights.access: Open Access
local.identifier.drac: 5767412
dc.description.version: Postprint (published version)
local.citation.author: Mariño, J.; Farrus, M.; Costa-Jussà, M. R.; Poch, M.; Hernandez, A.; Herníquez, C.; Fonollosa, José A. R.
local.citation.publicationName: Language resources and evaluation
local.citation.volume: 45
local.citation.number: 2
local.citation.startingPage: 181
local.citation.endingPage: 208


