Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair
dc.contributor.author | Mariño Acebal, José Bernardo |
dc.contributor.author | Farrús Cabeceran, Mireia |
dc.contributor.author | Ruiz Costa-Jussà, Marta |
dc.contributor.author | Poch, Marc |
dc.contributor.author | Hernández Huerta, Adolfo |
dc.contributor.author | Henríquez, Carlos |
dc.contributor.author | Rodríguez Fonollosa, José Adrián |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2011-05-30T16:07:41Z |
dc.date.available | 2011-05-30T16:07:41Z |
dc.date.created | 2011-02-20 |
dc.date.issued | 2011-02-20 |
dc.identifier.citation | Mariño, J. [et al.]. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair. "Language resources and evaluation", 20 February 2011, vol. 45, no. 2, p. 181-208. |
dc.identifier.issn | 1574-020X |
dc.identifier.uri | http://hdl.handle.net/2117/12676 |
dc.description.abstract | This work aims to improve an N-gram-based statistical machine translation system between Catalan and Spanish, trained on an aligned Spanish–Catalan parallel corpus of 1.7 million sentences taken from the El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are addressed with a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on grammatical categories, as well as lexical categorization. The improved system performs clearly better, as shown by both human and automatic evaluations: BLEU increases by about 1.1 points in the Spanish-to-Catalan direction and by about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource. |
dc.format.extent | 28 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Telecommunication engineering |
dc.subject.lcsh | Statistical machine translation |
dc.subject.lcsh | N-gram-based translation |
dc.subject.lcsh | Linguistic knowledge |
dc.subject.lcsh | Grammatical categories |
dc.subject.lcsh | Signal theory (Telecommunication) |
dc.subject.lcsh | Language and Speech Technologies |
dc.title | Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair |
dc.type | Article |
dc.subject.lemac | Speech |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.1007/s10579-011-9137-0 |
dc.relation.publisherversion | http://dx.doi.org/10.1007/s10579-011-9137-0 |
dc.rights.access | Open Access |
local.identifier.drac | 5767412 |
dc.description.version | Postprint (published version) |
local.citation.author | Mariño, J.; Farrús, M.; Costa-Jussà, M. R.; Poch, M.; Hernández, A.; Henríquez, C.; Fonollosa, J. A. R. |
local.citation.publicationName | Language resources and evaluation |
local.citation.volume | 45 |
local.citation.number | 2 |
local.citation.startingPage | 181 |
local.citation.endingPage | 208 |
This item appears in the following collections:
- Journal articles [172]
- Journal articles [2,526]