Segmentation strategies to face morphology challenges in Brazilian-Portuguese/English statistical machine translation and its integration in cross-language information retrieval
dc.contributor.author | Ruiz Costa-Jussà, Marta |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2016-01-08T14:51:21Z |
dc.date.available | 2016-01-08T14:51:21Z |
dc.date.issued | 2015-06-01 |
dc.identifier.citation | Costa-jussà, M. R. Segmentation strategies to face morphology challenges in Brazilian-Portuguese/English statistical machine translation and its integration in cross-language information retrieval. "Computacion y sistemas", 1 June 2015, vol. 19, no. 2, p. 357-370. |
dc.identifier.issn | 2007-9737 |
dc.identifier.uri | http://hdl.handle.net/2117/81165 |
dc.description.abstract | Morphology is particularly useful in statistical machine translation because it can reduce data sparseness and compensate for a limited training corpus. In this work, we propose several approaches to introduce morphological knowledge into a standard phrase-based machine translation system. We provide word segmentation using two different tools (COGROO and MORFESSOR), which reduce the vocabulary size and data sparseness. We then add to these segmentations the morphological information of a POS language model. We combine all these approaches using a Minimum Bayes Risk strategy. Experiments show significant improvements from the enhanced system over the baseline system on the Brazilian Portuguese/English language pair. Finally, we report a case study on the impact of enhancing the statistical machine translation system with morphology in a cross-language application such as ONAIR, which allows users to search for information in video fragments through natural-language queries. |
dc.format.extent | 14 p. |
dc.language.iso | eng |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ |
dc.subject | UPC subject areas::Computer science::Artificial intelligence::Natural language |
dc.subject.lcsh | Grammar, Comparative and general--Morphology |
dc.subject.lcsh | Machine translating |
dc.subject.lcsh | Portuguese language |
dc.subject.lcsh | English language |
dc.subject.other | Morphology |
dc.subject.other | Factored-based machine translation |
dc.subject.other | Cross-language information retrieval |
dc.title | Segmentation strategies to face morphology challenges in Brazilian-Portuguese/English statistical machine translation and its integration in cross-language information retrieval |
dc.type | Article |
dc.subject.lemac | Grammar, comparative and general -- Morphology |
dc.subject.lemac | Machine translation |
dc.subject.lemac | Portuguese |
dc.subject.lemac | English |
dc.identifier.doi | 10.13053/CyS-19-2-1550 |
dc.description.peerreviewed | Peer Reviewed |
dc.rights.access | Open Access |
local.identifier.drac | 17370655 |
dc.description.version | Postprint (published version) |
local.citation.author | Costa-jussà, M. R. |
local.citation.publicationName | Computacion y sistemas |
local.citation.volume | 19 |
local.citation.number | 2 |
local.citation.startingPage | 357 |
local.citation.endingPage | 370 |
This item appears in the following collections
Journal articles [2,526]