dc.contributor.author: Ruiz Costa-Jussà, Marta
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2016-01-08T14:51:21Z
dc.date.available: 2016-01-08T14:51:21Z
dc.date.issued: 2015-06-01
dc.identifier.citation: Costa-jussà, M. R. Segmentation strategies to face morphology challenges in Brazilian-Portuguese/English statistical machine translation and its integration in cross-language information retrieval. "Computacion y sistemas", 1 June 2015, vol. 19, no. 2, p. 357-370.
dc.identifier.issn: 2007-9737
dc.identifier.uri: http://hdl.handle.net/2117/81165
dc.description.abstract: The use of morphology is particularly interesting in the context of statistical machine translation in order to reduce data sparseness and compensate for any lack of training corpus. In this work, we propose several approaches to introduce morphological knowledge into a standard phrase-based machine translation system. We provide word segmentation using two different tools (COGROO and MORFESSOR), which allow us to reduce the vocabulary and data sparseness. Then, we add to these segmentations the morphological information of a POS language model. We combine all these approaches using a Minimum Bayes Risk strategy. Experiments show significant improvements from the enhanced system over the baseline system on the Brazilian Portuguese/English language pair. Finally, we report a case study on the impact of enhancing the statistical machine translation system with morphology in a cross-language application system such as ONAIR, which allows users to look for information in video fragments through queries in natural language.
dc.format.extent: 14 p.
dc.language.iso: eng
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural
dc.subject.lcsh: Grammar, Comparative and general--Morphology
dc.subject.lcsh: Machine translating
dc.subject.lcsh: Portuguese language
dc.subject.lcsh: English language
dc.subject.other: Morphology
dc.subject.other: Factored-based machine translation
dc.subject.other: Cross-language information retrieval
dc.title: Segmentation strategies to face morphology challenges in Brazilian-Portuguese/English statistical machine translation and its integration in cross-language information retrieval
dc.type: Article
dc.subject.lemac: Gramàtica comparada i general -- Morfologia
dc.subject.lemac: Traducció automàtica
dc.subject.lemac: Portuguès
dc.subject.lemac: Anglès
dc.identifier.doi: 10.13053/CyS-19-2-1550
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Open Access
local.identifier.drac: 17370655
dc.description.version: Postprint (published version)
local.citation.author: Costa-jussà, M. R.
local.citation.publicationName: Computacion y sistemas
local.citation.volume: 19
local.citation.number: 2
local.citation.startingPage: 357
local.citation.endingPage: 370