dc.contributor.author: Barrón-Cedeño, Alberto
dc.contributor.author: España Bonet, Cristina
dc.contributor.author: Boldoba Trapote, Josu
dc.contributor.author: Márquez Villodre, Luís
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.date.accessioned: 2015-09-04T08:34:56Z
dc.date.available: 2015-09-04T08:34:56Z
dc.date.issued: 2015
dc.identifier.citation: Barrón-Cedeño, A., España-Bonet, C., Boldoba, J., Márquez, L. A factory of comparable corpora from Wikipedia. In: Workshop on Building and Using Comparable Corpora. "Proceedings of the Eighth Workshop on Building and Using Comparable Corpora". Beijing: Association for Computational Linguistics, 2015, p. 3-13.
dc.identifier.isbn: 978-1-941643-60-0
dc.identifier.uri: http://hdl.handle.net/2117/76611
dc.description.abstract: Multiple approaches to grab comparable data from the Web have been developed up to date. Nevertheless, coming out with a high-quality comparable corpus of a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia. In order to prove the value of the model, we automatically extract parallel sentences from the comparable collections and use them to train statistical machine translation engines for specific domains. Our experiments on the English–Spanish pair in the domains of Computer Science, Science, and Sports show that our in-domain translator performs significantly better than a generic one when translating in-domain Wikipedia articles. Moreover, we show that these corpora can help when translating out-of-domain texts.
dc.format.extent: 11 p.
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics
dc.subject: UPC subject areas::Computer science
dc.subject.lcsh: Computational linguistics
dc.subject.lcsh: Wikipedia
dc.subject.other: comparable corpora
dc.subject.other: Wikipedia
dc.subject.other: multilingual
dc.subject.other: parallel corpora
dc.subject.other: translation
dc.title: A factory of comparable corpora from Wikipedia
dc.type: Conference report
dc.subject.lemac: Computational linguistics -- Methodology
dc.contributor.group: Universitat Politècnica de Catalunya. GPLN - Grup de Processament del Llenguatge Natural
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://aclweb.org/anthology/W/W15/W15-3402.pdf
dc.rights.access: Open Access
local.identifier.drac: 16835606
dc.description.version: Postprint (published version)
local.citation.author: Barrón-Cedeño, A.; España-Bonet, C.; Boldoba, J.; Márquez, L.
local.citation.contributor: Workshop on Building and Using Comparable Corpora
local.citation.pubplace: Beijing
local.citation.publicationName: Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
local.citation.startingPage: 3
local.citation.endingPage: 13

