Using collocation segmentation to augment the phrase table

Henriquez, Carlos A; Ruiz Costa-Jussà, Marta; Daudaravicius, Vidas; Banchs Martínez, Rafael Enrique; Mariño, José B.

Document type: Text in conference proceedings

Publication date: 2010

Publisher: Association for Computational Linguistics

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Except where otherwise noted, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Abstract

This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation over the source and target corpus before the alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of this phrase set with the phrase set extracted from the nonsegmented corpus to compute the phrase table. We present the configurations considered and also report results obtained with internal and official test sets.
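The union step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy phrase pairs, and the choice to keep the larger count when a pair appears in both extractions are all assumptions made for illustration, and the relative-frequency score stands in for whatever translation model scores the system actually computes.

```python
from collections import Counter, defaultdict


def union_phrase_counts(baseline, segmented):
    """Multiset union of two phrase-pair counters.

    Assumption: a pair found in both extractions keeps the larger count.
    The abstract does not say how counts are combined.
    """
    return baseline | segmented


def relative_frequency_table(pair_counts):
    """Estimate p(target | source) = count(source, target) / count(source)."""
    source_totals = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_totals[src] += n

    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_totals[src]
    return dict(table)


# Toy phrase pairs, invented for illustration only.
# "baseline" stands for pairs extracted from the nonsegmented corpus,
# "segmented" for pairs extracted after collocation segmentation.
baseline = Counter({
    ("la casa", "the house"): 2,
    ("casa", "house"): 3,
})
segmented = Counter({
    ("la casa", "the house"): 1,
    ("la casa blanca", "the white house"): 1,
    ("casa", "home"): 1,
})

merged = union_phrase_counts(baseline, segmented)
table = relative_frequency_table(merged)
```

In this toy case the merged table contains the longer pair for "la casa blanca", which only the segmented extraction produced, alongside the pairs from the baseline extraction.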

Citation: Henriquez, C., Ruiz, M., Daudaravicius, V., Banchs, R., Mariño, J. Using collocation segmentation to augment the phrase table. In: Workshop on Statistical Machine Translation and MetricsMATR. "ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT '10) : Uppsala, Sweden, July 15 - 16, 2010". Association for Computational Linguistics, 2010, p. 98-102.

URI: http://hdl.handle.net/2117/102341

ISBN: 978-1-932432-71-8

Files	Description	Size	Format
W10-1712.pdf		398.2 KB	PDF
