Using collocation segmentation to augment the phrase table
Cite as:
hdl:2117/102341
Document type: Text in conference proceedings
Publication date: 2010
Publisher: Association for Computational Linguistics
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation over the source and target corpus before the alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of this phrase set with the phrase set extracted from the nonsegmented corpus to compute the phrase table. We present the configurations considered and also report results obtained with internal and official test sets.
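As a rough illustration of the idea in the abstract, the sketch below pools phrase pairs extracted from a nonsegmented corpus with those extracted from a collocation-segmented corpus, then scores the pooled set. It is not the authors' implementation. The function names, the toy phrase pairs, the choice to sum counts when a pair appears in both sets, and the use of relative frequency as the translation model score are all assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter, defaultdict


def union_phrase_counts(baseline, segmented):
    """Pool phrase-pair counts from both extractions.

    Assumption: a pair found in both sets has its counts summed.
    """
    pooled = Counter(baseline)
    pooled.update(segmented)  # Counter.update adds counts, it does not overwrite
    return pooled


def relative_frequency_table(pair_counts):
    """Score each pair with p(target | source) = count(s, t) / count(s).

    Assumption: relative frequency stands in for the translation model scores.
    """
    source_totals = defaultdict(int)
    for (src, _tgt), n in pair_counts.items():
        source_totals[src] += n
    return {
        (src, tgt): n / source_totals[src]
        for (src, tgt), n in pair_counts.items()
    }


# Toy, made-up phrase-pair counts (hypothetical data, not from the paper).
baseline = Counter({("casa", "house"): 3, ("casa", "home"): 1})
# The segmented corpus contributes a larger phrase the baseline did not extract.
segmented = Counter({("casa", "house"): 1, ("casa blanca", "white house"): 2})

pooled = union_phrase_counts(baseline, segmented)
table = relative_frequency_table(pooled)
```

In this toy case the pooled table keeps every baseline pair and gains the longer pair `("casa blanca", "white house")`, which is the kind of augmentation the abstract describes.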
Citation: Henriquez, C., Ruiz, M., Daudaravicius, V., Banchs, R., Mariño, J. Using collocation segmentation to augment the phrase table. In: Workshop on Statistical Machine Translation and MetricsMATR. "ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT '10): Uppsala, Sweden, July 15-16, 2010". Association for Computational Linguistics, 2010, p. 98-102.
ISBN: 978-1-932432-71-8
Files | Size |
---|---|
W10-1712.pdf | 398.2 KB |