Using collocation segmentation to augment the phrase table
Document type: Conference report
Publisher: Association for Computational Linguistics
Rights access: Open Access
This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is built by extracting phrases from an aligned parallel corpus and then computing translation model scores from them. Performing a collocation segmentation over the source and target corpora before alignment causes different and larger phrases to be extracted from the same original documents. We performed this segmentation and used the union of the resulting phrase set with the phrase set extracted from the non-segmented corpus to compute the phrase table. We present the configurations considered and report results obtained with internal and official test sets.
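As an illustration of the idea in the abstract, the following minimal Python sketch pools phrase pairs extracted from a non-segmented and a collocation-segmented corpus and scores them by relative frequency. It is not the authors' implementation: the function name `phrase_table`, the toy phrase pairs, the pooling of occurrence counts, and the choice of a single score p(target | source) are all assumptions made for this example.

```python
from collections import Counter


def phrase_table(*phrase_sets):
    """Pool phrase-pair occurrences from several extractions and
    score each pair by relative frequency p(target | source).

    Hypothetical helper: each argument is a list of (source, target)
    phrase pairs, one entry per extracted occurrence.
    """
    # Assumption: the "union" is taken by pooling occurrence counts.
    pair_counts = Counter()
    for pairs in phrase_sets:
        pair_counts.update(pairs)

    # Total occurrences of each source phrase across the pooled set.
    source_counts = Counter()
    for (source, _target), count in pair_counts.items():
        source_counts[source] += count

    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
    }


# Toy data (invented): the segmented extraction contributes a longer phrase.
baseline = [("la casa", "the house"), ("casa", "house"), ("casa", "home")]
segmented = [("la casa blanca", "the white house"), ("casa", "house")]

table = phrase_table(baseline, segmented)
for (source, target), score in sorted(table.items()):
    print(f"{source} ||| {target} ||| {score:.3f}")
```

In this sketch the longer pair from the segmented extraction enters the table alongside the baseline pairs, and pairs seen in both extractions gain weight. A full system would compute further scores, such as inverse and lexical translation probabilities.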
Citation: Henriquez, C., Ruiz, M., Daudaravicius, V., Banchs, R., Mariño, J. Using collocation segmentation to augment the phrase table. A: Workshop on Statistical Machine Translation and MetricsMATR. "ACL 2010 Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT '10): Uppsala, Sweden, July 15-16, 2010". Association for Computational Linguistics, 2010, p. 98-102.