UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system

Henriquez, Carlos A; Ruiz Costa-Jussà, Marta; Daudaravicius, Vidas; Banchs, Rafael E.; Mariño, José B.



Document type: Text in conference proceedings

Publication date: 2010

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Abstract

This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information or Gravity-Counts. Analysis of the translation results allows the measures to be divided into three groups. First, Log-likelihood, Chi-squared and T-score tend to combine high-frequency words, and their collocation segments are very short. They improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and their collocation segments are short. They improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine high- and low-frequency words, and its collocation segments are long. In this case, however, the SMT system is not improved. Thus, the road map for improving the translation system is to introduce new phrases made of either low-frequency or high-frequency words. It is hard to improve translation quality by introducing new phrases that mix low- and high-frequency words. Experimental results are reported for the French-to-English IWSLT 2010 evaluation, where our system was ranked 3rd out of nine systems.
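To make the idea of measure-based collocation segmentation more concrete, the following is a minimal sketch, not the authors' implementation. It scores adjacent word pairs with common textbook forms of Dice, pointwise Mutual Information and T-score, then joins neighbouring words whose score reaches a threshold. The function names `bigram_scores` and `segment`, the thresholding rule, and the use of the pair count as the sample size are all assumptions made for illustration; the paper's exact formulas and segmentation procedure are not given in this record.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair with Dice, PMI and T-score.

    Textbook formulations, assumed here for illustration only.
    """
    n = len(tokens) - 1  # number of adjacent pairs, used as the sample size
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        f1, f2 = unigrams[w1], unigrams[w2]
        expected = f1 * f2 / n  # pair count expected under independence
        scores[(w1, w2)] = {
            "dice": 2 * f12 / (f1 + f2),
            "pmi": math.log2(f12 / expected),
            "tscore": (f12 - expected) / math.sqrt(f12),
        }
    return scores


def segment(tokens, scores, measure, threshold):
    """Join adjacent words whose score is >= threshold (hypothetical rule)."""
    if not tokens:
        return []
    segments = [[tokens[0]]]
    for w1, w2 in zip(tokens, tokens[1:]):
        if scores[(w1, w2)][measure] >= threshold:
            segments[-1].append(w2)  # strong association: extend the segment
        else:
            segments.append([w2])  # weak association: start a new segment
    return ["_".join(s) for s in segments]


if __name__ == "__main__":
    toy = "new york is big and new york is old".split()
    toy_scores = bigram_scores(toy)
    print(segment(toy, toy_scores, "dice", 0.9))
```

On a real corpus the counts would come from the full training data rather than a single sentence, and each measure would need its own threshold, since the measures live on different scales.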

Citation: Henriquez, C., Ruiz, M., Daudaravicius, V., Banchs, R., Mariño, J. UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system. In: International Workshop on Spoken Language Translation. "Proceedings of IWSLT 2010, Paris, France". 2010, p. 189-195.

URI: http://hdl.handle.net/2117/102470

Publisher version: http://www.isca-speech.org/archive/iwslt_10/slta_189.html


File: iwslt10_ec_upc.pdf (PDF, 454.6 KB)
