On the impact of morphology in English to Spanish statistical MT
DOI: 10.1016/j.specom.2008.05.003
Cite as: hdl:2117/79198
Document type: Article
Publication date: 2008-12-31
Access conditions: Restricted access per publisher policy
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
This paper presents a thorough study of the impact of morphology derivation on N-gram-based Statistical Machine Translation (SMT) models from English into a morphology-rich language such as Spanish. For this purpose, we define a framework under the assumption that a certain degree of morphology-related information is not only being ignored by current statistical translation models, but also has a negative impact on their estimation due to the data sparseness it causes. Moreover, we describe how this information can be decoupled from the standard bilingual N-gram models and introduced separately by means of a well-defined and better informed feature-based classification task.
Results are presented for the European Parliament Plenary Sessions (EPPS) English → Spanish task, with oracle scores showing the extent to which SMT models can benefit from simplifying Spanish morphological surface forms for each Part-Of-Speech category. We show that the morphological richness of verb forms greatly weakens the standard statistical models, and we carry out a posterior morphology classification by defining a simple set of features and applying machine learning techniques.
In addition, we propose a simple technique to deal with Spanish enclitic pronouns. Both techniques are empirically evaluated, and final translation results show improvements over the baseline obtained solely by dealing with Spanish morphology. In principle, the study is also valid for translation from English into any other Romance language (Portuguese, Catalan, French, Galician, Italian, etc.).
The proposed method can be applied to both monotonic and non-monotonic decoding scenarios, thus revealing the interaction between word-order decoding and the proposed morphology simplification techniques. Overall results achieve statistically significant improvement over baseline performance in this demanding task.
Citation: de Gispert, A., Mariño, J.B. On the impact of morphology in English to Spanish statistical MT. "Speech communication", 31 December 2008, vol. 50, no. 11, p. 1034-1046.
ISSN: 0167-6393
Publisher version: http://www.sciencedirect.com/science/article/pii/S0167639308000769