A Deep source-context feature for lexical selection in statistical machine translation
Rights access: Open Access
This paper presents a methodology for addressing lexical disambiguation in a standard phrase-based statistical machine translation system. Similarity among source contexts is used to select appropriate translation units. The information is introduced as a novel feature of the phrase-based model, and it is used to select the translation units extracted from the training sentences most similar to the sentence being translated. The similarity is computed through a deep autoencoder representation, which yields an effective low-dimensional embedding of the data and statistically significant BLEU score improvements on two different tasks (English-to-Spanish and English-to-Hindi). (C) 2016 Elsevier B.V. All rights reserved.
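As a rough illustration of the idea in the abstract, the sketch below scores candidate translation units by how similar the input sentence is to the training sentences each unit was extracted from. It is a minimal sketch under stated assumptions, not the authors' implementation: the sentence vectors are assumed to have been produced already by some encoder (the paper uses a deep autoencoder, which is not reproduced here), cosine similarity and max-pooling over contexts are illustrative choices, and the names `cosine`, `source_context_feature`, and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def source_context_feature(input_vec, unit_contexts):
    """Score each translation unit by its most similar training context.

    input_vec     -- low-dimensional embedding of the sentence to translate
                     (assumed to come from an external encoder)
    unit_contexts -- hypothetical mapping: (source phrase, target phrase) ->
                     list of embeddings of the training sentences the unit
                     was extracted from
    """
    return {
        unit: max((cosine(input_vec, ctx) for ctx in contexts), default=0.0)
        for unit, contexts in unit_contexts.items()
    }


# Toy two-dimensional embeddings, invented purely for illustration.
input_vec = [1.0, 0.0]
unit_contexts = {
    ("bank", "banco"): [[1.0, 0.0], [0.0, 1.0]],
    ("bank", "orilla"): [[0.0, 1.0]],
}

scores = source_context_feature(input_vec, unit_contexts)
best = max(scores, key=scores.get)
```

In a phrase-based decoder, a score of this kind would be one more feature alongside the usual translation and language model scores, so that units seen in contexts resembling the input are preferred.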
Citation: Gupta, P., Ruiz, M., Rosso, P., Banchs, R. A Deep source-context feature for lexical selection in statistical machine translation. "Pattern recognition letters", 1 May 2016, vol. 75, p. 24-29.