In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and evaluate them first on a cross-lingual lexical substitution task and then on the final translation task. For translation, we incorporate the semantic information into a statistical document-level decoder (Docent) by enforcing translation choices that are semantically similar to the context. As expected, the bilingual word vector models are more appropriate for the purpose of translation. The final document-level translator incorporating the semantic model outperforms the basic Docent (without semantics) and also slightly outperforms a standard sentence-level SMT system in terms of ULC (the average of a set of standard automatic evaluation metrics for MT). Finally, we also present a manual analysis of the translations of several specific documents.
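The idea of preferring translation choices that are semantically similar to their context can be illustrated with a minimal sketch. It is not Docent's actual feature function: the function names, the averaging of context vectors, and the three-dimensional toy vectors are all assumptions made for illustration, whereas a real system would load vectors trained on large corpora.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_vector(context_words, vectors):
    """Average the vectors of the context words that have a known vector."""
    known = [vectors[w] for w in context_words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def rank_candidates(candidates, context_words, vectors):
    """Sort candidate translations by similarity to the averaged context vector."""
    ctx = context_vector(context_words, vectors)
    scored = []
    for cand in candidates:
        if ctx is None or cand not in vectors:
            score = 0.0  # no evidence available: neutral score
        else:
            score = cosine(vectors[cand], ctx)
        scored.append((cand, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, invented for this example only.
TOY_VECTORS = {
    "banco": [0.9, 0.1, 0.0],     # financial institution
    "orilla": [0.0, 0.2, 0.9],    # river bank
    "dinero": [0.8, 0.2, 0.1],    # money
    "préstamo": [0.9, 0.0, 0.1],  # loan
}

# Two candidate Spanish translations of English "bank" in a financial context.
ranking = rank_candidates(["banco", "orilla"], ["dinero", "préstamo"], TOY_VECTORS)
```

With these toy values, the candidate that is closer to the financial context is ranked first. In a document-level decoder, a score of this kind would be one feature among several rather than a hard constraint.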
Citation: Martinez, E.; España-Bonet, C.; Márquez, L. Document-level machine translation with word vector models. In: Annual Conference of the European Association for Machine Translation. "Proceedings of the 18th Annual Conference of the European Association for Machine Translation". Antalya: 2015, p. 59-66.