N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination
Document type: Conference lecture
Rights access: Open Access
In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target-side parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximum entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.
Citation: Khalilov, M.; Fonollosa, José A. R. N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination. In: Association for Computational Linguistics. European Chapter. Conference. "12th Conference of the European Chapter of the Association for Computational Linguistics". 2009, p. 424-432.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.