A Richly annotated, multilingual parallel corpus for hybrid machine translation
Document type: Conference report
Publisher: European Language Resources Association (ELRA)
Rights access: Open Access
In recent years, machine translation (MT) research has focused on investigating how hybrid machine translation as well as system combination approaches can be designed so that the resulting hybrid translations show an improvement over the individual “component” translations. As a first step towards achieving this objective, we have developed a parallel corpus with source text and the corresponding translation output from a number of machine translation engines, annotated with metadata information capturing aspects of the translation process performed by the different MT systems. This corpus aims to serve as a basic resource for further research on whether hybrid machine translation algorithms and system combination techniques can benefit from additional (linguistically motivated, decoding, and runtime) information provided by the different systems involved. In this paper, we describe the annotated corpus we have created. We provide an overview of the component MT systems and the XLIFF-based annotation format we have developed. We also report on first experiments with the ML4HMT corpus data.
Citation: Avramidis, E., Ruiz, M., Federmann, C., Melero, M., Pecina, P., Van Genabith, J. A Richly annotated, multilingual parallel corpus for hybrid machine translation. In: International Conference on Language Resources and Evaluation. "Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)". European Language Resources Association (ELRA), 2012, p. 2189-2193.