High frequent in-domain word segmentation and forward translation for the WMT21 Biomedical task

Cite as:
hdl:2117/366780
Document type: Conference report
Defense date: 2021
Publisher: Association for Computational Linguistics
Rights access: Open Access
This work is protected by the corresponding intellectual and industrial property rights.
Except where otherwise noted, its contents are licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
This paper reports on optimizing the use of out-of-domain data in the biomedical translation task. We first optimized our parallel training dataset using in-domain terminology from BabelNet. Then, to enlarge the training set, we studied the effect of out-of-domain data on biomedical translation, created a mixture of in-domain and out-of-domain training sets, and added more in-domain data using forward translation in the English-Spanish task. Finally, with a simple BPE optimization method, we increased the number of in-domain subwords in our mixed training set and trained a Transformer model on the generated data. Results show improvements with the proposed method. © 2021 Association for Computational Linguistics
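As an illustration of the forward-translation step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes forward translation in its usual sense: in-domain source sentences are translated by an existing model, and the resulting synthetic pairs are mixed with the real in-domain and out-of-domain pairs. The names `forward_translate`, `build_training_set`, and `toy_translate` are hypothetical, and `toy_translate` is only a stand-in for a trained English-Spanish model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def forward_translate(sources: List[str],
                      translate: Callable[[str], str]) -> List[Pair]:
    """Pair each non-empty source sentence with its machine translation."""
    pairs: List[Pair] = []
    for src in sources:
        src = src.strip()
        if not src:
            continue  # skip blank lines
        pairs.append((src, translate(src)))
    return pairs


def build_training_set(in_domain: List[Pair],
                       out_of_domain: List[Pair],
                       synthetic: List[Pair]) -> List[Pair]:
    """Concatenate the three sources and drop exact duplicate pairs,
    keeping the first occurrence (in-domain, then synthetic, then
    out-of-domain)."""
    seen = set()
    mixed: List[Pair] = []
    for pair in [*in_domain, *synthetic, *out_of_domain]:
        if pair not in seen:
            seen.add(pair)
            mixed.append(pair)
    return mixed


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for a trained English-to-Spanish model:
    a word-by-word lookup that leaves unknown tokens unchanged."""
    lexicon = {"the": "el", "patient": "paciente",
               "has": "tiene", "fever": "fiebre"}
    return " ".join(lexicon.get(tok, tok) for tok in sentence.lower().split())
```

In practice the `translate` callable would wrap a real translation model, and the mixing ratio between in-domain and out-of-domain data would be a tuning choice that this sketch does not address.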
Citation: Rafieian, B.; Costa-jussà, M.R. High frequent in-domain word segmentation and forward translation for the WMT21 Biomedical task. In: Conference on Machine Translation. "Sixth Conference on Machine Translation: proceedings of the conference: November 10-11, 2021: WMT 2021". Stroudsburg, PA: Association for Computational Linguistics, 2021, p. 863-867. ISBN 978-1-954085-94-7.
ISBN: 978-1-954085-94-7
Publisher version: https://aclanthology.org/2021.wmt-1.87.pdf
Files | Description | Size | Format | View |
---|---|---|---|---|
2021.wmt-1.87.pdf | | 166,8Kb | | View/Open |