High frequent in-domain word segmentation and forward translation for the WMT21 Biomedical task
Cite as: hdl:2117/366780
Document type: Conference proceedings paper
Publication date: 2021
Publisher: Association for Computational Linguistics
Access conditions: Open access
Except where otherwise noted, the contents of this work are licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
This paper reports on optimizing the use of out-of-domain data in the Biomedical translation task. We first optimized our parallel training dataset using in-domain terminology from BabelNet. Then, to enlarge the training set, we studied the effect of out-of-domain data on biomedical translation, created a mixture of in-domain and out-of-domain training sets, and added more in-domain data using forward translation for the English-Spanish task. Finally, with a simple BPE optimization method, we increased the number of in-domain subwords in the mixed training set and trained a Transformer model on the resulting data. Results show improvements with the proposed method. © 2021 Association for Computational Linguistics
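The abstract does not spell out how the BPE step increases the number of in-domain subwords. As a purely illustrative sketch, and not the authors' method, one way to get a similar effect is to upweight in-domain word frequencies before learning BPE merges, so that merges covering in-domain terms are learned earlier. The function names, the toy vocabulary, and the weight of 10 below are all assumptions made for this example.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merges from a {word: frequency} mapping (toy implementation)."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; break ties lexicographically for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


def upweight_in_domain(word_freqs, in_domain_terms, weight):
    """Multiply the frequency of in-domain words by `weight` (hypothetical heuristic)."""
    return {w: f * weight if w in in_domain_terms else f for w, f in word_freqs.items()}


# Toy mixed corpus: general words dominate, biomedical terms are rare.
base_freqs = {"the": 10, "then": 5, "renal": 2, "adrenal": 1}
in_domain = {"renal", "adrenal"}

baseline = learn_bpe(base_freqs, num_merges=5)
weighted = learn_bpe(upweight_in_domain(base_freqs, in_domain, weight=10), num_merges=5)

print("baseline merges:", baseline)
print("weighted merges:", weighted)
```

With the same merge budget, the weighted run spends its merges on the biomedical terms, while the baseline spends them on the frequent general words. In a real setup the in-domain term list would come from a terminology resource, and an established BPE toolkit would replace this toy learner.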
Citation: Rafieian, B.; Costa-jussà, M.R. High frequent in-domain word segmentation and forward translation for the WMT21 Biomedical task. In: Conference on Machine Translation. "Sixth Conference on Machine Translation: proceedings of the conference: November 10-11, 2021: WMT 2021". Stroudsburg, PA: Association for Computational Linguistics, 2021, p. 863-867. ISBN 978-1-954085-94-7.
ISBN: 978-1-954085-94-7
Publisher version: https://aclanthology.org/2021.wmt-1.87.pdf
Files | Description | Size | Format | View |
---|---|---|---|---|
2021.wmt-1.87.pdf | | 166.8 KB | | View/Open |