Linguistic-family-specific encoders and decoders for multilingual machine translation
Document type: Final project, official master's degree
Date: 2022-02-01
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
Multilingual machine translation has been approached from different perspectives, including shared and language-specific encoder-decoder architectures. The shared approach uses a single encoder and decoder for all languages, whereas the language-specific approach allocates a separate encoder and decoder to each language. Each approach has benefits and drawbacks in terms of translation quality and resource consumption. To find a balance between these two factors, this project explores a new approach: sharing encoders and decoders within language families. The new model was trained and tested on the TED2020 dataset with 21 languages grouped into 4 language families. Compared with the all-language shared baseline, our model achieves a substantial improvement in BLEU score, ranging from 3 points up to a maximum of 10 points depending on the family pair. The new model also performs well on zero-shot translation, outperforming the baseline model, and the improvement follows the growth pattern observed during model training.
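The core architectural idea above — one encoder and one decoder per language family rather than per language or fully shared — can be sketched as a routing scheme. The following is a minimal, hypothetical illustration: the language codes, family groupings, and class names are assumptions for the example, not the thesis's actual 21-language/4-family configuration, and the placeholder strings stand in for what would be Transformer encoder/decoder modules.

```python
# Hypothetical sketch of family-based encoder/decoder routing.
# Languages and families below are illustrative assumptions only.
LANGUAGE_FAMILY = {
    "es": "romance", "fr": "romance", "it": "romance",
    "de": "germanic", "nl": "germanic", "sv": "germanic",
    "ru": "slavic", "pl": "slavic", "cs": "slavic",
    "tr": "turkic", "az": "turkic", "kk": "turkic",
}

class FamilyMT:
    """One encoder and one decoder per language family (not per language)."""

    def __init__(self, families):
        # Placeholder "modules": in a real system these would be
        # Transformer encoders/decoders with parameters shared per family.
        self.encoders = {f: f"encoder[{f}]" for f in families}
        self.decoders = {f: f"decoder[{f}]" for f in families}

    def route(self, src_lang, tgt_lang):
        """Select the encoder/decoder pair for a translation direction."""
        src_fam = LANGUAGE_FAMILY[src_lang]
        tgt_fam = LANGUAGE_FAMILY[tgt_lang]
        return self.encoders[src_fam], self.decoders[tgt_fam]

model = FamilyMT(set(LANGUAGE_FAMILY.values()))
enc, dec = model.route("es", "de")  # Spanish -> German
```

Because any source-family encoder can be composed with any target-family decoder, directions never seen together during training still have a valid encoder/decoder pair, which is one way to view the zero-shot behavior reported in the abstract.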
Degree: MÀSTER UNIVERSITARI EN ENGINYERIA DE TELECOMUNICACIÓ (Pla 2013)
Collections
Files | Description | Size | Format | View
---|---|---|---|---
Final-Report.pdf | | 687,6Kb | | View/Open
ANNEXES-CODES.zip | | 2,989Mb | application/zip | View/Open