Mitigating social biases in machine translation using domain adaptation techniques
Cite as: hdl:2117/334921
Document type: Official master's degree final project
Date: 2020-09
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
The misrepresentation of certain communities in current datasets causes serious failures in artificial intelligence applications, ranging from speech recognizers that perform worse for women than for men to face recognition systems that are less accurate for Asian faces than for American or European ones. It also amplifies stereotypes in Machine Translation. These challenges lie at the core of natural language processing and, in particular, many works focus on addressing gender bias. Previous research in Machine Translation (MT) has proposed either mitigating bias through debiased word embeddings and contextual information, or evaluating and measuring the amount of bias present in a translation. The closest work to ours is one in which the authors generate a very small gender-balanced dataset and apply Elastic Weight Consolidation to perform transfer learning and mitigate the consequences of training on unbalanced datasets. In contrast, we use a larger, non-synthetic balanced dataset to fine-tune a model trained on an unbalanced dataset, and we evaluate the reduction of gender bias in the final translations. We also evaluate gender bias in word embedding models, as in prior work, and conclude that they can be successfully applied to downstream systems in the case of the gender-balanced dataset.

The results are not exactly what we expected: our hypothesis was that the model fine-tuned on only the balanced dataset would eliminate gender bias to the greatest degree. This was not the case, owing to a known difficulty translation models face when adapting to a new and very different data distribution, namely catastrophic forgetting: the model fits the new distribution but forgets the one it was trained on before. Regularization techniques such as dropout and an adaptive learning rate were applied without significant improvement. Nevertheless, the results show that even though the balanced dataset comes from a different domain than the training and test data of the NMT system, it improves translation quality (by up to 2 BLEU points) and mitigates gender bias by a significant amount (up to 12.5% in accuracy).
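The abstract mentions Elastic Weight Consolidation (EWC) as the transfer-learning technique used in the closest related work to counter catastrophic forgetting. As a rough illustration only, not the thesis's actual implementation, here is a minimal PyTorch sketch of the EWC penalty, a quadratic anchor on parameters deemed important for the original task; all names (`fisher`, `old_params`, `ewc_lambda`) are illustrative assumptions:

```python
# Minimal sketch of the Elastic Weight Consolidation (EWC) penalty.
# Assumes `fisher` holds a diagonal Fisher-information estimate per
# parameter and `old_params` the parameter values after the original
# (unbalanced-data) training; both are dicts keyed by parameter name.
import torch


def ewc_penalty(model, fisher, old_params, ewc_lambda=0.4):
    """Quadratic penalty anchoring parameters important for the original
    task while the model is fine-tuned on the balanced dataset."""
    penalty = torch.tensor(0.0)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * ewc_lambda * penalty


# During fine-tuning, the total loss would combine the translation loss on
# a balanced-data batch with the EWC anchor, e.g.:
#   loss = criterion(model(batch), targets) + ewc_penalty(model, fisher, old_params)
```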
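The reported gain of up to 2 BLEU points would typically be measured by scoring the baseline and fine-tuned translations against the same references. A short hedged sketch using the sacreBLEU library (`pip install sacrebleu`); the file names are assumptions, not from the thesis:

```python
# Compare corpus-level BLEU of baseline vs. fine-tuned system outputs.
# File names are hypothetical: one sentence per line, aligned with the
# reference file.
import sacrebleu

refs = [open("test.ref.txt", encoding="utf-8").read().splitlines()]
baseline = open("baseline.hyp.txt", encoding="utf-8").read().splitlines()
finetuned = open("finetuned.hyp.txt", encoding="utf-8").read().splitlines()

print("baseline BLEU  :", sacrebleu.corpus_bleu(baseline, refs).score)
print("fine-tuned BLEU:", sacrebleu.corpus_bleu(finetuned, refs).score)
```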
Subjects: Neural networks (Computer science); Machine translating
Degree: Master's degree in Telecommunications Engineering (2013 curriculum)
Files | Description | Size | Format |
---|---|---|---|
AdriaDeJorge_FinalThesis.pdf | | 659.7 KB | PDF |