Mitigating social biases in machine translation using domain adaptation techniques
Cite as: hdl:2117/334921
Document type: Official master's degree final project
Date: 2020-09
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
The misrepresentation of certain communities in current datasets causes serious failures in artificial intelligence applications, ranging from speech recognizers that perform worse for women than for men to face recognition systems that are less accurate for Asian faces than for American or European ones. It also amplifies stereotypes in Machine Translation. These challenges lie at the core of natural language processing and, in particular, many works focus on addressing gender bias. Previous research in Machine Translation (MT) has proposed either mitigating bias through debiased word embeddings and contextual information, or evaluating and measuring the amount of bias present in a translation. The closest work to ours is one in which the authors generate a very small gender-balanced dataset and apply Elastic Weight Consolidation to perform transfer learning and mitigate the consequences of training on unbalanced datasets. In contrast, we use a larger, non-synthetic balanced dataset to fine-tune a model trained on an unbalanced dataset, and we evaluate the reduction of gender bias in the final translations. We also evaluate gender bias in word embedding models, as in prior work, and conclude that they can be successfully applied to downstream systems in the case of the gender-balanced dataset.

The results are not exactly what we expected: our hypothesis was that the model fine-tuned on only the balanced dataset would eliminate gender bias to the greatest degree. This was not the case, owing to a known difficulty translation models face when adapting to a new and very different data distribution, namely catastrophic forgetting: the model fits the new distribution but forgets the one it was trained on before. Regularization techniques such as dropout and an adaptive learning rate were applied without significant improvement. Nevertheless, the results show that even though the balanced dataset comes from a different domain than the training and test data of the NMT system, it improves translation quality (by up to 2 BLEU points) and mitigates gender bias by a significant amount (up to 12.5% in accuracy).
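The abstract mentions Elastic Weight Consolidation (EWC) as the transfer-learning technique used in the closest related work to counter catastrophic forgetting. As a rough illustration only, not the thesis's actual implementation, here is a minimal PyTorch sketch of the EWC penalty, a quadratic anchor on parameters deemed important for the original task; all names (`fisher`, `old_params`, `ewc_lambda`) are illustrative assumptions:

```python
# Minimal sketch of the Elastic Weight Consolidation (EWC) penalty.
# Assumes `fisher` holds a diagonal Fisher-information estimate per
# parameter and `old_params` the parameter values after the original
# (unbalanced-data) training; both are dicts keyed by parameter name.
import torch


def ewc_penalty(model, fisher, old_params, ewc_lambda=0.4):
    """Quadratic penalty anchoring parameters important for the original
    task while the model is fine-tuned on the balanced dataset."""
    penalty = torch.tensor(0.0)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * ewc_lambda * penalty


# During fine-tuning, the total loss would combine the translation loss on
# a balanced-data batch with the EWC anchor, e.g.:
#   loss = criterion(model(batch), targets) + ewc_penalty(model, fisher, old_params)
```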
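The reported gain of up to 2 BLEU points would typically be measured by scoring the baseline and fine-tuned translations against the same references. A short hedged sketch using the sacreBLEU library (`pip install sacrebleu`); the file names are assumptions, not from the thesis:

```python
# Compare corpus-level BLEU of baseline vs. fine-tuned system outputs.
# File names are hypothetical: one sentence per line, aligned with the
# reference file.
import sacrebleu

refs = [open("test.ref.txt", encoding="utf-8").read().splitlines()]
baseline = open("baseline.hyp.txt", encoding="utf-8").read().splitlines()
finetuned = open("finetuned.hyp.txt", encoding="utf-8").read().splitlines()

print("baseline BLEU  :", sacrebleu.corpus_bleu(baseline, refs).score)
print("fine-tuned BLEU:", sacrebleu.corpus_bleu(finetuned, refs).score)
```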
Subjects: Neural networks (Computer science); Machine translating
Degree: Master's degree in Telecommunications Engineering (2013 curriculum)
Files | Description | Size | Format |
---|---|---|---|
AdriaDeJorge_FinalThesis.pdf | | 659.7 KB | PDF |