Linguistic knowledge-based vocabularies for Neural Machine Translation
Cite as:
hdl:2117/330835
Document type: Article
Publication date: 2020
Publisher: Cambridge University Press
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and
industrial property rights. Without prejudice to existing legal exemptions, its
reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Project: TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO (MINECO-TEC2015-69266-P)
AUTONOMOUS LIFELONG LEARNING INTELLIGENT SYSTEMS (AEI-PCIN-2017-079)
Abstract
Neural networks applied to machine translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically rich areas, prevents some transfer learning approaches (e.g. cross-lingual pretrained embeddings), and reduces their interpretability. In this work, we propose new hybrid, linguistically grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the derived benefits. We test the proposed approaches on both morphologically rich and morphologically poor languages, showing that, for the former, translation quality on out-of-domain texts improves with respect to a strong subword baseline.
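This record does not describe the hybrid strategies proposed in the article, so the sketch below illustrates only the general idea behind the statistical subword baselines mentioned in the abstract. It assumes byte-pair encoding (BPE) as a representative technique; the function names `learn_bpe_merges` and `segment` and the toy word counts are hypothetical and are not taken from the article.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} dictionary."""
    # Every word starts as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to every word in the working vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(merge_pair(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split `word` into subword tokens by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Toy corpus statistics (hypothetical values, for illustration only).
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe_merges(toy_counts, num_merges=10)
toy_tokens = segment("lowest", toy_merges)
```

Because the merges are driven purely by pair frequencies, the resulting tokens need not coincide with morphemes or whole words, which is the loss of word-level associations that the abstract points out.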
Description
This article has been published in a revised form in Natural Language Engineering https://doi.org/10.1017/S1351324920000364. This version is free to view and download for private research and study only. Not for re-distribution, re-sale or use in derivative works. © Cambridge University Press
Citation: Casas, N. [et al.]. Linguistic knowledge-based vocabularies for Neural Machine Translation. "Natural language engineering", 2020, p. 1-22.
ISSN: 1469-8110
Collections
Files | Description | Size | Format | View |
---|---|---|---|---|
2020nle_linguistic_vocabs.pdf | | 884.5Kb | | View/Open |