Linguistic knowledge-based vocabularies for Neural Machine Translation
dc.contributor.author | Casas Manzanares, Noé |
dc.contributor.author | Ruiz Costa-Jussà, Marta |
dc.contributor.author | Rodríguez Fonollosa, José Adrián |
dc.contributor.author | Alonso, Juan |
dc.contributor.author | Fanlo, Ramon |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2020-10-26T18:05:03Z |
dc.date.available | 2021-01-02T01:32:13Z |
dc.date.issued | 2020 |
dc.identifier.citation | Casas, N. [et al.]. Linguistic knowledge-based vocabularies for Neural Machine Translation. "Natural language engineering", 2020, p. 1-22. |
dc.identifier.issn | 1469-8110 |
dc.identifier.uri | http://hdl.handle.net/2117/330835 |
dc.description | This article has been published in a revised form in Natural Language Engineering https://doi.org/10.1017/S1351324920000364. This version is free to view and download for private research and study only. Not for re-distribution, re-sale or use in derivative works. © Cambridge University Press |
dc.description.abstract | Neural Networks applied to Machine Translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically-discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically-rich areas and prevents some transfer learning approaches e.g. cross-lingual pretrained embeddings, and reduces their interpretability. In this work, we propose new hybrid linguistically-grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the derived benefits. We test the proposed approaches in both morphologically rich and poor languages, showing that, for the former, the quality in the translation of out-of-domain texts is improved with respect to a strong subword baseline. |
dc.description.sponsorship | This work is partially supported by Lucy Software / United Language Group (ULG) and the Catalan Agency for Management of University and Research Grants (AGAUR) through an Industrial PhD Grant. This work is also supported in part by the Spanish Ministerio de Economía y Competitividad, the European Regional Development Fund and the Agencia Estatal de Investigación, through the postdoctoral senior grant Ramón y Cajal, contract TEC2015-69266-P (MINECO/FEDER,EU) and contract PCIN-2017-079 (AEI/MINECO). |
dc.format.extent | 22 p. |
dc.language.iso | eng |
dc.publisher | Cambridge University Press |
dc.subject | UPC subject areas::Computer science::Artificial intelligence |
dc.subject.lcsh | Machine translating |
dc.subject.other | Machine translation |
dc.subject.other | Neural network |
dc.subject.other | Morphology |
dc.subject.other | Vocabulary |
dc.title | Linguistic knowledge-based vocabularies for Neural Machine Translation |
dc.type | Article |
dc.subject.lemac | Traducció automàtica |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.1017/S1351324920000364 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://www.cambridge.org/core/journals/natural-language-engineering/article/linguistic-knowledgebased-vocabularies-for-neural-machine-translation/C1FAB80C1D6ADCD252EB627BA3B4082B |
dc.rights.access | Open Access |
local.identifier.drac | 29194510 |
dc.description.version | Postprint (author's final draft) |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/ |
dc.relation.projectid | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación (PEICTI) 2013-2016/PCIN-2017-079/ES/AUTONOMOUS LIFELONG LEARNING INTELLIGENT SYSTEMS/ |
local.citation.author | Casas, N.; Costa-jussà, Marta R.; Fonollosa, José A. R.; Alonso, J.; Fanlo, R. |
local.citation.publicationName | Natural language engineering |
local.citation.startingPage | 1 |
local.citation.endingPage | 22 |
This item appears in the following collections
- Journal articles [1,049]
- Journal articles [172]
- Journal articles [2,528]
- Journal articles [211]