Recent Submissions

  • Double multi-head attention for speaker verification 

    India Massana, Miquel Àngel; Safari, Pooyan; Hernando Pericás, Francisco Javier (Institute of Electrical and Electronics Engineers (IEEE), 2021)
    Conference report
    Open Access
    Most state-of-the-art Deep Learning systems for text-independent speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a ...
  • Multilingual machine translation: Closing the gap between shared and language-specific encoder-decoders 

    Escolano Peinado, Carlos; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Artetxe Zurutuza, Mikel (Association for Computational Linguistics, 2021)
    Conference lecture
    Open Access
    State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on ...
  • The TALP-UPC system for the WMT similar language task: statistical vs neural machine translation 

    Biesialska, Magdalena Marta; Guàrdia Fernández, Lluís; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2019)
    Conference lecture
    Open Access
    Although the problem of similar language translation has been an area of research interest for many years, it is still far from being solved. In this paper, we study the performance of two popular approaches: statistical ...
  • Refinement of unsupervised cross-lingual word embeddings 

    Biesialska, Magdalena Marta; Ruiz Costa-Jussà, Marta (IOS Press, 2020)
    Conference lecture
    Open Access
    Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by allowing multilingual word representations to be learned even without any direct bilingual signal. The lion's share ...
  • Syntax-driven iterative expansion language models for controllable text generation 

    Casas Manzanares, Noé; Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    The dominant language modeling paradigm handles text as a sequence of discrete tokens. While that approach can capture the latent structure of the text, it is inherently constrained to sequential dynamics for text generation. ...
  • GeBioToolkit: automatic extraction of gender-balanced multilingual corpus of Wikipedia biographies 

    Ruiz Costa-Jussà, Marta; Li Lin, Pau; España Bonet, Cristina (European Language Resources Association (ELRA), 2020)
    Conference lecture
    Open Access
    We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies. Despite the gender inequalities present in Wikipedia, the ...
  • Enhancing word embeddings with knowledge extracted from lexical resources 

    Biesialska, Magdalena Marta; Rafieian, Bardia; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    In this work, we present an effective method for semantic specialization of word vector representations. To this end, we use traditional word embeddings and apply specialization methods to better capture semantic relations ...
  • Fine-tuning neural machine translation on gender-balanced datasets 

    Ruiz Costa-Jussà, Marta; de Jorge Sánchez, Adrián (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    Misrepresentation of certain communities in datasets is causing big disruptions in artificial intelligence applications. In this paper, we propose using an automatically extracted gender-balanced dataset parallel corpus ...
  • Findings of the first shared task on lifelong learning machine translation 

    Barrault, Loïc; Biesialska, Magdalena Marta; Ruiz Costa-Jussà, Marta; Bougares, Fethi; Galibert, Olivier (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    A lifelong learning system can adapt to new data without forgetting previously acquired knowledge. In this paper, we introduce the first benchmark for lifelong learning machine translation. For this purpose, we provide ...
  • Multilingual neural machine translation: case-study for Catalan, Spanish and Portuguese romance languages 

    Vergés Boncompte, Pere; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    In this paper, we describe the TALP-UPC participation in the WMT Similar Language Translation task between Catalan, Spanish, and Portuguese, all of them Romance languages. We made use of different techniques to improve ...
  • The TALP-UPC system description for WMT20 news translation task: multilingual adaptation for low resource MT 

    Escolano Peinado, Carlos; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    In this article, we describe the TALP-UPC participation in the WMT20 news translation shared task for Tamil-English. Given the low amount of parallel training data, we resort to adapting the task to a multilingual system to ...
  • The IPN-CIC team system submission for the WMT 2020 similar language task 

    Menéndez-Salazar, Luis A.; Sidorov, Grigori; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2020)
    Conference lecture
    Open Access
    This paper describes the participation of the NLP research team of the IPN Computer Research center in the WMT 2020 Similar Language Translation Task. We have submitted systems for the Spanish-Portuguese language pair (in ...
