Now showing items 64-83 of 135

    • GeBioToolkit: automatic extraction of gender-balanced multilingual corpus of Wikipedia biographies 

      Ruiz Costa-Jussà, Marta; Li Lin, Pau; España Bonet, Cristina (European Language Resources Association (ELRA), 2020)
      Conference lecture
      Open access
      We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies. Despite the gender inequalities present in Wikipedia, the ...
    • Gender bias in multilingual neural machine translation: The architecture matters 

      Ruiz Costa-Jussà, Marta; Escolano Peinado, Carlos; Basta, Christine Raouf Saad; Ferrando Monsonís, Javier; Batlle, Roser; Kharitonova, Ksenia (2020-12-24)
      Research report
      Open access
      Multilingual Neural Machine Translation architectures mainly differ in the amount of sharing modules and parameters among languages. In this paper, and from an algorithmic perspective, we explore if the chosen architecture, ...
    • Generación morfológica con algoritmos de aprendizaje profundo integrada en un sistema de traducción automática estadística 

      Escolano, Carlos; Ruiz Costa-Jussà, Marta (2017-09-22)
      Article
      Open access
      Morphological variation between a source language and the target language creates difficulties for standard translation algorithms such as phrase-based statistical translation. In this work we propose dividing the task of ...
    • High frequent in-domain word segmentation and forward translation for the WMT21 Biomedical task 

      Rafieian, Bardia; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2021)
      Conference report
      Open access
      This paper reports the optimization of using the out-of-domain data in the Biomedical translation task. We firstly optimized our parallel training dataset using the BabelNet in-domain terminology words. Afterward, to ...
    • Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP 

      Melero, Maite; Ruiz Costa-Jussà, Marta; Domingo, Judit; Marquina, Montse; Quixal, Martí (2012)
      Conference lecture
      Open access
      We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires the revisiting of the initial steps of NLP processing, since UGC (micro-blog, blog, ...
    • How much hybridisation does machine translation need? 

      Ruiz Costa-Jussà, Marta (2015-10-01)
      Article
      Open access
      Rule-based and corpus-based machine translation (MT) have coexisted for more than 20 years. Recently, boundaries between the two paradigms have narrowed and hybrid approaches are gaining interest from both academia and ...
    • Hybrid machine translation: integration of linguistics and statistics : editorial 

      Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (Elsevier, 2015-07)
      Article
      Open access
    • Improving a Catalan-Spanish statistical translation system using morphosyntactic knowledge 

      Farrús, Mireia; Ruiz Costa-Jussà, Marta; Poch, Marc; Hernández, Adolfo; Mariño, José B. (2009)
      Conference report
      Open access
      In this paper, a human evaluation of a Catalan-Spanish Ngram-based statistical machine translation system is used to develop specific techniques based on the use of grammatical categories, lexical categorisation and text ...
    • Initial approaches on Cross-Lingual Information Retrieval using Statistical Machine Translation on user-queries 

      Ruiz Costa-Jussà, Marta; Paz-Trillo, Christian; Wassermann, Renata (2012)
      Conference report
      Open access
      In this paper we propose a multilingual extension for OnAIR which is an ontology-aided information retrieval system applied to retrieve clips from a video collection. The multilingual extension basically involves allowing ...
    • Integration of machine translation paradigms (IMTraP) 

      Ruiz Costa-Jussà, Marta (2016-09)
      Article
      Open access
      Machine Translation (MT) is a highly interdisciplinary and multidisciplinary field, since engineers, computer scientists, statisticians and linguists all work in it. The aim of this project is to bring together the different ...
    • Integration of statistical collocation segmentations in a phrase-based statistical machine translation system 

      Ruiz Costa-Jussà, Marta; Daudaravicius, Vidas; Banchs, Rafael E. (2010)
      Conference report
      Open access
      This study evaluates the impact of integrating two different collocation segmentations methods in a standard phrase-based statistical machine translation approach. The collocation segmentation techniques are implemented ...
    • Introduction to the special issue on cross-language algorithms and applications 

      Ruiz Costa-Jussà, Marta; Bangalore, Srinivas; Lambert, Patrik; Màrquez, Lluís; Montiel-Ponsoda, Elena (2016-12-01)
      Article
      Open access
      With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational ...
    • Introduction to the special issue on deep learning approaches for machine translation 

      Ruiz Costa-Jussà, Marta; Allauzen, Alexandre; Barrault, Loïc; Cho, Kyunghyun; Schwenk, Holger (Elsevier, 2017)
      Article
      Open access
      Deep learning is revolutionizing speech and natural language technologies since it is offering an effective way to train systems and obtaining significant improvements. The main advantage of deep learning is that, by ...
    • Introduction: MT Approaches 

      Ruiz Costa-Jussà, Marta (2014-10-01)
      Audiovisual
      Open access
      Week 1, point 2 of the MOOC "Approaches to machine translation: rule-based, statistical and hybrid".
    • Is there hope for interlingua methods? A CLIR comparison experiment between interlingua and query translation 

      Ruiz Costa-Jussà, Marta; Banchs Martínez, Rafael Enrique (2014-12-01)
      Article
      Restricted access (publisher's policy)
      A comparison of interlingua and query translation is proposed in a particular cross-language information retrieval (CLIR) application which consists of retrieving a book from the collection by using one of its chapters ...
    • Latest trends in hybrid machine translation and its applications 

      Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (Elsevier, 2015-07)
      Article
      Open access
      This survey on hybrid machine translation (MT) is motivated by the fact that hybridization techniques have become popular as they attempt to combine the best characteristics of highly advanced pure rule or corpus-based MT ...
    • Linguistic knowledge-based vocabularies for Neural Machine Translation 

      Casas Manzanares, Noé; Ruiz Costa-Jussà, Marta; Rodríguez Fonollosa, José Adrián; Alonso, Juan; Fanlo, Ramon (Cambridge University Press, 2020)
      Article
      Open access
      Neural Networks applied to Machine Translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically-discovered common ...
    • Linguistic-based evaluation criteria to identify statistical machine translation errors 

      Farrús Cabeceran, Mireia; Ruiz Costa-Jussà, Marta; Mariño Acebal, José Bernardo; Rodríguez Fonollosa, José Adrián (2010-05)
      Conference lecture
      Open access
      Machine translation evaluation methods are highly necessary in order to analyze the performance of translation systems. Up to now, the most traditional methods are the use of automatic measures such as BLEU or the ...
    • Measuring the mixing of contextual information in the transformer 

      Ferrando Monsonís, Javier; Gallego Olsina, Gerard Ion; Ruiz Costa-Jussà, Marta (Association for Computational Linguistics, 2022)
      Conference lecture
      Open access
      The Transformer architecture aggregates input information through the self-attention mechanism, but there is no clear understanding of how this information is mixed across the entire model. Additionally, recent works have ...
    • Modelo estocástico de traducción basado en N-gramas de tuplas bilingues y combinación log-lineal de características  

      Mariño, José B.; Banchs, Rafael E.; Crego, Josep Maria; de Gispert, Adrià; Lambert, Patrik; Rodríguez Fonollosa, José Adrián; Ruiz Costa-Jussà, Marta (2005-09-01)
      Article
      Restricted access (publisher's policy)
      This paper presents a stochastic translation system based on N-gram modelling of the joint probability of bilingual texts. The basic unit of the model is the tuple, a pair of strings of ...