
    • A factory of comparable corpora from Wikipedia 

      Barrón-Cedeño, Alberto; España Bonet, Cristina; Boldoba Trapote, Josu; Màrquez Villodre, Lluís (Association for Computational Linguistics, 2015)
      Conference report
      Open Access
      Multiple approaches to grab comparable data from the Web have been developed to date. Nevertheless, coming up with a high-quality comparable corpus on a specific topic is not straightforward. We present a model ...
    • A hybrid machine translation architecture guided by syntax 

      Labaka, Gorka; España Bonet, Cristina; Màrquez Villodre, Lluís; Sarasola, Kepa (2014-09-16)
      Article
      Restricted access - publisher's policy
      This article presents a hybrid architecture which combines rule-based machine translation (RBMT) with phrase-based statistical machine translation (SMT). The hybrid translation system is guided by the rule-based engine. ...
    • A hybrid system for patent translation 

      Enache, Ramona; España Bonet, Cristina; Ranta, Aarne; Màrquez Villodre, Lluís (2012)
      Conference lecture
      Open Access
      This work presents an HMT system for patent translation. The system exploits the high coverage of SMT and the high precision of an RBMT system based on GF to deal with specific issues of the language. The translator is ...
    • CoCo, a web interface for corpora compilation 

      España Bonet, Cristina; Vila Rigat, Marta; Rodríguez Hontoria, Horacio; Martí, Maria Antònia (2009)
      Article
      Open Access
      CoCo is a collaborative web interface for compiling linguistic resources. This demo presents one of its possible applications: obtaining paraphrases. ...
    • Context-aware machine translation for software localization 

      Muntés Mulero, Víctor; Paladini Adell, Patricia; España Bonet, Cristina; Màrquez Villodre, Lluís (2012)
      Conference lecture
      Open Access
      Software localization requires translating short text strings appearing in user interfaces (UI) into several languages. These strings are usually unrelated to the other strings in the UI. Due to the lack of semantic ...
    • Deep evaluation of hybrid architectures: simple metrics correlated with human judgments 

      Labaka, Gorka; Díaz de Ilarraza Sánchez, Arantza; Sarasola Gabiola, Kepa; España Bonet, Cristina; Màrquez Villodre, Lluís (2011)
      Conference lecture
      Open Access
      The process of developing hybrid MT systems is guided by the evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture ...
    • Deep evaluation of hybrid architectures: Use of different metrics in MERT weight optimization 

      España Bonet, Cristina; Labaka, Gorka; Díaz de Ilarraza Sánchez, Arantza; Màrquez Villodre, Lluís; Sarasola, Kepa (2013)
      Conference report
      Open Access
      The process of developing hybrid MT systems is usually guided by an evaluation method used to compare different combinations of basic subsystems. This work presents a deep evaluation experiment of a hybrid architecture, ...
    • Discriminative learning within Arabic statistical machine translation 

      España Bonet, Cristina; Giménez, Jesús; Màrquez Villodre, Lluís (2009-01)
      External research report
      Open Access
      Written Arabic is especially ambiguous due to the lack of diacritisation of texts, and this makes translation harder for automatic systems that do not take into account the context of phrases. Here, we use a standard ...
    • Document-level machine translation as a re-translation process 

      Martínez Garcia, Eva; España Bonet, Cristina; Màrquez Villodre, Lluís (2014-09-22)
      Article
      Open Access
      Most of the current Machine Translation systems are designed to translate a document sentence by sentence ignoring discourse information and producing incoherencies in the final translations. In this paper we present some ...
    • Document-level machine translation with word vector models 

      Martínez Garcia, Eva; España Bonet, Cristina; Màrquez Villodre, Lluís (2015)
      Conference report
      Open Access
      In this paper we apply distributional semantic information to document-level machine translation. We train monolingual and bilingual word vector models on large corpora and we evaluate them first in a cross-lingual lexical ...
    • El català i les tecnologies de la llengua 

      Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Melero, Maite; Padró, Lluís; Quixal, Martí; Rodríguez, Carlos (2009)
      Article
      Restricted access - publisher's policy
      Computational language processing covers any activity related to the creation, management, and use of language technology and resources. On the scientific side, this activity is central to disciplines ...
    • Experiments on document level machine translation 

      Martínez Garcia, Eva; España Bonet, Cristina; Màrquez Villodre, Lluís (2014-03-03)
      External research report
      Open Access
      Most of the current SMT systems work at sentence level. They translate a text assuming that sentences are independent, but when one looks at a well-formed document, it is clear that there exist many inter-sentence relations. ...
    • Full machine translation for factoid question answering 

      España Bonet, Cristina; Comas Umbert, Pere Ramon (Association for Computational Linguistics, 2012)
      Conference lecture
      Open Access
      In this paper we present an SMT-based approach to Question Answering (QA). QA is the task of extracting exact answers in response to natural language questions. In our approach, the answer is a translation of the question ...
    • GeBioToolkit: automatic extraction of gender-balanced multilingual corpus of Wikipedia biographies 

      Ruiz Costa-Jussà, Marta; Li Lin, Pau; España Bonet, Cristina (European Language Resources Association (ELRA), 2020)
      Conference lecture
      Open Access
      We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information from Wikipedia biographies. Despite the gender inequalities present in Wikipedia, the ...
    • Hybrid machine translation guided by a rule-based system 

      España Bonet, Cristina; Màrquez Villodre, Lluís; Labaka, Gorka; Díaz de Ilarraza Sánchez, Arantza; Sarasola Gabiola, Kepa (2011)
      Conference lecture
      Open Access
      This paper presents a machine translation architecture which hybridizes Matxin, a rule-based system, with regular phrase-based Statistical Machine Translation. In short, the hybrid translation process is guided by the ...
    • Language technology challenges of a "small" language (Catalan) 

      Melero, Maite; Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Padró, Lluís; Quixal, Martí; Rodríguez, Carlos; Saurí, Roser (2010-05)
      Conference report
      Restricted access - publisher's policy
      In this paper, we present a brief snapshot of the state of affairs in computational processing of Catalan and the initiatives that are starting to take place in an effort to bring the field a step forward, by making a ...
    • MT techniques in a retrieval system of semantically enriched patents 

      González Bermúdez, Meritxell; Mateva, Maria; Enache, Ramona; España Bonet, Cristina; Màrquez Villodre, Lluís; Popov, Borislav; Ranta, Aarne (2013)
      Conference lecture
      Open Access
      This paper focuses on how automatic translation techniques integrated in a patent retrieval system increase its capabilities and make possible extended features and functionalities. We describe 1) a novel methodology ...
    • Overview of TweetMT: a shared task on machine translation of tweets at SEPLN 2015 

      Alegria, Iñaki; Aranberri, Nora; España Bonet, Cristina; Gamallo, Pablo; Gonçalo Oliveira, Hugo; Martínez Garcia, Eva; San Vicente Roncal, Iñaki; Toral, Antonio; Zubiaga, Arkaitz (2015)
      Conference report
      Open Access
      This article presents an overview of the shared task that took place as part of the TweetMT workshop held at SEPLN 2015. The task consisted in translating collections of tweets from and to several ...
    • Patent translation within the MOLTO project 

      España Bonet, Cristina; Enache, Ramona; Slaski, Adam; Ranta, Aarne; Màrquez Villodre, Lluís; González Bermúdez, Meritxell (2011)
      Conference lecture
      Open Access
      MOLTO is an FP7 European project whose goal is to translate texts between multiple languages in real time with high quality. Patent translation is a case study where research is focused on simultaneously obtaining a ...
    • Primera Jornada del Processament Computacional del Català 

      Boleda Torrent, Gemma; Cuadros Oller, Montserrat; España Bonet, Cristina; Melero, Maite; Padró, Lluís; Quixal, Martí; Rodríguez, Carlos (2009-09)
      Article
      Restricted access - publisher's policy
      We present the conclusions of the first Jornada del Processament Computacional del Català, held in Barcelona in March 2009. ...