    • A comparison of approaches for measuring cross-lingual similarity of wikipedia articles 

      Barrón-Cedeño, Alberto; Lestari Paramita, Monica; Clough, Paul; Rosso, Paolo (Springer, 2014)
      Conference report
      Open Access
      Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic are ...
    • A factory of comparable corpora from Wikipedia 

      Barrón-Cedeño, Alberto; España Bonet, Cristina; Boldoba Trapote, Josu; Màrquez Villodre, Lluís (Association for Computational Linguistics, 2015)
      Conference report
      Open Access
      Multiple approaches to gathering comparable data from the Web have been developed to date. Nevertheless, producing a high-quality comparable corpus on a specific topic is not straightforward. We present a model ...
    • Identifying useful human correction feedback from an on-line machine translation service 

      Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís; Henríquez Quintana, Carlos Alberto; Formiga Fanals, Lluís; Romero Merino, Enrique; May, Jonathan (2013)
      Conference report
      Open Access
      Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users ...
    • Identifying useful human feedback from an on-line translation service 

      Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís; Henríquez Quintana, Carlos Alberto; Formiga Fanals, Lluís; Romero Merino, Enrique; May, Jonathan (2013)
      Conference lecture
      Open Access
      Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual ...
    • IPA and STOUT: leveraging linguistic and source-based features for machine translation evaluation 

      González Bermúdez, Meritxell; Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís (Association for Computational Linguistics, 2014)
      Conference lecture
      Restricted access - publisher's policy
      This paper describes the UPC submissions to the WMT14 Metrics Shared Task: UPC-IPA and UPC-STOUT. These metrics use a collection of evaluation measures integrated in ASIYA, a toolkit for machine translation evaluation. ...
    • Leveraging online user feedback to improve statistical machine translation 

      Formiga, Lluís; Barrón-Cedeño, Alberto; Marquez, Lluis; Henriquez, Carlos A; Mariño Acebal, José Bernardo (2015-09-01)
      Article
      Open Access
      In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target ...
    • Methods for cross-language plagiarism detection 

      Barrón-Cedeño, Alberto; Gupta, P.; Rosso, Paolo (2013-09)
      Article
      Restricted access - publisher's policy
      Three factors are driving the rise of plagiarism across languages: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult ...
    • PAN@FIRE: overview of the cross-language Indian text re-use detection competition 

      Barrón-Cedeño, Alberto; Rosso, Paolo; Lalitha Devi, Sobha; Clough, Paul; Stevenson, Mark (2010)
      Conference report
      Restricted access - publisher's policy
      The development of models for automatic detection of text re-use and plagiarism across languages has received increasing attention in recent years. However, the lack of an evaluation framework composed of annotated datasets ...
    • Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection 

      Barrón-Cedeño, Alberto; Vila, Marta; Martí, Maria Antonia; Rosso, Paolo (2013)
      Article
      Open Access
      Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism ...
    • The TALP-UPC approach to system selection: ASIYA features and pairwise classification using random forests 

      Formiga Fanals, Lluís; González Bermúdez, Meritxell; Barrón-Cedeño, Alberto; Rodríguez Fonollosa, José Adrián; Màrquez Villodre, Lluís (2013)
      Conference report
      Restricted access - publisher's policy
      This paper describes the TALP-UPC participation in the WMT’13 Shared Task on Quality Estimation (QE). Our participation is limited to task 1.2 on System Selection. We used a broad set of features (86 for German-to-English ...
    • The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering 

      Formiga Fanals, Lluís; Ruiz Costa-Jussà, Marta; Mariño Acebal, José Bernardo; Rodríguez Fonollosa, José Adrián; Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís (2013)
      Conference report
      Restricted access - publisher's policy
      This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems, based on standard phrase-based Moses systems. ...
    • UPC-CORE: What can machine translation evaluation metrics and Wikipedia do for estimating semantic textual similarity? 

      Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís; Fuentes Fort, Maria; Rodríguez Hontoria, Horacio; Turmo Borras, Jorge (2013)
      Conference lecture
      Open Access
      In this paper we discuss our participation in the 2013 SemEval Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation, originally intended to evaluate ...
    • Wikicardi: hacia la extracción de oraciones paralelas de Wikipedia (Wikicardi: towards the extraction of parallel sentences from Wikipedia) 

      Boldoba Trapote, Josu; Barrón-Cedeño, Alberto; España Bonet, Cristina (2014-03-01)
      External research report
      Open Access
      One of the objectives of the Tacardi project (TIN2012-38523-C02-00) is to extract parallel sentences from comparable corpora in order to enrich and adapt machine translation systems. In this research we use a subset ...