• A comparison of approaches for measuring cross-lingual similarity of Wikipedia articles

    Barrón-Cedeño, Alberto; Lestari Paramita, Monica; Clough, Paul; Rosso, Paolo (Springer, 2014)
    Conference proceedings paper
    Open access
    Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic are ...
  • A factory of comparable corpora from Wikipedia 

    Barrón-Cedeño, Alberto; España Bonet, Cristina; Boldoba Trapote, Josu; Màrquez Villodre, Lluís (Association for Computational Linguistics, 2015)
    Conference proceedings paper
    Open access
    Multiple approaches to harvesting comparable data from the Web have been developed to date. Nevertheless, producing a high-quality comparable corpus on a specific topic is not straightforward. We present a model ...
  • Identifying useful human correction feedback from an on-line machine translation service 

    Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís; Henríquez Quintana, Carlos Alberto; Formiga Fanals, Lluís; Romero Merino, Enrique; May, Jonathan (2013)
    Conference proceedings paper
    Open access
    Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual users ...
  • Identifying useful human feedback from an on-line translation service 

    Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís; Henríquez Quintana, Carlos Alberto; Formiga Fanals, Lluís; Romero Merino, Enrique; May, Jonathan (2013)
    Conference communication
    Open access
    Post-editing feedback provided by users of on-line translation services offers an excellent opportunity for automatic improvement of statistical machine translation (SMT) systems. However, feedback provided by casual ...
  • IPA and STOUT: leveraging linguistic and source-based features for machine translation evaluation 

    González Bermúdez, Meritxell; Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís (Association for Computational Linguistics, 2014)
    Conference communication
    Access restricted by publisher policy
    This paper describes the UPC submissions to the WMT14 Metrics Shared Task: UPC-IPA and UPC-STOUT. These metrics use a collection of evaluation measures integrated in ASIYA, a toolkit for machine translation evaluation. ...
  • Leveraging online user feedback to improve statistical machine translation 

    Formiga, Lluís; Barrón-Cedeño, Alberto; Màrquez, Lluís; Henríquez, Carlos A.; Mariño Acebal, José Bernardo (2015-09-01)
    Article
    Open access
    In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target ...
  • Methods for cross-language plagiarism detection 

    Barrón-Cedeño, Alberto; Gupta, P.; Rosso, Paolo (2013-09)
    Article
    Access restricted by publisher policy
    Three factors are driving the rise of plagiarism across languages: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult ...
  • PAN@FIRE: overview of the cross-language Indian text re-use detection competition 

    Barrón-Cedeño, Alberto; Rosso, Paolo; Lalitha Devi, Sobha; Clough, Paul; Stevenson, Mark (2010)
    Conference proceedings paper
    Access restricted by publisher policy
    The development of models for automatic detection of text re-use and plagiarism across languages has received increasing attention in recent years. However, the lack of an evaluation framework composed of annotated datasets ...
  • Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection 

    Barrón-Cedeño, Alberto; Vila, Marta; Martí, Maria Antonia; Rosso, Paolo (2013)
    Article
    Open access
    Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism ...
  • The TALP-UPC approach to system selection: ASIYA features and pairwise classification using random forests 

    Formiga Fanals, Lluís; González Bermúdez, Meritxell; Barrón-Cedeño, Alberto; Rodríguez Fonollosa, José Adrián; Màrquez Villodre, Lluís (2013)
    Conference proceedings paper
    Access restricted by publisher policy
    This paper describes the TALP-UPC participation in the WMT’13 Shared Task on Quality Estimation (QE). Our participation is limited to task 1.2 on System Selection. We used a broad set of features (86 for German-to-English ...
  • The TALP-UPC phrase-based translation systems for WMT13: system combination with morphology generation, domain adaptation and corpus filtering 

    Formiga Fanals, Lluís; Ruiz Costa-Jussà, Marta; Mariño Acebal, José Bernardo; Rodríguez Fonollosa, José Adrián; Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís (2013)
    Conference proceedings paper
    Access restricted by publisher policy
    This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems, all built on standard phrase-based Moses systems. ...
  • UPC-CORE: What can machine translation evaluation metrics and Wikipedia do for estimating semantic textual similarity?

    Barrón-Cedeño, Alberto; Màrquez Villodre, Lluís; Fuentes Fort, Maria; Rodríguez Hontoria, Horacio; Turmo Borras, Jorge (2013)
    Conference communication
    Open access
    In this paper we discuss our participation in the 2013 SemEval Semantic Textual Similarity task. Our core features include (i) a set of metrics borrowed from automatic machine translation, originally intended to evaluate ...
  • Wikicardi: hacia la extracción de oraciones paralelas de Wikipedia [Wikicardi: towards the extraction of parallel sentences from Wikipedia]

    Boldoba Trapote, Josu; Barrón-Cedeño, Alberto; España Bonet, Cristina (2014-03-01)
    Research report
    Open access
    One of the objectives of the Tacardi project (TIN2012-38523-C02-00) is to extract parallel sentences from comparable corpora in order to enrich and adapt machine translation systems. In this research we use a subset ...