Methods for cross-language plagiarism detection

Barrón-Cedeño, Alberto; Gupta, P.; Rosso, Paolo

doi:10.1016/j.knosys.2013.06.018



Document type: Article

Publication date: 2013-09

Access conditions: Restricted access due to publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Abstract

Three factors are driving the rise of plagiarism across languages: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people living in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language different from their native one. Most efforts to detect cross-language plagiarism automatically depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T+MA). The three models differ inherently in nature and in the resources they require. They are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T+MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired.
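The abstract names CL-CNG without describing how it works. As a rough illustration only, the sketch below assumes that a character n-gram model compares two texts by the cosine similarity of their character trigram counts, which needs no translation resources. The function names, the normalization steps, and the choice of n = 3 are assumptions made for this example and are not taken from the paper.

```python
import math
import re
import unicodedata
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after a simple normalization.

    The normalization (lowercasing, removing diacritics and punctuation)
    is an assumed preprocessing step, not one specified in the abstract.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", stripped)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Texts shorter than n characters yield an empty profile.
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sentence pair: related languages share many character trigrams.
english = "The experiments show the best results"
spanish = "Los experimentos muestran los mejores resultados"
score = cosine_similarity(char_ngrams(english), char_ngrams(spanish))
print(f"character trigram similarity: {score:.3f}")
```

A measure of this kind relies on lexical and orthographic overlap between the two languages, so it is a plausible fit only for language pairs that share a script and some vocabulary.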

Citation: Barrón-Cedeño, A.; Gupta, P.; Rosso, P. Methods for cross-language plagiarism detection. "Knowledge-Based Systems", September 2013, vol. 50, p. 211-217.

URI: http://hdl.handle.net/2117/20275

DOI: 10.1016/j.knosys.2013.06.018

ISSN: 0950-7051


Files	Description	Size	Format	View
Barron-cedeno et al.pdf		504.3 KB	PDF	Restricted access
