Plagiarism detection using information retrieval and similarity measures based on image processing techniques
dc.contributor.author | Ruiz Costa-Jussà, Marta |
dc.contributor.author | Banchs, Rafael E. |
dc.contributor.author | Grivolla, Jens |
dc.contributor.author | Codina, Joan |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2017-03-13T17:38:48Z |
dc.date.available | 2017-03-13T17:38:48Z |
dc.date.issued | 2010 |
dc.identifier.citation | Ruiz, M., Banchs, R., Grivolla, J., Codina, J. Plagiarism detection using information retrieval and similarity measures based on image processing techniques. In: Conference on Multilingual and Multimodal Information Access Evaluation. "Notebook Papers of CLEF 2010 Labs and Workshops, 22-23 September, Padua, Italy, September 2010". 2010. |
dc.identifier.isbn | 978-88-904810-2-4 |
dc.identifier.uri | http://hdl.handle.net/2117/102408 |
dc.description.abstract | This paper describes the Barcelona Media Innovation Center participation in the 2nd International Competition on Plagiarism Detection. In particular, our system focused on the external plagiarism detection task, which assumes the source documents are available. We present a two-step approach. In the first step of our method, we build an information retrieval system based on Solr/Lucene, segmenting both suspicious and source documents into smaller texts. We perform a search based on bag-of-words which provides a first selection of potentially plagiarized texts. In the second step, each promising pair is further investigated. We implemented a sliding window approach that computes cosine distances between overlapping text segments from both the source and suspicious documents on a pairwise basis. As a result, a similarity matrix between text segments is obtained, which is smoothed by means of low-pass 2-D filtering. From the smoothed similarity matrix, plagiarized segments are identified by using image processing techniques. Our results were placed in the middle of the official ranking, which jointly considered two types of plagiarism: intrinsic and external. |
dc.language.iso | eng |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | UPC subject areas::Computer science |
dc.subject.lcsh | Plagiarism |
dc.subject.other | Plagiarism detection |
dc.subject.other | Information retrieval |
dc.title | Plagiarism detection using information retrieval and similarity measures based on image processing techniques |
dc.type | Conference report |
dc.subject.lemac | Plagi |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.rights.access | Open Access |
local.identifier.drac | 19719602 |
dc.description.version | Postprint (published version) |
local.citation.author | Ruiz, M.; Banchs, R.; Grivolla, J.; Codina, J. |
local.citation.contributor | Conference on Multilingual and Multimodal Information Access Evaluation |
local.citation.publicationName | Notebook Papers of CLEF 2010 Labs and Workshops, 22-23 September, Padua, Italy, September 2010 |
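
The second step described in the abstract (sliding windows, pairwise cosine comparison, a similarity matrix, and low-pass 2-D smoothing) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size and step, the box (mean) filter used as the low-pass filter, and the plain threshold standing in for the unspecified image processing techniques are all assumptions made for illustration. The sketch also uses cosine similarity rather than distance, so that high values mark likely matches.

```python
import math
import re
from collections import Counter


def windows(text, size=5, step=2):
    """Split text into overlapping bag-of-words windows.

    `size` and `step` are illustrative values, not taken from the paper.
    """
    tokens = re.findall(r"\w+", text.lower())
    last_start = max(len(tokens) - size, 0)
    return [Counter(tokens[i:i + size]) for i in range(0, last_start + 1, step)]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(suspicious, source, size=5, step=2):
    """Rows index suspicious windows, columns index source windows."""
    sus = windows(suspicious, size, step)
    src = windows(source, size, step)
    return [[cosine(s, t) for t in src] for s in sus]


def box_filter(matrix, radius=1):
    """Assumed low-pass 2-D filter: mean over a (2*radius+1)^2 neighbourhood,
    truncated at the matrix borders."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cells = [
                matrix[i][j]
                for i in range(max(r - radius, 0), min(r + radius + 1, rows))
                for j in range(max(c - radius, 0), min(c + radius + 1, cols))
            ]
            out[r][c] = sum(cells) / len(cells)
    return out


def candidate_cells(matrix, threshold=0.5):
    """Hypothetical stand-in for the image processing step: keep the
    (suspicious window, source window) cells at or above a threshold."""
    return [
        (r, c)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value >= threshold
    ]


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank"
    suspicious = "yesterday the quick brown fox jumps over the lazy dog again"
    smoothed = box_filter(similarity_matrix(suspicious, source))
    print(candidate_cells(smoothed))
```

Smoothing suppresses isolated high-similarity cells while preserving diagonal runs, which is why a copied passage tends to survive the threshold as a connected streak of cells rather than as scattered points.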