DORAS | DCU Research Repository

DCU@FIRE-2014: an information retrieval approach for source code plagiarism detection

Ganguly, Debasis (ORCID: 0000-0003-0050-7138) and Jones, Gareth J.F. (ORCID: 0000-0002-4033-9135) (2014) DCU@FIRE-2014: an information retrieval approach for source code plagiarism detection. In: Forum for Information Retrieval Evaluation (FIRE 2014) workshop, 5-7 Dec 2014, Bangalore, India.

Abstract
This paper investigates an information retrieval (IR) based approach to source code plagiarism detection. Exhaustively checking pairwise similarities between documents does not scale to large collections of source code documents. To make source code plagiarism detection fast and scalable in practice, we propose an IR-based approach in which each document is treated as a pseudo-query to retrieve a list of candidate documents in decreasing order of similarity. A threshold is then applied to the relative similarity decrement ratios to report a set of documents as potential cases of source code reuse. Instead of treating source code as an unstructured text document, we explore term extraction from the annotated parse tree of the source code and use a field-based language model for indexing and retrieval of source code documents. Results confirm that source code parsing plays a vital role in improving plagiarism prediction accuracy.
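
The retrieve-then-threshold idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity function or the exact form of the decrement ratio, so this sketch assumes plain bag-of-tokens cosine similarity in place of the field-based language model, and assumes the relative decrement between consecutive scores is (s_i - s_{i+1}) / s_i. The function names, the `min_drop` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-token vectors (a stand-in scorer)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query_id, docs):
    """Treat one document as a pseudo-query and rank all others by similarity."""
    q = Counter(docs[query_id].split())
    scored = [(other, cosine(q, Counter(text.split())))
              for other, text in docs.items() if other != query_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def cut_at_relative_drop(ranked, min_drop=0.5):
    """Report the top of the ranked list up to the first large relative drop.

    Assumed rule (not taken from the paper): the relative decrement between
    consecutive scores is (s_i - s_next) / s_i. If no drop reaches min_drop,
    nothing is reported.
    """
    for i in range(len(ranked) - 1):
        s_i, s_next = ranked[i][1], ranked[i + 1][1]
        if s_i > 0 and (s_i - s_next) / s_i >= min_drop:
            return ranked[: i + 1]
    return []


# Toy collection: "a" and "b" differ only in one variable name.
docs = {
    "a": "int sum = 0 ; for i in n sum += i",
    "b": "int total = 0 ; for i in n total += i",
    "c": "print hello world",
}
suspects = cut_at_relative_drop(rank_candidates("a", docs))
print([doc_id for doc_id, _ in suspects])
```

Each document costs one ranked retrieval rather than a comparison against every other document, which is where the scalability argument comes from once the ranking is served by an inverted index instead of the linear scan used here for brevity.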
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Workshop
Refereed: Yes
Uncontrolled Keywords: Source Code Plagiarism Detection; Field Search
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: Research Institutes and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Proceedings of FIRE 2014.
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 20382
Deposited On: 15 Jan 2015 14:59 by Gareth Jones. Last Modified: 25 Oct 2018 08:54
Documents

Full text available as:

2014-soco-ganguly-dcu.pdf (PDF, 219kB)