Authors:
Sidik Soleman and Atsushi Fujii
Affiliation:
Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
Keyword(s):
Plagiarism Detection, Citation Behavior, Information Retrieval, Content Analysis.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Symbolic Systems
Abstract:
In academic publishing, citation has long been used to borrow ideas from another document while giving credit to its authors. Plagiarism, which borrows an idea without giving appropriate credit, has recently become a serious problem. Because plagiarism detection has been formulated as finding partial near-duplicates in response to a document suspected of plagiarism, in this paper we propose a method that improves the similarity computation between text fragments. Our contribution is to formulate three document similarities based on citation and content analysis and to combine them in our method. We also show the effectiveness of our method experimentally and discuss its advantages and limitations.