gipplab / citeplag

Prototype of an external plagiarism detection system that combines the analysis of citations and text in academic documents to improve the identification of disguised forms of academic plagiarism

Home Page: http://www.citeplag.org

License: Other

C 0.17% Python 0.04% Java 4.34% PHP 93.64% Shell 0.04% Batchfile 0.03% CSS 0.63% JavaScript 0.87% HTML 0.01% Dockerfile 0.02% EJS 0.21%

citeplag's Introduction

Overview

The CitePlag backend is mainly written in Java and operates on a MySQL database. The schema and documentation of the database are included in the doc/database_doc directory of the repository.

After parsing documents and storing them in the database, CitePlag retrieves the document data to pre-compute text-based and citation-based similarities using the algorithms in the directory backend/src/org/sciplore/cbpd/alg. It stores the results, called "patterns", in the database.
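
One of the citation-based measures named in the related publications listed in this README is the Longest Common Citation Sequence. The following is a minimal sketch of that idea, assuming each document is reduced to the ordered list of source identifiers it cites. The class name `LccsSketch`, the method `lccsLength`, and the `ref…` identifiers are hypothetical and chosen for illustration only. This is not the code from backend/src/org/sciplore/cbpd/alg, which may differ in data model and details.

```java
import java.util.List;

/** Illustrative sketch only; not the CitePlag implementation. */
final class LccsSketch {

    /**
     * Returns the length of the longest common subsequence of two
     * citation sequences. Citations must appear in the same order in
     * both documents, but other citations may lie between them.
     */
    static int lccsLength(List<String> citationsA, List<String> citationsB) {
        int n = citationsA.size();
        int m = citationsB.size();
        // dp[i][j] = result for the first i citations of A and first j of B
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (citationsA.get(i - 1).equals(citationsB.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        // Hypothetical citation sequences of a suspicious and a source document
        List<String> suspicious = List.of("ref1", "ref2", "ref3", "ref4", "ref5");
        List<String> source = List.of("ref1", "ref9", "ref3", "ref5", "ref2");
        System.out.println(lccsLength(suspicious, source)); // prints 3 (ref1, ref3, ref5)
    }
}
```

A long shared citation sequence can remain visible even when the surrounding text has been paraphrased or translated, which is why such order-based measures complement text matching.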

The frontend, which is available at www.citeplag.org, is a CakePHP application that is decoupled from the backend. It retrieves and visualizes the data stored in the database.

For detecting text matches, CitePlag uses Encoplot by Christian Grozea, which is a lightweight, fast, and accurate text-based detection tool. Encoplot itself is a small C program. The code for Encoplot is available in this paper.

To extract citation and reference data from documents, CitePlag uses ParsCit.

The backend code contains adapter classes that integrate the functionality of Encoplot and ParsCit.

For details on the detection algorithms or the CitePlag system, we suggest consulting the doctoral thesis of Bela Gipp.

Please cite as:

Meuschke, N., Gipp, B., & Breitinger, C., “CitePlag: A Citation-based Plagiarism Detection System Prototype,” in Proceedings of the 5th International Plagiarism Conference, Newcastle upon Tyne, UK, July 16-18, 2012, DOI: 10.5281/zenodo.3483088. (PDF)

BibTeX:

@INPROCEEDINGS{Meuschke2012,
   author    = {Meuschke, Norman and Gipp, Bela and Breitinger, Corinna},
   title     = {CitePlag: A Citation-based Plagiarism Detection System Prototype},
   booktitle = {Proceedings of the 5th International Plagiarism Conference},
   year      = {2012},
   location  = {Newcastle upon Tyne, UK}
}

Related Publications

  • N. Meuschke, M. Schubotz, F. Hamborg, T. Skopal, and B. Gipp, “Analyzing Mathematical Content to Detect Academic Plagiarism,” in Proceedings of the International Conference on Information and Knowledge Management (CIKM), Singapore, 2017. (PDF)

  • N. Meuschke, N. Siebeck, M. Schubotz, B. Gipp, “Analyzing Semantic Concept Patterns to Detect Academic Plagiarism,” in Proceedings of the 6th International Workshop on Mining Scientific Publications (WOSP) held in conjunction with the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017. (PDF)

  • B. Gipp, Citation-based Plagiarism Detection – Detecting Disguised and Cross-language Plagiarism using Citation Pattern Analysis, Springer Vieweg Research, 2014. (PDF)

  • N. Meuschke and B. Gipp, “Reducing Computational Effort for Plagiarism Detection by using Citation Characteristics to Limit Retrieval Space”, in Proceedings of the IEEE-CS/ACM International Conference on Digital Libraries (DL), 2014, pp. 197-200. (PDF)

  • B. Gipp, N. Meuschke, and C. Breitinger, “Citation-based Plagiarism Detection: Practicability on a Large-scale Scientific Corpus”, Journal of the American Society for Information Science and Technology, vol. 65, iss. 2, pp. 1527-1540, 2014. (PDF)

  • B. Gipp, N. Meuschke, C. Breitinger, J. Pitman, and A. Nuernberger, “Web-based Demonstration of Semantic Similarity Detection using Citation Pattern Visualization for a Cross Language Plagiarism Case”, in Proceedings International Conference on Enterprise Information Systems (ICEIS), Special Session on Information Systems Security, Lisbon, Portugal, 2014, pp. 677-683. (PDF)

  • N. Meuschke and B. Gipp, “State of the Art in Detecting Academic Plagiarism”, International Journal for Educational Integrity, vol. 9, iss. 1, pp. 50-71, 2013. (PDF)

  • B. Gipp, N. Meuschke, C. Breitinger, M. Lipinski, and A. Nuernberger, “Demonstration of Citation Pattern Analysis for Plagiarism Detection”, in Proceedings International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2013. (PDF)

  • B. Gipp and N. Meuschke, “Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence”, in Proceedings ACM Symposium on Document Engineering (DocEng), 2011. (PDF)

  • B. Gipp, N. Meuschke, and J. Beel, “Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag”, in Proceedings ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2011. (PDF)
