Git Product home page Git Product logo

no_nokol-1's Introduction

NO_NOKOL Plagiarism-Checker

A web application to check if a document contents are plagiarised with multiple options availaible based upon the requirements.

To run the app

  1. run command "pip install -r requirements.txt"
  2. run command "python app.py" in the terminal
Note1 : Make sure that python 3.8+ is installed in your system. Also installing Anaconda is recommended, if you don’t have Anaconda installed yet, follow these steps provided on the Anaconda installation page.
Note2 : Make sure that Java is installed in your machine to use the tika package
Note3 : To use the Assamese Plag-Check, follow these steps :
  1. Get your API key and location to use the Microsoft translation API ( refer here)
  2. Create a .env file and write this code (replace "yourAPIkey" and "yourLocation" with the one that you obtained in step i.)
API_KEY=yourAPIkey
LOCATION=yourLocation

How it works

  • Based on the requirements there are three approaches to this project.
  • It searches online using web scarapper designed to utilize Bing serarch engine for some queries. Queries are extracted from the source txt/pdf/odt/docx file.
  • First approach works on English extracted data.
  • Second approach works on Assamese data translated using Microsoft Translator api and then translated data is fed to scrapper for match.
  • Third approach works in a way extracted data is fed to a logic where synonyms are recoginized using NLTK Synset and then we replace the nouns and adjectives with its most commonly used synonym (obtained from NLTK brown corpus) after that matches are found using the same method.
  • Resulting URL, matched contents are checked for similarity with given text query.
  • Finally the most probable sources are displayed on views with information showing plagiarised percentage and sentences.

Required Libraries

  • The project uses tika module to extract text from files.
  • Other required libraries are:
    • beautifulsoup4==4.10.0
    • bs4==0.0.1
    • certifi==2021.5.30
    • charset-normalizer==2.0.6
    • click==8.0.1
    • colorama==0.4.4
    • Flask==2.0.1
    • idna==3.2
    • itsdangerous==2.0.1
    • Jinja2==3.0.1
    • joblib==1.0.1
    • MarkupSafe==2.0.1
    • nltk==3.6.3
    • python-dotenv==0.19.0
    • regex==2021.8.28
    • requests==2.26.0
    • soupsieve==2.2.1
    • tika==1.24
    • tqdm==4.62.3
    • urllib3==1.26.6
    • Werkzeug==2.0.1

Folder Structure

  • Static/ : Contains views functionallity logics, script and styles.
  • Templates/ : Contains html files for rendering on views.
  • app.py : Main script file for running FLask application.
  • modules.py : Consists of scrapper logic and required modules like tika logic for extracting text.

no_nokol-1's People

Contributors

reckon77 avatar akashchetia avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.