plagchecker

This is a Python project for checking programs for their degree of similarity. It is currently being developed for the C language and is intended to work with both single-file and multi-file projects. The main idea is a configurable tool in which metrics and language support can be changed or added later without modifying existing code. For that reason, the application is built on plug-ins for reading, tokenization, and estimation.

Plugins

As mentioned above, the application is implemented with plugins to keep it flexible.
The whole process consists of the following steps:

  • Readers that read all sources in an established order.
  • Attribute methods that filter out sources that are certainly not plagiarism.
  • Preprocessors that modify the source code before tokenization to remove unnecessary parts.
  • Tokenizers that convert source code for methods working on a tokenized representation of the program.
  • Mappings used for tokenization.

All these functions are implemented by plugins. Each directory under src that contains implementations of these functions holds detailed information about the formats and requirements for that plugin type. To learn how to implement custom scripts, read the README.md in the relevant directory.
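
The preprocessing, tokenization, and mapping steps listed above can be sketched as follows. This is only an illustration of the plugin idea, not the project's actual interface: the registry `PLUGINS`, the `register` decorator, the plugin names, and `C_MAPPING` are all hypothetical, and the comment stripping is deliberately naive.

```python
import re

# Hypothetical registry: stage name -> plugin name -> callable.
# The real formats are described in the README.md files under src.
PLUGINS = {"preprocessor": {}, "tokenizer": {}}


def register(stage, name):
    """Register a function as a plugin for the given stage."""
    def decorator(func):
        PLUGINS[stage][name] = func
        return func
    return decorator


@register("preprocessor", "strip_c_comments")
def strip_c_comments(code):
    """Remove /* ... */ and // comments (naive: ignores string literals)."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", code)


# Hypothetical mapping from C keywords to token classes.
C_MAPPING = {"int": "TYPE", "char": "TYPE", "return": "RET",
             "if": "IF", "while": "LOOP", "for": "LOOP"}


@register("tokenizer", "simple_c")
def simple_c_tokenizer(code, mapping):
    """Turn source text into a list of token classes."""
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", code):
        if lexeme in mapping:
            tokens.append(mapping[lexeme])   # known keyword
        elif re.match(r"[A-Za-z_]", lexeme):
            tokens.append("ID")              # any identifier
        elif lexeme.isdigit():
            tokens.append("NUM")             # integer literal
        else:
            tokens.append(lexeme)            # punctuation kept as-is
    return tokens


def run_pipeline(code, preprocessor, tokenizer, mapping):
    """Apply the selected preprocessor, then the selected tokenizer."""
    cleaned = PLUGINS["preprocessor"][preprocessor](code)
    return PLUGINS["tokenizer"][tokenizer](cleaned, mapping)
```

With such a layout, two sources that differ only in identifier names and literal values produce the same token sequence, which is the property that token-based comparison methods rely on. A new language or metric would be added by registering another function rather than editing existing ones.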

Testing

unittest is used for testing. Existing tests can serve as examples for writing new ones.
If tests are created any other way, there is no guarantee that CI will execute them.
To run all tests locally, run the following script from the project root:

python unit-tests/all_tests.py

Integration tests are executed the same way. To run them, use:

python integration-tests/all_tests.py

To run a new test together with the existing ones, locally and on CI, add the path to its script in the _unit-tests_ or _integration-tests_ folder to the corresponding _all_tests.py_ suite.

Contributors

akhtyamovrr


Issues

Encoding troubles on CI: tests with Russian comments pass in the local environment but fail on CI.

Stacktrace:

Traceback (most recent call last):
  File "/home/travis/build/akhtyamovrr/plagchecker/integration-tests/attribute_check_added.py", line 7, in test_integration
    self.assertEquals([], integration_logic.attribute_check())
  File "/home/travis/build/akhtyamovrr/plagchecker/integration-tests/integration_logic.py", line 21, in attribute_check
    preprocessed = read_and_preprocess()
  File "/home/travis/build/akhtyamovrr/plagchecker/integration-tests/integration_logic.py", line 15, in read_and_preprocess
    source = reader.read_code(integration_sources + 'complex', '*.c', order)
  File "./src/readers/reader.py", line 30, in read_code
    src_code = source.read()
  File "/home/travis/virtualenv/python3.4.2/lib/python3.4/codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 23: invalid continuation byte
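
The traceback suggests that the file is decoded as UTF-8 on CI but was saved in a different encoding. One possible workaround is to decode explicitly and fall back to a second encoding, as in the minimal sketch below. The function `read_source` is hypothetical, and Windows-1251 as the fallback is an assumption (byte 0xce is the Cyrillic letter "О" in that encoding); the actual encoding of the test sources is not stated here.

```python
def read_source(path, encodings=("utf-8", "cp1251")):
    """Read a source file, trying each encoding in order.

    The default fallback to cp1251 is an assumption about the files,
    not something confirmed by the project.
    """
    with open(path, "rb") as source:
        raw = source.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes.
    return raw.decode(encodings[0], errors="replace")
```

Reading bytes and decoding explicitly makes the result independent of the platform's default encoding, which is the likely reason the same test behaves differently locally and on CI. Converting the test sources to UTF-8 would be an alternative that needs no fallback.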
