
avpd's Introduction

Authorship Verification for Hired Plagiarism Detection (AVHPD)

For the project description, related resources, usage instructions, and other documentation, please refer to our GitHub Pages site: https://grchristensen.github.io/avpd/about

avpd's People

Contributors

grchristensen, noahwong123, enriquezdaniel, engineeringforever2018


avpd's Issues

TODO: (Backend) Error Handling

When creating the API requests, I only performed the bare minimum needed to make each request work. I did not implement good error handling for scenarios such as a student submitting a file in an invalid format. These requests need to be hardened to account for such scenarios.

There should be a few TODO comments scattered around the backend code; these would be a good place to start.
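As an illustration of the kind of hardening meant here, the following is a minimal sketch of a file format check. The accepted extensions, the exception class, and the function name are all assumptions made for the example; the issue does not specify them, and the real check would live wherever the upload request is handled.

```python
from pathlib import PurePath

# Hypothetical whitelist; the issue does not say which formats are accepted.
ALLOWED_EXTENSIONS = {".docx", ".txt"}


class InvalidSubmissionError(ValueError):
    """Raised when an uploaded file cannot be accepted (hypothetical name)."""


def validate_submission_filename(filename: str) -> str:
    """Return the lowercased extension, or raise InvalidSubmissionError."""
    if not filename or not filename.strip():
        raise InvalidSubmissionError("No file name was provided.")
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        allowed = ", ".join(sorted(ALLOWED_EXTENSIONS))
        raise InvalidSubmissionError(
            f"Unsupported file format {extension or '(none)'!r}; "
            f"expected one of: {allowed}."
        )
    return extension
```

A request handler could catch `InvalidSubmissionError` and turn its message into a 400 response instead of letting the request fail with an unhandled exception.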

TODO: (Backend) User Endpoint

Right now, API functions such as "List Students" return only the ids of the matching users. This is fine, but there needs to be some way to get the full information for an id. Therefore, there needs to be a user endpoint that can be queried.

The endpoint should be http://avpd.com/users/<id>

It should support:

  • GET - Get the user information (such as first name and last name) for the given user.
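The behaviour of the GET request could look roughly like the sketch below. It is framework-agnostic on purpose: the in-memory `USERS` table, the placeholder names in it, the response field names, and the 404 body are assumptions for illustration, not the project's actual models or response format.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the user table, with placeholder data.
USERS: Dict[int, Dict[str, str]] = {
    1: {"first_name": "Sample", "last_name": "Student"},
}


def get_user(user_id: int) -> Tuple[int, dict]:
    """Return (HTTP status, JSON-serializable body) for GET /users/<id>."""
    user = USERS.get(user_id)
    if user is None:
        # Unknown ids produce a 404 rather than an unhandled error.
        return 404, {"detail": f"No user with id {user_id}."}
    return 200, {"id": user_id, **user}
```

A client that received ids from "List Students" would call this once per id to resolve first and last names.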

TODO: (Notebooks) Profile and Feature Extractor Refactor

Currently, style profiles are fed raw text and score raw text. They take a feature extractor as a dependency in order to process the text and score it. This dependency is unnecessary and leads to a restrictive design, because feature extraction and profiling should happen at different stages for efficiency. Also, profiles do not handle edge cases well, which has led to hard-to-find bugs when benchmarking and using the profiles.

Profiles should be refactored so that they only work with numpy arrays representing extracted features. If the previous behavior is desired, profiles and feature extractors can be composed into a coordinator class that takes both as dependencies, while still allowing profiles/extractors to be composed in different ways. This is currently not needed though. Also, the flagging responsibility of the profiles should be handed off to another class, Thresholder, since there is not a general threshold that applies to all feature extractors. This class should be trainable, and the algorithm complexity for training should be documented with the class.
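One possible shape for the Thresholder is sketched below. The issue only asks for a trainable class with documented training complexity; the quantile rule used here, the method names `fit` and `flag`, and the default quantile are assumptions chosen to keep the example small.

```python
from typing import Optional, Sequence


class Thresholder:
    """Flags distances that exceed a threshold learned from authentic texts.

    Training sorts the distances once: O(n log n) time and O(n) extra space
    for n training distances. Flagging a single distance is O(1).
    """

    def __init__(self, quantile: float = 0.95) -> None:
        if not 0.0 < quantile <= 1.0:
            raise ValueError("quantile must be in (0, 1].")
        self.quantile = quantile
        self.threshold: Optional[float] = None

    def fit(self, authentic_distances: Sequence[float]) -> "Thresholder":
        """Learn the threshold from distances of known-authentic texts."""
        if len(authentic_distances) == 0:
            raise ValueError("Cannot train on an empty set of distances.")
        ordered = sorted(authentic_distances)
        index = min(len(ordered) - 1, int(self.quantile * len(ordered)))
        self.threshold = ordered[index]
        return self

    def flag(self, distance: float) -> bool:
        """Return True when the distance is above the learned threshold."""
        if self.threshold is None:
            raise RuntimeError("Thresholder must be trained before flagging.")
        return distance > self.threshold
```

Because the class only sees distances, one instance can be trained per feature extractor without the profile knowing anything about thresholds.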

Feature extractors should retain their interface, but they should internally be decomposed into smaller classes, in case a similar refactor for them is needed in the future.

The benchmarking code should take advantage of this refactor by preprocessing the BAWE dataset beforehand for each feature extractor and then only utilizing the profiles/thresholders in the benchmarking process.

Frontend TODO

  1. Add APIs needed for UI
    a) Student request to get classes they are in
    b) Update "List students" so that it displays first and last name (right now it only returns the id number for each student)
    c) Messaging between students and instructor, if we want in-app messaging

TODO: (Notebooks) Profile and Extracted Features Persistence

Profiles and features extracted from text need to be able to be saved to files by the backend. I think the best way to handle this is to have a save function on both the profile and the extracted features that returns some kind of file object. The file object needs to be assignable to a Django FileField.
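A minimal sketch of the save function idea follows. The `ExtractedFeatures` class here is a stand-in with a made-up constructor, and JSON is an arbitrary choice of serialization; the real classes may hold numpy arrays and need a different encoding. The returned in-memory file would presumably be wrapped with Django's `File` or `ContentFile` before being stored in a FileField, which is not shown.

```python
import io
import json
from typing import BinaryIO, List


class ExtractedFeatures:
    """Stand-in for the real extracted-features class (hypothetical shape)."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def save(self) -> io.BytesIO:
        """Serialize to an in-memory binary file positioned at the start."""
        payload = json.dumps({"values": self.values}).encode("utf-8")
        return io.BytesIO(payload)

    @classmethod
    def load(cls, file_obj: BinaryIO) -> "ExtractedFeatures":
        """Rebuild the features from a file object produced by save()."""
        payload = json.loads(file_obj.read().decode("utf-8"))
        return cls(payload["values"])
```

Returning a file object rather than writing to a path keeps the notebook code unaware of where the backend stores files.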

TODO: (Backend) Highlighting Implementation

We need to take the highlighting code and turn it into a module that accepts a Word document and the positions of the sentences to highlight, and returns a new Word document with the specified sentences highlighted.

This functionality should be implemented through the API with a GET request at http://avpd.com/instructor/classrooms/<id>/assignments/<id>/submissions/<id>/detailed-report.

Notebooks TODO

  1. Automated threshold discovery
  2. MICUSP data extraction
  3. Function to split text into sentences and annotate each sentence with the distance from the profile.
  4. Refactor architecture so that not every class has to split sentences in order to work.
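Item 3 could be sketched as follows. The regular-expression sentence splitter is a deliberately naive assumption (it will mis-split abbreviations such as "e.g."), and `distance_from_profile` is a hypothetical callable standing in for whatever the profile exposes for scoring.

```python
import re
from typing import Callable, List, Tuple

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def annotate_sentences(
    text: str,
    distance_from_profile: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Pair each sentence with its distance from the profile."""
    return [(s, distance_from_profile(s)) for s in split_sentences(text)]
```

Keeping the splitting in one function also supports item 4, since other classes can accept already-split sentences instead of splitting the text themselves.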

TODO: (Backend) Profile persistence

Whenever the instructor accepts a submission, the submission should be fed into the student's profile and the updated profile should be saved.

The mocks from #10 need to be completed before this can be worked on.

TODO: (Backend) Profile and Feature Extractor Refactor

The API needs to take advantage of the profile refactor in order to speed up the scoring process for instructor users.

When the student submits an assignment, the TextProcessor class should be called to preprocess the text into PreprocessedText, which is then saved in some model that is connected to that student. When the instructor asks for an assignment to be scored, all of the previous preprocessed texts are fed into the StyleProfile and then the new preprocessed text is scored with distances.
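The flow described above can be sketched with an in-memory store. Everything here is a stand-in: `Features` substitutes for PreprocessedText, the `preprocess` callable substitutes for TextProcessor, the `score` callable substitutes for building a StyleProfile and scoring against it, and the dictionary substitutes for the model connected to the student.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Sequence

Features = List[float]  # stand-in for PreprocessedText


class SubmissionStore:
    """Preprocess on submit, score on demand (hypothetical helper)."""

    def __init__(self, preprocess: Callable[[str], Features]) -> None:
        self._preprocess = preprocess
        self._by_student: DefaultDict[int, List[Features]] = defaultdict(list)

    def submit(self, student_id: int, text: str) -> None:
        # The expensive preprocessing happens once, at submission time.
        self._by_student[student_id].append(self._preprocess(text))

    def score_latest(
        self,
        student_id: int,
        score: Callable[[Sequence[Features], Features], float],
    ) -> float:
        # Scoring reuses stored features instead of reprocessing raw text.
        history = self._by_student.get(student_id, [])
        if len(history) < 2:
            raise ValueError("Need at least one earlier submission to build a profile.")
        *previous, newest = history
        return score(previous, newest)
```

The instructor-facing request then only pays for profile construction and scoring, which is where the speed-up is expected to come from.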

Before this can be worked on the mocks from #10 need to be completed.

TODO: (Backend) Student Classrooms Endpoint

A new API endpoint needs to be added so that students can see what classrooms they're enrolled in.

The endpoint should be http://avpd.com/student/classrooms.

It should support:

  • GET - Get the classrooms that the student is in.

TODO: (Notebooks) Stress Testing

We need to test edge cases and other scenarios for the profiles and feature extractors. These tests should be executed on the profile and feature extractor classes that are provided to the backend, not the lower level versions.

Look at the testing assignment for ideas on what to test.
