Toxic CREPE

A PyTorch implementation of the CREPE character-level text classification model (https://arxiv.org/abs/1509.01626).

It is trained to predict the toxicity of a given piece of text.


Data

The training data must be stored as a .csv file in the ./data folder and contain two columns: the labels and the raw texts.
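As a rough illustration of the expected layout, the sketch below parses a two-column CSV using only the standard library. The column order (label first, text second), the absence of a header row, and the helper name read_labeled_rows are assumptions made for this example; the repository's own loader may differ.

```python
import csv
import io


def read_labeled_rows(handle):
    """Yield (label, text) pairs from a two-column CSV.

    Assumes no header row and the label in the first column;
    both are assumptions, not confirmed by the repository.
    """
    for row in csv.reader(handle):
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        yield int(row[0]), row[1]


# In-memory stand-in for a file such as ./data/<your file>.csv
sample = io.StringIO('0,"have a nice day"\n1,"you are awful"\n')
rows = list(read_labeled_rows(sample))
print(rows)
```

For a real file, the same helper can be fed an open file handle, for example open(path, newline="", encoding="utf-8").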

Important note: the current implementation is a binary classifier. It has a single sigmoid output and is trained with the Binary Cross Entropy loss, so the labels should be integer values (either 0 or 1) and need no additional encoding.
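To make the role of the 0/1 labels concrete, here is a minimal pure-Python sketch of a sigmoid output combined with the binary cross entropy for a single example. It only illustrates the formula; the function names are hypothetical, and in PyTorch the equivalent loss is available as torch.nn.BCELoss (whether the repository uses that exact class is not stated here).

```python
import math


def sigmoid(logit):
    """Map a raw model output to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(probability, label, eps=1e-7):
    """BCE for one example; the label must be the integer 0 or 1."""
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label!r}")
    # Clamp to avoid log(0) for saturated predictions.
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


logit = 2.0                      # hypothetical raw output of the network
p = sigmoid(logit)               # predicted probability of the positive class
loss_if_toxic = binary_cross_entropy(p, 1)
loss_if_clean = binary_cross_entropy(p, 0)
print(p, loss_if_toxic, loss_if_clean)
```

Because the label enters the formula directly as 0 or 1, no one-hot or other encoding step is needed.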


Variables

Model and data parameters:

  • MAX_LENGTH - defines the maximum length of each input text (longer texts are truncated, shorter texts are padded with zeros)

  • CHANNELS - the number of filters (same on each hidden convolutional layer)

  • KERNEL_SIZES - the list of kernel sizes for each convolutional layer

  • POOLING_SIZE - the size of each pooling operation in the model

  • LINEAR_SIZE - the size of the hidden fully connected layers (same for all of them)

  • DROPOUT - the dropout value

  • OUTPUT_SIZE - the number of outputs (should be equal to the number of classes in the case of a multilabel classification problem)

  • EPOCHS - the number of epochs

  • BATCH_SIZE - the batch size

  • LEARNING_RATE - the learning rate
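The truncation and zero padding controlled by MAX_LENGTH can be sketched as follows. This is an illustrative example rather than the repository's code: the ALPHABET string, the encode helper, the choice of integer indices (instead of one-hot vectors), and the mapping of unknown characters to 0 are all assumptions.

```python
MAX_LENGTH = 16                               # illustrative value only
ALPHABET = "abcdefghijklmnopqrstuvwxyz "      # hypothetical alphabet

# Index 0 is reserved for padding and unknown characters (an assumption).
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text, max_length=MAX_LENGTH):
    """Turn a text into exactly max_length character indices."""
    truncated = text.lower()[:max_length]                 # longer texts are truncated
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in truncated]
    return indices + [0] * (max_length - len(indices))    # shorter texts are zero-padded


short = encode("hi")
long = encode("a" * 50)
print(short)
print(len(long))
```

Every encoded text has the same length, which is what lets a batch of texts be stacked into a single tensor.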

Other parameters:

  • MODEL_PATH - the path for all model-related files

  • DATA_PATH - the training dataset path (folder + filename)

  • EXPERIMENT_PREFIX - a prefix for the experiment - all the corresponding files will have this prefix before their names

  • RUS - a hardcoded cyrillic alphabet string

  • BEST_MODEL_PATH - the filename of the best model (folder + filename)
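For orientation, a constants.py covering the parameters above might look roughly like the sketch below. Every value is a placeholder chosen for illustration: the numeric settings loosely follow the small configuration described in the referenced paper, while the file names, the prefix, and the contents of RUS are hypothetical and do not reflect the repository's actual defaults.

```python
import os

# Model and data parameters (illustrative values)
MAX_LENGTH = 1014
CHANNELS = 256
KERNEL_SIZES = [7, 7, 3, 3, 3, 3]   # one entry per convolutional layer
POOLING_SIZE = 3
LINEAR_SIZE = 1024
DROPOUT = 0.5
OUTPUT_SIZE = 1                      # single sigmoid output for the binary case
EPOCHS = 10
BATCH_SIZE = 128
LEARNING_RATE = 1e-3

# Other parameters (hypothetical paths and names)
MODEL_PATH = "./model"
DATA_PATH = os.path.join("./data", "train.csv")
EXPERIMENT_PREFIX = "baseline_"
RUS = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"   # stand-in for the hardcoded alphabet
BEST_MODEL_PATH = os.path.join(MODEL_PATH, EXPERIMENT_PREFIX + "best.pth.tar")

print(DATA_PATH, BEST_MODEL_PATH)
```

Deriving BEST_MODEL_PATH from MODEL_PATH and EXPERIMENT_PREFIX keeps the files of one experiment together; this is a design choice of the sketch, not a documented behavior.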


Train and outputs

To train the model, define all the desired parameters in the constants.py file (or in a dev.env file in the future). After that you can run:

python train.py

The training process will produce multiple files in the model directory:

  • A train log file called train.log

  • Two models with the .pth.tar extension: the best model and the last model

  • Two JSON files with the evaluation metrics for the best and the last models
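After a run, the produced files can be checked with a small helper like the one below. It is a sketch under assumptions: only train.log and the .pth.tar and .json extensions come from the description above, while the helper name group_outputs and the other file names used in the demo are hypothetical.

```python
import os
import tempfile


def group_outputs(model_dir, prefix=""):
    """Group files in model_dir by kind: log, checkpoint (.pth.tar), metrics (.json)."""
    groups = {"log": [], "checkpoint": [], "metrics": []}
    for name in sorted(os.listdir(model_dir)):
        if not name.startswith(prefix):
            continue
        if name.endswith(".log"):
            groups["log"].append(name)
        elif name.endswith(".pth.tar"):
            groups["checkpoint"].append(name)
        elif name.endswith(".json"):
            groups["metrics"].append(name)
    return groups


# Demo on a temporary directory populated with hypothetical file names.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ["train.log", "best.pth.tar", "last.pth.tar",
                 "best_metrics.json", "last_metrics.json"]:
        open(os.path.join(model_dir, name), "w").close()
    outputs = group_outputs(model_dir)

print(outputs)
```

Passing the value of EXPERIMENT_PREFIX as the prefix argument restricts the listing to a single experiment.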

Contributors

  • ivanklimuk
