Categorizing-Inappropriate-Texts

Built a multi-label classification model that detects inappropriate texts and helps us further categorize them.

Dataset

We used a Kaggle dataset of Wikipedia comments labeled by human raters for inappropriate behaviour. The dataset has 6 labels: 'toxic', 'severe toxic', 'obscene', 'threat', 'insult' and 'identity hate'. It contains 223,286 data points in total, which we divided into 80% training, 10% validation and 10% testing sets. The training set therefore has 178,626 samples, and the validation and testing sets have 22,329 samples each.
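An 80/10/10 split of this kind can be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual splitting code; the function name `split_indices`, the seed and the rounding rule are assumptions, so the exact set sizes may differ by a few samples from the ones reported above.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle row indices and cut them into train / validation / test parts.

    The test set receives whatever remains after the first two cuts.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))
```

Splitting indices rather than the rows themselves keeps the comment texts and their six label columns aligned when each subset is selected.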

NLP

Since this is a natural language processing problem, we first cleaned the text: converting to lowercase, removing special characters, removing numbers, removing stop words, replacing contractions with their full forms, and lemmatization.
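The cleaning steps can be sketched roughly as below. This is an illustrative sketch rather than the project's code: the `CONTRACTIONS` and `STOP_WORDS` tables are tiny hypothetical samples, the order of the steps is an assumption (contractions are expanded before special characters are stripped so the apostrophes are still present), and lemmatization is left as a pluggable `lemmatize` hook because a real lemmatizer needs an external library such as NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny lookup tables; a real pipeline would use fuller lists.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "i'm": "i am",
    "it's": "it is",
    "you're": "you are",
}
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "i", "you", "it", "to", "of", "and", "do", "will"}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b"
)


def clean_text(text, lemmatize=lambda token: token):
    """Lowercase, expand contractions, drop non-letters and stop words, then lemmatize."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(1)], text)
    text = re.sub(r"[^a-z\s]", " ", text)  # removes special characters and numbers
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(lemmatize(tok) for tok in tokens)


if __name__ == "__main__":
    print(clean_text("You're an IDIOT!!! I can't believe it's 2024..."))
    # prints: idiot cannot believe
```

The default `lemmatize` hook returns each token unchanged; passing a function backed by a real lemmatizer would complete the last step of the pipeline.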

To vectorize the text data, we used 3 feature extraction techniques: Bag of Words, TF-IDF and word2vec.
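To make the first two techniques concrete, here is a small standard-library sketch of Bag of Words counts and a TF-IDF weighting. It is not the project's implementation: the function names are hypothetical, the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 is only one common variant, and no length normalization is applied. In practice a library such as scikit-learn (`CountVectorizer`, `TfidfVectorizer`) would normally be used. word2vec is omitted because it requires a trained embedding model.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc.split()).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(docs):
    """Weight raw term counts by a smoothed inverse document frequency."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


if __name__ == "__main__":
    docs = ["idiot cannot believe", "thank edit", "idiot idiot"]
    vocab, counts = bag_of_words(docs)
    print(vocab)   # ['believe', 'cannot', 'edit', 'idiot', 'thank']
    print(counts)  # [[1, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 2, 0]]
```

Terms that occur in fewer documents receive a larger idf, so a rare word contributes more to a comment's vector than a word shared by many comments.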

Baseline Models

  • Naive Bayes
  • Logistic Regression
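One common way to apply a baseline such as Naive Bayes to a multi-label problem is to train one binary classifier per label. The sketch below illustrates that idea with a small multinomial Naive Bayes written from scratch. It is an assumption about how such a baseline could be set up, not a description of the project's code; the class and function names, the toy comments and the three-label subset are all hypothetical.

```python
import math
from collections import Counter


class BinaryNaiveBayes:
    """Multinomial Naive Bayes for a single yes/no label, with Laplace smoothing."""

    def fit(self, docs, labels):
        self.vocab = {tok for doc in docs for tok in doc.split()}
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = {0: 0, 1: 0}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc.split())
            self.n_docs[y] += 1
        return self

    def _log_score(self, tokens, y):
        total = sum(self.counts[y].values())
        # Smoothed class prior plus smoothed per-token log-likelihoods.
        score = math.log((self.n_docs[y] + 1) / (sum(self.n_docs.values()) + 2))
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.counts[y][tok] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        tokens = doc.split()
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


def fit_per_label(docs, label_matrix, label_names):
    """Train one independent binary model for each label column."""
    return {
        name: BinaryNaiveBayes().fit(docs, [row[j] for row in label_matrix])
        for j, name in enumerate(label_names)
    }


def predict_labels(models, doc):
    """Return the names of all labels whose model fires for this document."""
    return [name for name, model in models.items() if model.predict(doc)]


if __name__ == "__main__":
    names = ["toxic", "insult", "threat"]  # toy subset of the six labels
    docs = ["idiot stupid", "hurt you", "thank helpful edit", "good source thank"]
    labels = [[1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 0, 0]]
    models = fit_per_label(docs, labels, names)
    print(predict_labels(models, "stupid idiot"))  # ['toxic', 'insult']
```

Because each label gets its own model, a comment can receive several labels at once or none at all, which is what distinguishes multi-label from multi-class classification.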

Advanced Models

  • SVM
  • Random Forest
  • Neural Networks

Of these, a tuned Neural Network with feature engineering, text cleaning and oversampling gave the best results, with a score of 0.7 (computed as 60% recall + 40% precision).
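A score that weights recall at 60% and precision at 40% could be computed as in the sketch below. The README does not say how precision and recall were aggregated across the six labels, so the micro-averaging used here (pooling all label decisions before computing the two rates) is an assumption, as are the function name `weighted_score` and the toy label matrices.

```python
def weighted_score(y_true, y_pred, recall_weight=0.6, precision_weight=0.4):
    """Combine micro-averaged recall and precision into a single weighted score.

    y_true and y_pred are lists of rows, one row per comment and one 0/1 entry per label.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1      # label correctly predicted
            elif p:
                fp += 1      # label predicted but not present
            elif t:
                fn += 1      # label present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall_weight * recall + precision_weight * precision


if __name__ == "__main__":
    y_true = [[1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]]
    y_pred = [[1, 0, 0, 0, 1, 0], [0, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]]
    # Here tp=4, fp=2, fn=1, so precision = 4/6 and recall = 4/5.
    print(round(weighted_score(y_true, y_pred), 4))
```

Weighting recall more heavily than precision penalizes missed inappropriate comments more than false alarms, which is a common preference in moderation settings.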

For further details please check the final report.

Contributors

mansisinghal25
