taneemishere / spam-comment-detector

A machine learning application that takes a comment and predicts whether it is spam or not.

License: MIT License

Jupyter Notebook 5.30% Python 94.28% CSS 0.01% HTML 0.05% PowerShell 0.33% Batchfile 0.02%
artificial-intelligence machine-learning jupyter-notebook naive-bayes-classifier spam-detection numpy pandas html css flask

spam-comment-detector's Introduction

Spam-Comment-Detector

A machine learning application that takes a comment and predicts whether it is spam or not. It is intended to grow into a full-fledged application with a proper front end and back end.

The Application Working

The application works as follows. The user enters a comment in an HTML form (the index.html file). The text is submitted with the "POST" method to the backend Python application, which is built on Flask. A trained machine learning model then classifies the comment as spam or not spam (ham). Once the model has made its prediction, the Flask application sends it back via the "GET" method, and it is displayed in the show.html file.
I first came across the word "ham" in the book "Thoughtful Machine Learning with Python: A Test-Driven Approach" by Matthew Kirk (O'Reilly), in the chapter on Naive Bayes classification.
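
The request flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the route names, the `comment` form field, the inline template strings (stand-ins for index.html and show.html), and the keyword-based `classify` placeholder are all assumptions made so the example runs on its own.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for templates/index.html and templates/show.html (assumed markup).
INDEX_HTML = """
<form action="/predict" method="post">
  <textarea name="comment"></textarea>
  <button type="submit">Check</button>
</form>
"""
SHOW_HTML = "<p>Prediction: {{ label }}</p>"


def classify(comment: str) -> str:
    """Hypothetical placeholder for the trained model's prediction."""
    return "Spam" if "subscribe" in comment.lower() else "Ham"


@app.route("/")
def index():
    # Serve the form where the user types a comment.
    return render_template_string(INDEX_HTML)


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted comment and render the prediction page.
    comment = request.form["comment"]
    return render_template_string(SHOW_HTML, label=classify(comment))
```

In a real setup, `classify` would call the trained model instead of a keyword check, and the app could be started with `app.run()` or the `flask run` command.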

The Model

The machine learning model working under the hood is the Naive Bayes classifier, which does really well at spam classification. Naive Bayes classifiers are supervised learning algorithms based on Bayes' theorem. They perform well on data whose features can be treated as independent of each other given the class. Some of the Naive Bayes classifiers are:

  • Multinomial Naive Bayes:

This is the one used in this project. It is typically used for document classification, where the features are word counts.

  • Bernoulli Naive Bayes:

This is used for documents with binary features, such as whether a word occurs in the document or not.

  • Gaussian Naive Bayes:

This one is for continuous features, which are assumed to follow a Gaussian (normal) distribution.
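
To make the multinomial variant more concrete, here is a small from-scratch sketch in plain Python. It is only an illustration of the idea (class prior times per-word likelihoods, computed in log space with add-one smoothing); the function names and the four toy comments are made up for this example, and the project itself does not necessarily implement the classifier this way.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (text, label) pairs; returns the counts the model needs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior score."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # log P(word | class) with add-one (Laplace) smoothing
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy comments, not taken from the real dataset.
toy_docs = [
    ("check out my channel", "spam"),
    ("subscribe to my channel", "spam"),
    ("love this song", "ham"),
    ("this song is great", "ham"),
]
model = train(toy_docs)
label = predict("check my channel", *model)
```

The "naive" part is the loop over words: each word contributes to the score independently of the others.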

The Data

The data used in this project is taken from the UCI Machine Learning Repository. It contains five datasets of comments from different artists' videos. After analyzing all the datasets, we found missing values only in the Youtube04-Eminem.csv dataset, and only in the dates, which are not needed for this project. All five datasets are then concatenated into one, with a key assigned to each so that a specific dataset is easy to inspect later. Finally, we keep only the CONTENT and CLASS columns.
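
A rough sketch of these steps with pandas is shown below. The two small dataframes are made-up stand-ins for the real CSV files (which would normally be loaded with `pd.read_csv`), and the key names `video_a` and `eminem`, the `DATE` column name, and the convention that `CLASS` 1 means spam are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for two of the five CSV files (assumed columns).
frames = {
    "video_a": pd.DataFrame({
        "DATE": ["2014-01-01", "2014-01-02"],
        "CONTENT": ["Nice video", "Check out my channel"],
        "CLASS": [0, 1],  # assumed: 1 = spam, 0 = ham
    }),
    "eminem": pd.DataFrame({
        "DATE": [None, "2015-05-01"],  # one missing date, as in the description
        "CONTENT": ["Subscribe to me", "Love this track"],
        "CLASS": [1, 0],
    }),
}

# Count missing values per column in each dataset.
for name, frame in frames.items():
    print(name, frame.isnull().sum().to_dict())

# Concatenate into one dataframe; `keys` adds an outer index level
# that records which dataset each row came from.
combined = pd.concat(list(frames.values()), keys=list(frames.keys()))

# Inspect a single source dataset through its key.
eminem_rows = combined.loc["eminem"]

# Keep only the two columns the model needs.
data = combined[["CONTENT", "CLASS"]]
```

Because the missing values sit only in the date column, dropping that column avoids any need to fill or remove rows.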

The Code Flow

After combining the data into a single dataframe, we extract features from the comments as vectors for the model. In other words, vectorization converts the comment text into numbers that the model can work with.
The data is then split into training and testing sets, with 33% held out for testing. The classifier is trained on the remaining 67% of the data and reaches 91.95% accuracy on the testing set.
For future predictions we have to use the same pipeline: first convert the text with the CountVectorizer, so that the model receives data of the same shape as the data it was trained on.
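
The flow above could look something like the following with scikit-learn. This is a sketch under assumptions, not the project's notebook: the twelve comments are invented, `random_state` and `stratify` are added only to keep the tiny example reproducible and balanced, and wrapping the two steps in `make_pipeline` is one possible way to guarantee that new text goes through the same fitted CountVectorizer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy comments; 1 = spam, 0 = ham (assumed label convention).
comments = [
    "check out my channel", "subscribe to my channel please",
    "free gift cards click here", "visit my website for free stuff",
    "follow me and I follow back", "click this link to win money",
    "love this song", "this video is great",
    "best performance ever", "her voice is amazing",
    "I listen to this every day", "what a classic track",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out 33% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    comments, labels, test_size=0.33, random_state=42, stratify=labels
)

# The pipeline vectorizes the text and then fits the classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Accuracy on the held-out comments.
accuracy = model.score(X_test, y_test)

# New text passes through the same fitted vectorizer before prediction.
prediction = model.predict(["please subscribe to my channel"])
```

On data this small the accuracy value is not meaningful; the point is only the order of the steps.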

spam-comment-detector's People

Contributors

taneemishere, zarakk

spam-comment-detector's Issues

Add .html and .css files!

  • Create two directories with the following names (all lowercase):

    • templates
    • static/styles
  • Add index.html and show.html files into the templates directory

  • Add style.css file into static/styles
