
fsarab / covid-19-fake-news-detection


A binary classification task (real vs fake) that benchmarks the annotated dataset with four machine learning baselines: Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM).


COVID-19 Fake News Detection

Description

This is an NLP project for the Machine Learning course at Tehran-Polytechnic (AUT) University. Fake news and rumors are rampant on social media. The goal is a binary classification task (real vs fake) on an annotated dataset, which we benchmark with four machine learning baselines: Decision Tree, Logistic Regression, Gradient Boost, and Support Vector Machine (SVM). We obtain the best performance of 93.46% F1-score with SVM.
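The four baselines named above could be benchmarked along the following lines. This is a minimal sketch, not the notebook's actual code: it assumes scikit-learn, TF-IDF features, a linear SVM, and weighted F1 as the score, and the helper names `build_baselines` and `benchmark` are hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def build_baselines(seed=0):
    """Return one TF-IDF + classifier pipeline per baseline (assumed setup)."""
    classifiers = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Gradient Boost": GradientBoostingClassifier(random_state=seed),
        "SVM": LinearSVC(),  # assumption: a linear kernel
    }
    return {
        name: make_pipeline(TfidfVectorizer(), clf)
        for name, clf in classifiers.items()
    }


def benchmark(train_texts, train_labels, eval_texts, eval_labels):
    """Fit every baseline and return its weighted F1 on the evaluation split."""
    scores = {}
    for name, model in build_baselines().items():
        model.fit(train_texts, train_labels)
        predictions = model.predict(eval_texts)
        scores[name] = f1_score(eval_labels, predictions, average="weighted")
    return scores
```

Calling `benchmark(train_texts, train_labels, val_texts, val_labels)` with lists of tweets and their `real`/`fake` labels returns a dictionary mapping each baseline name to its score, which makes the four models easy to compare side by side.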

Challenge

This project was based on a competition challenge hosted on CodaLab (https://competitions.codalab.org/competitions/26655). The challenge was to build a system that can detect fake news from a given training dataset, which included tweets and comments from various social media users. The work mainly follows the paper below.

paper: [https://arxiv.org/abs/2011.03327]

website: [https://constraint-shared-task-2021.github.io/]

Data

(train, validation, test, test labels): [https://competitions.codalab.org/competitions/26655]

• Real: Tweets from verified sources that give useful information on COVID-19.

• Fake: Tweets, posts, and articles that make claims and speculations about COVID-19 which are verified to be not true.

@misc{patwa2020fighting,
  title={Fighting an Infodemic: COVID-19 Fake News Dataset},
  author={Parth Patwa and Shivam Sharma and Srinivas PYKL and Vineeth Guptha and Gitanjali Kumari and Md Shad Akhtar and Asif Ekbal and Amitava Das and Tanmoy Chakraborty},
  year={2020},
  eprint={2011.03327},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Results

The 10 most frequent tokens after removing stopwords are:

• Fake: coronavirus, covid19, people, will, new, trump, says, video, vaccine, virus.

• Real: covid19, cases, new, tests, number, total, people, reported, confirmed, states.

• Combined: covid19, cases, coronavirus, new, people, tests, number, will, deaths, total.
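Token counts like the ones above can be produced with a simple frequency count. The sketch below is only illustrative: the regex tokenizer, the tiny `STOPWORDS` set, and the function name `top_tokens` are assumptions, and the notebook may have used a different tokenizer and stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def top_tokens(texts, k=10, stopwords=STOPWORDS):
    """Return the k most frequent lowercase tokens that are not stopwords."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and digits as tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return [token for token, _ in counts.most_common(k)]
```

Running `top_tokens` separately on the fake tweets, the real tweets, and all tweets together would give the three lists in the same shape as reported here.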

Decision Tree

Decision Tree Accuracy(test): 0.8864485981308411

Decision Tree Precision(test): 0.8866216229482448

Decision Tree Recall(test): 0.8864485981308411

Decision Tree F1_score(test): 0.8864914934620883

Logistic Regression

Logistic Regression Accuracy(test): 0.9200934579439253

Logistic Regression Precision(test): 0.9203180956962224

Logistic Regression Recall(test): 0.9200934579439253

Logistic Regression F1_score(test): 0.9200590271473257

Gradient Boost

Gradient Boost Accuracy(test): 0.8906542056074767

Gradient Boost Precision(test): 0.8906463847221131

Gradient Boost Recall(test): 0.8906542056074767

Gradient Boost F1_score(test): 0.8906495159369164

SVM

SVM Accuracy(val): 0.9285046728971963

SVM Precision(val): 0.9285379607450457

SVM Recall(val): 0.9285046728971963

SVM F1_score(val): 0.9284869868371163
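For every model above, the reported recall equals the accuracy to the last printed digit. That is what support-weighted averaging over the two classes produces, so the scores were probably averaged that way, although the README does not say so. The pure-Python sketch below, with the hypothetical helper `weighted_scores`, shows how such scores are computed and why the two numbers coincide.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        p_c = tp / n_pred if n_pred else 0.0
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = n_true / total  # class share of the evaluation set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Because each class recall `tp / n_true` is weighted by `n_true / total`, the weighted recall reduces to the total number of correct predictions divided by `total`, which is the accuracy.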

About Me

I'm an M.Sc. student in Computer Science at Tehran-Polytechnic (AUT), with research interests in Machine Learning, Natural Language Processing, and Data Science.

