fake-news-detection's Introduction

Fake News Detection

Fake news is the spread of disinformation and hoaxes through any news platform. The threat posed by such widespread misinformation is imminent and obvious, so we have looked into ways in which fake news can be identified with the help of Artificial Intelligence. Fake news detection and analysis is an open challenge in AI!

  • The main question is: what exactly constitutes Fake News?

Types of Fake News identified by us:

Political Fake News:

  • We use a statement-and-justification approach. The statement is a direct quote from a public personality, and the justification is the context or background information for that statement. The justification is important because the statement on its own does not indicate whether it is true or false.

Clickbait:

  • Another source of Fake News is clickbait, where the headline has no relation to the actual content and is used only to spike reader interest.

Fake News Articles:

  • Some fake articles make relatively frequent use of terms seemingly intended to inspire outrage, and the writing skill shown in such articles is generally considerably lower than in standard news.

Techniques

We divide our techniques into three categories:

  • Political Fake News: We used the LIAR-PLUS dataset along with additional data scraped from the PolitiFact website, bringing the training set to 20,000 examples. The statement and its justification are concatenated into a single input sequence, as in the sketch below. Given enough training examples, the model learns to make inferences on statements given a justification; the best accuracy, 70% on the test set, was obtained with the BERT large uncased model.
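
A minimal sketch of how a statement and its justification might be joined into one BERT input, using the same pytorch_pretrained_bert tokenizer the inference script further below relies on; the function name, maximum length, and example strings are illustrative assumptions, not the repository's exact training code.

from pytorch_pretrained_bert import BertTokenizer

# Illustrative only: join a claim and its justification into a single
# BERT input sequence (bert-large-uncased and max_len=256 are assumptions).
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased', do_lower_case=True)

def encode_pair(statement, justification, max_len=256):
    text = statement + " " + justification
    tokens = ['[CLS]'] + tokenizer.tokenize(text)[:max_len - 2] + ['[SEP]']
    return tokenizer.convert_tokens_to_ids(tokens)

ids = encode_pair(
    "Direct quote attributed to a public figure.",
    "Background or context gathered for that statement.")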

  • Clickbait: We used a BiLSTM attention model and fine-tuned it on a dataset with a training size of 19,000 examples; a sketch of the architecture follows the figure below.

Clickbait
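
A minimal PyTorch sketch of a bidirectional LSTM with a simple attention layer over headline tokens; the vocabulary size, dimensions, and attention form are assumptions for illustration rather than the exact architecture fine-tuned here.

import torch
import torch.nn as nn

# Illustrative only: a BiLSTM with a soft-attention layer over headline
# tokens; vocabulary size, dimensions and attention form are assumptions.
class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=128):
        super(BiLSTMAttention, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)        # score for each time step
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        outputs, _ = self.lstm(self.embedding(token_ids))   # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(outputs), dim=1)  # attention over time steps
        context = (weights * outputs).sum(dim=1)            # weighted sum of states
        return torch.sigmoid(self.classifier(context)).squeeze(-1)

# Example: probability that each of two padded headlines is clickbait.
model = BiLSTMAttention()
probs = model(torch.randint(0, 30000, (2, 20)))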

  • Fake News Article: We built a custom dataset collected from the PolitiFact media-bias chart. To find out whether the hypothesis above is correct, we made a labelled dataset giving examples of fake news and real news as provided by professional fact checkers. We scraped the article text from the URLs, preprocessed it with NLTK (see the sketch below), and fine-tuned BERT to predict whether a news article is real or fake.

Fake_News_Article
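
A minimal sketch of the scrape-and-clean step, assuming requests/BeautifulSoup for fetching and basic NLTK tokenization with stop-word removal; the repository's actual preprocessing steps may differ.

import requests
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Illustrative only: fetch paragraph text from a URL and apply basic NLTK
# cleaning before classification; the exact cleaning used here may differ.
nltk.download('punkt')
nltk.download('stopwords')

def fetch_and_clean(url):
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    text = " ".join(p.get_text() for p in soup.find_all('p'))
    tokens = word_tokenize(text.lower())
    stops = set(stopwords.words('english'))
    return " ".join(t for t in tokens if t.isalpha() and t not in stops)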

Results

The results obtained for each model are as follows:

  • Political Fake News: Of all the model architectures we tried, BERT gave the best accuracy, 70% on the test set. This surpasses the accuracy outlined in the paper.
  • Clickbait: We used the dataset from the Clickbait Challenge and fine-tuned an LSTM attention model, reaching 76% accuracy on the test set.
  • Fake News Articles: We fine-tuned BERT on our custom dataset; the model achieves an accuracy of 81%.

Conclusion

  • Political Fake News: Our model has strong future prospects and can be scaled easily. It is currently trained on a US dataset, since fake news was a major topic of the 2016 US election, and the problem is expected to grow in India as well. Future work would extend this to an Indian dataset.
  • Clickbait: On social media, headlines are often exaggerated with the main motive of misleading the reader, which creates a nuisance for online users.
  • Fake News Articles: The model learns the patterns of fake articles and can identify such patterns in unreal (fabricated) content. It requires only a working URL to run.

fake-news-detection's People

Contributors

abhilashreddys, addy369, yash0330


fake-news-detection's Issues

Fake News Article inference.py causes error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

Error:
Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

I made some small modifications to the code to make it work on my machine. My version of the code:

import sys
import pandas as pd
import numpy as np
import requests
import bs4
from bs4 import BeautifulSoup
import torch.nn as nn
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel
from keras.preprocessing.sequence import pad_sequences

# Model
class BertBinaryClassifier(nn.Module):
    def __init__(self, dropout=0.1):
        super(BertBinaryClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(768, 1)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, tokens, masks=None):
        _, pooled_output = self.bert(tokens, attention_mask=masks, output_all_encoded_layers=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        proba = self.sigmoid(linear_output)
        return proba

# Preprocessing 

# Importing the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

def Punctuation(string): 
  
    # punctuation marks 
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
  
    # traverse the given string and if any punctuation
    # mark occurs, replace it with an empty string
    for x in string.lower(): 
        if x in punctuations: 
            string = string.replace(x, "") 
  
    # return string without punctuation 
    return string

def get_text(url):
    try:
        result=requests.get(str(url))
    except Exception:
        print("error in scraping url")
        return None
    src=result.content
    soup=BeautifulSoup(src,'html.parser')   
    text=[] 
    for p_tag in soup.find_all('p'):
        text.append(p_tag.text)
    text = Punctuation(str(text))
    return text


# loading model
# change path as per your requirement
path='./weights.pth'
model = BertBinaryClassifier()

# optimizer = torch.optim.Adam(model.parameters(), lr=3e-6)
model.load_state_dict(torch.load(path,map_location=torch.device('cpu')))
model.eval()

def test(article,model):
    bert_predicted = []
    all_logits = []
    test_tokens = list(map(lambda t: ['[CLS]'] + tokenizer.tokenize(t)[:255], [article]))
    test_tokens_ids = list(map(tokenizer.convert_tokens_to_ids, test_tokens))
    test_tokens_ids = pad_sequences(test_tokens_ids, maxlen=256, truncating="post", padding="post", dtype="int")
    test_masks = [[float(i > 0) for i in ii] for ii in test_tokens_ids]
    test_masks_tensor = torch.tensor(test_masks)
    test_tokens_ids = torch.tensor(test_tokens_ids)
    with torch.no_grad():
        logits = model(test_tokens_ids, test_masks_tensor)
        numpy_logits = logits.cpu().detach().numpy()
        if(numpy_logits[0,0] > 0.5):
            return 'Fake'
        else:
            return 'True'


def answer(url,model):
    article = get_text(url)
    ans = test(article,model)
    return ans

url = str(sys.argv[1])
print(answer(url,model))


Does anybody know how to fix this?
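
The message indicates the token-id tensor reaching the embedding layer is 32-bit (torch.IntTensor), while PyTorch embedding layers require 64-bit Long indices. One likely fix, untested here, is to make the padded ids 64-bit, or to cast the tensor before calling the model:

# Possible fix (untested): produce 64-bit ids from pad_sequences ...
test_tokens_ids = pad_sequences(test_tokens_ids, maxlen=256, truncating="post",
                                padding="post", dtype="int64")
# ... or cast explicitly so the embedding layer receives Long indices.
test_tokens_ids = torch.tensor(test_tokens_ids).long()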
