fake-news-detection's Introduction

Fake News Detection

Fake news is the spread of disinformation and hoaxes through any news platform. The threat posed by such widespread misinformation is imminent and obvious, so we have looked into ways in which fake news can be identified with the help of Artificial Intelligence. Fake news detection and analysis is an open challenge in AI!

  • The main question is: what exactly constitutes Fake News?

Types of Fake News identified by us:

Political Fake News:

  • We use a statement-and-justification approach. The statement is a direct quote from a public personality, and the justification is the context or background information for that statement. The justification is important because the statement on its own does not indicate whether it is true or false.

Clickbait:

  • Another source of Fake News is clickbait, where the headline has no relation to the actual content and is used only to spike reader interest.

Fake News Articles:

  • Some fake articles make relatively frequent use of terms seemingly intended to inspire outrage, and the writing skill shown in such articles is generally considerably lower than in standard news.

Techniques

We divide our techniques into three categories:

  • Political Fake News: We used the LIAR-PLUS dataset along with additional data scraped from the PolitiFact website, bringing the training set to 20,000 examples. The statement and its justification are concatenated into a single input sequence, as in the sketch below. Given enough training examples, the model learns to make inferences on statements given a justification; the best accuracy, 70% on the test set, was obtained with the BERT large uncased model.
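
A minimal sketch of how a statement and its justification might be joined into one BERT input, using the same pytorch_pretrained_bert tokenizer the inference script further below relies on; the function name, maximum length, and example strings are illustrative assumptions, not the repository's exact training code.

from pytorch_pretrained_bert import BertTokenizer

# Illustrative only: join a claim and its justification into a single
# BERT input sequence (bert-large-uncased and max_len=256 are assumptions).
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased', do_lower_case=True)

def encode_pair(statement, justification, max_len=256):
    text = statement + " " + justification
    tokens = ['[CLS]'] + tokenizer.tokenize(text)[:max_len - 2] + ['[SEP]']
    return tokenizer.convert_tokens_to_ids(tokens)

ids = encode_pair(
    "Direct quote attributed to a public figure.",
    "Background or context gathered for that statement.")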

  • Clickbait: We used a BiLSTM attention model and fine-tuned it on a dataset with a training size of 19,000 examples; a sketch of the architecture follows the figure below.

Clickbait
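
A minimal PyTorch sketch of a bidirectional LSTM with a simple attention layer over headline tokens; the vocabulary size, dimensions, and attention form are assumptions for illustration rather than the exact architecture fine-tuned here.

import torch
import torch.nn as nn

# Illustrative only: a BiLSTM with a soft-attention layer over headline
# tokens; vocabulary size, dimensions and attention form are assumptions.
class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=128):
        super(BiLSTMAttention, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)        # score for each time step
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        outputs, _ = self.lstm(self.embedding(token_ids))   # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(outputs), dim=1)  # attention over time steps
        context = (weights * outputs).sum(dim=1)            # weighted sum of states
        return torch.sigmoid(self.classifier(context)).squeeze(-1)

# Example: probability that each of two padded headlines is clickbait.
model = BiLSTMAttention()
probs = model(torch.randint(0, 30000, (2, 20)))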

  • Fake News Article: We built a custom dataset collected from the PolitiFact media-bias chart. To find out whether the hypothesis above is correct, we made a labelled dataset giving examples of fake news and real news as provided by professional fact checkers. We scraped the article text from the URLs, preprocessed it with NLTK (see the sketch below), and fine-tuned BERT to predict whether a news article is real or fake.

Fake_News_Article
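
A minimal sketch of the scrape-and-clean step, assuming requests/BeautifulSoup for fetching and basic NLTK tokenization with stop-word removal; the repository's actual preprocessing steps may differ.

import requests
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Illustrative only: fetch paragraph text from a URL and apply basic NLTK
# cleaning before classification; the exact cleaning used here may differ.
nltk.download('punkt')
nltk.download('stopwords')

def fetch_and_clean(url):
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    text = " ".join(p.get_text() for p in soup.find_all('p'))
    tokens = word_tokenize(text.lower())
    stops = set(stopwords.words('english'))
    return " ".join(t for t in tokens if t.isalpha() and t not in stops)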

Results

The results obtained for each model are as follows:

  • Political Fake News: Of all the model architectures we tried, BERT gave the best accuracy, 70% on the test set. This surpasses the accuracy outlined in the paper.
  • Clickbait: We used the dataset from the Clickbait Challenge and fine-tuned an LSTM attention model, reaching 76% accuracy on the test set.
  • Fake News Articles: We fine-tuned BERT on our custom dataset; the model achieves an accuracy of 81%.

Conclusion

  • Political Fake News: Our model has strong future prospects and can be scaled easily. It is currently trained on a US dataset, since fake news was a major topic of the 2016 US election, and the problem is expected to grow in India as well. Future work would extend this to an Indian dataset.
  • Clickbait: On social media, headlines are often exaggerated with the main motive of misleading the reader, which creates a nuisance for online users.
  • Fake News Articles: The model learns the patterns of fake articles and can identify such patterns in unreal (fabricated) content. It requires only a working URL to run.

fake-news-detection's People

Contributors

abhilashreddys, addy369, yash0330


fake-news-detection's Issues

Fake News Article inference.py causes error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

Error:
Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)

I made some small modifications to the code to make it work on my machine. My version of the code:

import sys
import pandas as pd
import numpy as np
import requests
import bs4
from bs4 import BeautifulSoup
import torch.nn as nn
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel
from keras.preprocessing.sequence import pad_sequences

# Model
class BertBinaryClassifier(nn.Module):
    def __init__(self, dropout=0.1):
        super(BertBinaryClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(768, 1)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, tokens, masks=None):
        _, pooled_output = self.bert(tokens, attention_mask=masks, output_all_encoded_layers=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        proba = self.sigmoid(linear_output)
        return proba

# Preprocessing 

# Importing the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

def Punctuation(string): 
  
    # punctuation marks 
    punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
  
    # traverse the given string and if any punctuation
    # mark occurs, replace it with an empty string
    for x in string.lower(): 
        if x in punctuations: 
            string = string.replace(x, "") 
  
    # return string without punctuation 
    return string

def get_text(url):
    try:
        result=requests.get(str(url))
    except Exception:
        print("error in scraping url")
        return None
    src=result.content
    soup=BeautifulSoup(src,'html.parser')   
    text=[] 
    for p_tag in soup.find_all('p'):
        text.append(p_tag.text)
    text = Punctuation(str(text))
    return text


# loading model
# change path as per your requirement
path='./weights.pth'
model = BertBinaryClassifier()

# optimizer = torch.optim.Adam(model.parameters(), lr=3e-6)
model.load_state_dict(torch.load(path,map_location=torch.device('cpu')))
model.eval()

def test(article,model):
    bert_predicted = []
    all_logits = []
    test_tokens = list(map(lambda t: ['[CLS]'] + tokenizer.tokenize(t)[:255], [article]))
    test_tokens_ids = list(map(tokenizer.convert_tokens_to_ids, test_tokens))
    test_tokens_ids = pad_sequences(test_tokens_ids, maxlen=256, truncating="post", padding="post", dtype="int")
    test_masks = [[float(i > 0) for i in ii] for ii in test_tokens_ids]
    test_masks_tensor = torch.tensor(test_masks)
    test_tokens_ids = torch.tensor(test_tokens_ids)
    with torch.no_grad():
        logits = model(test_tokens_ids, test_masks_tensor)
        numpy_logits = logits.cpu().detach().numpy()
        if(numpy_logits[0,0] > 0.5):
            return 'Fake'
        else:
            return 'True'


def answer(url,model):
    article = get_text(url)
    ans = test(article,model)
    return ans

url = str(sys.argv[1])
print(answer(url,model))


Does anybody know how to fix this?
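
The message indicates the token-id tensor reaching the embedding layer is 32-bit (torch.IntTensor), while PyTorch embedding layers require 64-bit Long indices. One likely fix, untested here, is to make the padded ids 64-bit, or to cast the tensor before calling the model:

# Possible fix (untested): produce 64-bit ids from pad_sequences ...
test_tokens_ids = pad_sequences(test_tokens_ids, maxlen=256, truncating="post",
                                padding="post", dtype="int64")
# ... or cast explicitly so the embedding layer receives Long indices.
test_tokens_ids = torch.tensor(test_tokens_ids).long()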
