Git Product home page Git Product logo

political-ideology-detection-on-twitter's Introduction

Predicting Political Ideology of Twitter Users

Problem Statement

Given a text (single/multiple documents), figuring out the political ideology of the person to which the text belongs to.

Dataset: Social Network Data - Twitter

Dataset

We acquired data from the author Lyle Ungar and received the Twitter IDs of the users which were annotated in the range of (1-7).

As of now, I have collected the data from the given Twitter ids by them and from the Twitter Lists which we had seen that day (MBFC - Media Bias / Fact Check Twitter Lists).

A brief overview of the data:

  • 3130 Tweet ID objects from the Lyle Ungar Team- Annotated 1-7.
  • 368 Left Leaning and 164 Right Leaning Tweet ID objects.
  • Most of the Tweet IDs from the Lyle Ungar team are by actual users as such whereas the Tweet IDs from the MBFC lists are of influencers as in the newspaper press, news website or so.

Methodology

Prediction

  • This would be a supervised classification problem. We aim to classify active user and not so active user who supports one or another political ideology.
  • First, we’ll extract the dataset from Twitter to form the initial seed data set (which we can filter out in later steps).
  • The dataset is cleaned by removing the stopwords and lower-casing the tweet text.
  • Naive Bayes, MaxEnt and Decision Tree are used for classification on a pre-trained model from the dataset.

Model

  • MALLET (Machine Learning for Langauge Toolkit) is used to train the model on two available datasets.
  • The data is converted to a vector of feature/value pairs, such that a feature consists of a distinct word type and the value is the number of times that word occurs in the text.

Live Prediction 💡

The live prediction script requires tweepy and MALLET installed. You'll also require Twitter API keys for extracting the user's tweets.

$ python live_prediction.py

Results

Prediction

Test data:
b'We should make america conservative again'
b'OBAMA ALL THE WAY!!!'

Output:
b'We	1	0.5256503227197841	0	0.4743496772802158
b'OBAMA	1	0.27952755637092475	0	0.7204724436290753

The first line of the test data is right-wing biased which is shown in the output with label 1 (Right) given more weightage as compared to the label 0 (Left). The second line is opposite of the first denoting the Obama being a Leftist.

Classification

These results also incorporate hashtags, mentions and sentiment (Senti Strength is used) along with the Bag of Words features (text) which are independent of language.

Binary Classification (Left or Right):

alt text

Multi-class Classification (Left, Right or Neutral):

alt text

References

  1. Predicting the Political Alignment of Twitter Users(2011) - https://ieeexplore.ieee.org/document/6113114
  2. Ideology Detection for Twitter Users with Heterogeneous Types of Links(2016) - https://arxiv.org/abs/1612.08207
  3. Beyond Binary Labels: Political Ideology Prediction of Twitter Users(2017) - https://aclanthology.coli.uni-saarland.de/papers/P17-1068/p17-1068, Slides - http://anthology.aclweb.org/attachments/P/P17/P17-1068.Presentation.pdf

political-ideology-detection-on-twitter's People

Contributors

shrebox avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.