
Mining data from social media platforms


At present, most journalists treat social sources like they would any other — individual anecdotes and single points of contact. But to do so with a handful of tweets and Instagram posts is to ignore the potential of hundreds of millions of others.

Many stories lie dormant in the vast amounts of data produced by everyday consumers. Here's a guide and toolbox that may help you.

How to get the data

What data you can get with the scripts

This is a growing list of scripts we've put together to make social data mining easier. Right now we have scripts for Twitter and Facebook.

Setup

Before you begin

  1. If you don’t already have Python installed, start by getting Python up and running. Also have git installed. A helpful guide to getting a brand new machine set up can be found here, courtesy of NPR's Visuals Team.
  2. You should also make sure you have pip.

Twitter and Facebook-related preparations

  1. You need to get developer OAuth credentials from the social media platforms you want to tap into. OAuth credentials are like an ID and password (often referred to as an app ID and secret, respectively) that you create for an app or a script to access the data stream that a social media company provides. This data stream, also known as a company's Application Programming Interface, or API, is often accessible using these credentials through a link (for example, this is what one of these queries could look like: https://graph.facebook.com/v2.6/BuzzFeed/posts/?fields=message/&access_token=YOURID|YOURSECRET). See the sketch just below for what such a request looks like in Python. Here's where you can get credentials:
Twitter: https://apps.twitter.com/
Facebook: https://developers.facebook.com/
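To make this concrete, here is a minimal sketch of what a request like the example URL above looks like in Python, using the requests library. It is not part of this repository, and YOURID / YOURSECRET simply stand in for the app ID and secret you create on the developer site:

import requests

# Placeholders for the app ID and secret you create on the developer site.
APP_ID = "YOURID"
APP_SECRET = "YOURSECRET"

# Ask Facebook's Graph API for recent posts from a public page (BuzzFeed, as in
# the example above), authenticating with an app access token of the form ID|SECRET.
url = "https://graph.facebook.com/v2.6/BuzzFeed/posts/"
params = {
    "fields": "message",
    "access_token": "{}|{}".format(APP_ID, APP_SECRET),
}

response = requests.get(url, params=params)
print(response.json())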

Setting up the scripts

  1. Open up your Terminal and go to the folder where you want to clone this repository of code using the cd bash command, then run:
git clone https://github.com/lamthuyvo/social-media-data-scripts.git
cd social-media-data-scripts
  2. Then install all the dependencies, i.e. the Python libraries we are using for these scripts, by running the following command:
pip install -r requirements.txt

or

sudo pip install -r requirements.txt

If you have problems installing the dependencies through the requirements.txt file, try installing them individually with these commands:

pip install requests
pip install tweepy --ignore-installed six

or

sudo pip install requests
sudo pip install tweepy --ignore-installed six
  3. Make a secrets.py file that is modeled after the secrets.py.example file by going into the scripts directory and running these bash commands:
cd scripts
cp secrets.py.example secrets.py

Now you have a secrets.py file! 🤗 Open it up in a text editor of your choice (like Atom or Sublime Text!) and fill in the credentials you created earlier. Don't forget to save it!
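If you are not sure what the finished file should look like, here is a rough sketch. The exact variable names live in secrets.py.example, so treat the ones below (TWITTER_CONSUMER_KEY, FB_APP_ID, and so on) as placeholders rather than the file's actual contents:

# secrets.py (placeholder names; mirror whatever secrets.py.example uses)

# Twitter credentials from https://apps.twitter.com/
TWITTER_CONSUMER_KEY = "your-consumer-key"
TWITTER_CONSUMER_SECRET = "your-consumer-secret"
TWITTER_ACCESS_TOKEN = "your-access-token"
TWITTER_ACCESS_TOKEN_SECRET = "your-access-token-secret"

# Facebook credentials from https://developers.facebook.com/
FB_APP_ID = "your-app-id"
FB_APP_SECRET = "your-app-secret"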

Using Twitter's API

Scripts

  • twitter_tweet_dumper.py: Up to 3200 tweets from an individual account (includes tweet id, time stamp, location, text, retweet count, favorite count (though the favorite count is inaccurate for retweets), whether something was a manual retweet, and how it was tweeted (Tweetdeck, Android, etc.)). This script was modified from @Yanofsky's original script.
  • twitter_get_friends.py: Twitter user bios (name, display name, bio, followers count (at time of scraping), following count (at time of scraping), when the account was created, location given in the bio) for all the accounts that a specific user follows.
  • twitter_get_followers.py: Twitter user bios (name, display name, bio, followers count (at time of scraping), following count (at time of scraping), when the account was created, location given in the bio) for all the accounts that follow a specific user.
  • twitter_bio_info_compiler.py: Twitter user bios (name, display name, bio, followers count (at time of scraping), following count (at time of scraping), when the account was created, location given in the bio) for a list of accounts you specify
  • twitter_searcher.py: You can search Twitter via its search API going back 7 days and grab tweets (id, author name, timestamp when it was created, favorites (again, unreliable), retweets, text)
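For orientation, the sketch below shows roughly how scripts like these talk to Twitter: it uses the tweepy library (assuming a 3.x version of tweepy, where searching is done with api.search) together with the placeholder credential names from the secrets.py sketch above. It is an illustration, not a copy of any script in this repository.

import tweepy

# Placeholder credential names; use whatever your secrets.py actually defines.
from secrets import (TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET,
                     TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_SECRET)

# Authenticate against Twitter's API with the credentials from secrets.py.
auth = tweepy.OAuthHandler(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET)
auth.set_access_token(TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

# Search the last ~7 days of tweets for a term and print a few fields,
# roughly what twitter_searcher.py collects into a .csv.
for tweet in api.search(q="data journalism", count=100):
    print(tweet.id_str, tweet.user.screen_name, tweet.created_at,
          tweet.retweet_count, tweet.favorite_count, tweet.text)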

Using Facebook's API

Facebook's API is more restrictive than Twitter's: it does not allow you to pull data from individual users' personal profiles, so the scripts below work with public Facebook Pages instead.

Scripts

  • fb_get_page_info.py: This script allows you to get Facebook Page information, such as the title, description, and fan count, for a number of Facebook Pages.
  • fb_id_proofer.py: This script allows you to go through a list of Facebook Page IDs and see whether they are valid.
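As with the Twitter scripts, here is a minimal sketch of the kind of request involved, using the requests library against Facebook's Graph API. The API version and field names (name, about, fan_count) are illustrative, and FB_APP_ID / FB_APP_SECRET are the placeholder names from the secrets.py sketch above:

import requests

# Placeholder credential names; use whatever your secrets.py actually defines.
from secrets import FB_APP_ID, FB_APP_SECRET

# Fetch a few basic fields for one public Page, roughly the kind of
# information fb_get_page_info.py gathers for a list of Pages.
url = "https://graph.facebook.com/v2.6/BuzzFeed"
params = {
    "fields": "name,about,fan_count",
    "access_token": "{}|{}".format(FB_APP_ID, FB_APP_SECRET),
}

page = requests.get(url, params=params).json()
print(page.get("name"), page.get("fan_count"))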

How to run each script

  1. Follow the instructions in the comments of each script to customize your API query and the resulting .csv file.
  2. Run your script with the bash command python scriptname.py to generate a .csv of tweets or Facebook posts. Then go do some journalism-ing!
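If you end up adapting a script or want to check its output, the general pattern for writing rows out to a .csv in Python looks like the sketch below; the column names and rows here are just examples, not the actual fields any particular script produces:

import csv

# Example rows; the real scripts build dictionaries like these from API responses.
rows = [
    {"id": "123", "created_at": "2018-01-01 12:00:00", "text": "hello world"},
]

with open("output.csv", "w") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "created_at", "text"])
    writer.writeheader()
    writer.writerows(rows)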
