Twitter_Scraping_and_Saving

This repo has two Jupyter notebooks designed to make it easy to scrape, munge, and save tweet data in CSV or JSON files.

What kind of data do you want to scrape?

When most (non-programming) people think of a tweet, they picture a single social media post consisting of a message of up to 280 characters.

I was recently asked to help scrape some tweets for a professor. They provided me with three hashtags and a date range they were interested in. I asked them, "What variables are you interested in, and what format would you like the data in?"

They responded, "Just all of the Twitter data for up to 60,000 tweets in a CSV file."

What this professor (and probably most people who have never munged through a tweet before) didn't realize is that it is not uncommon for a single tweet to have 300+ nested key-value pairs!

Add some additional hashtags, mentions, and retweets, and you can easily surpass 400 nested key-value pairs!

The tweet JSON returned through Tweepy includes the tweet text, but it also includes a ridiculous amount of information you would probably never care about, e.g. the user's account profile_link_color.

If you've never looked at a single tweet response, I would encourage you to open the twitter_variables_blank_example.json file in this repo. This example has more than 360 key-value pairs from a single tweet!
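
To get a feel for how quickly the pairs add up, here is a minimal sketch of one way to count them. The count_pairs helper and the trimmed sample tweet are made up for illustration; they are not part of the notebooks, and the counting rule (every key at every depth counts once) is an assumption, so the totals quoted above may have been counted differently.

```python
import json


def count_pairs(node):
    """Count every key-value pair in a nested JSON structure, at any depth."""
    if isinstance(node, dict):
        # Each key counts once, plus whatever is nested inside its value.
        return sum(1 + count_pairs(value) for value in node.values())
    if isinstance(node, list):
        # List items are not keys themselves, but they may contain dicts.
        return sum(count_pairs(item) for item in node)
    return 0  # strings, numbers, booleans, and None hold no further pairs


# Hypothetical, heavily trimmed tweet; a real response has hundreds of pairs.
sample_tweet = json.loads("""
{
  "id": 1,
  "text": "I love Python #ILovePython",
  "user": {"screen_name": "example", "profile_link_color": "1DA1F2"},
  "entities": {"hashtags": [{"text": "ILovePython", "indices": [14, 26]}]}
}
""")

print(count_pairs(sample_tweet))  # prints 9
```

To try it on the example file in this repo, load that file with json.load and pass the result to count_pairs.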

What's in this repo?

There are two Jupyter notebook files intended to help you quickly scrape and munge tweets using Python and Tweepy. Once you have your own Twitter API keys, these two notebooks should enable you to scrape, munge, and save 50,000+ tweets in a CSV file in 30 minutes or less!

Tweet_Scraping_Public.ipynb

  • Add your own Twitter API keys (see below for some info on how to get them)
  • Enter a search term (#ILovePython)
  • Scrape up to 5,000 tweets at a time, starting with the most recent tweet ID matching the search term
  • Adjust the tweet ID so you can search for older tweets
  • Save the tweet returns in JSON and CSV files
  • Run a simple VADER sentiment analysis
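
The "adjust the tweet ID" step can be sketched roughly as follows. This is not the notebook's code: collect_older_tweets and the search_page callback are hypothetical names, and the fake search function only stands in for a real API call so the example runs without keys.

```python
def collect_older_tweets(search_page, max_tweets=5000):
    """Page backwards through search results by lowering max_id each round.

    search_page(max_id) is a caller-supplied function (hypothetical) that
    returns a list of tweet dicts, newest first, whose "id" is at most
    max_id, or the most recent matches when max_id is None.
    """
    tweets = []
    max_id = None
    while len(tweets) < max_tweets:
        page = search_page(max_id)
        if not page:
            break  # no older tweets match the search term
        tweets.extend(page)
        # Next round: only ask for tweets strictly older than the oldest seen.
        max_id = min(tweet["id"] for tweet in page) - 1
    return tweets[:max_tweets]


# Stand-in for the API: 250 fake tweets with IDs 250 down to 1.
fake_ids = list(range(250, 0, -1))


def fake_search_page(max_id, page_size=100):
    matching = [i for i in fake_ids if max_id is None or i <= max_id]
    return [{"id": i} for i in matching[:page_size]]


tweets = collect_older_tweets(fake_search_page, max_tweets=220)
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])  # prints 220 250 31

# With Tweepy 4.x the page function might look like this (assumption: `api`
# is an authenticated tweepy.API object; older Tweepy versions name the
# method `search` instead of `search_tweets`):
#
# def search_page(max_id):
#     results = api.search_tweets(q="#ILovePython", count=100, max_id=max_id)
#     return [status._json for status in results]
```

Subtracting 1 from the oldest ID matters because max_id is inclusive; without it, each page would start with the last tweet of the previous page.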

Twitter_Munging_csv_Conversion.ipynb

  • Load and merge JSON files
  • Extract only the key-value pairs you are interested in
  • Save the data as JSON or CSV files
  • Convert complete tweet JSON files into flattened or partially flattened CSV files
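
As a rough illustration of the flattening and extraction steps, the sketch below turns nested tweet dicts into rows with dotted column names and writes only the requested columns. The flatten and tweets_to_csv helpers, the dotted naming scheme, and the sample tweets are assumptions made for this example; the notebook may do it differently.

```python
import csv
import io


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted column names."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            flat.update(flatten(item, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = node  # drop the trailing dot from the column name
    return flat


def tweets_to_csv(tweets, out_file, keep=None):
    """Write flattened tweets as CSV; `keep` optionally limits the columns."""
    rows = [flatten(tweet) for tweet in tweets]
    if keep is not None:
        columns = list(keep)
    else:
        columns = []
        for row in rows:  # union of all columns, in first-seen order
            for name in row:
                if name not in columns:
                    columns.append(name)
    writer = csv.DictWriter(
        out_file, fieldnames=columns, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical, heavily trimmed tweets.
tweets = [
    {"id": 1, "text": "hello", "user": {"screen_name": "a"},
     "entities": {"hashtags": [{"text": "x"}]}},
    {"id": 2, "text": "world", "user": {"screen_name": "b"},
     "entities": {"hashtags": []}},
]

buffer = io.StringIO()
tweets_to_csv(tweets, buffer, keep=["id", "user.screen_name", "text"])
print(buffer.getvalue())
```

Passing keep=None writes every flattened column instead, which is closer to the "all of the Twitter data" request and produces a very wide file. When writing to a real file, open it with newline="" as the csv module recommends.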

Twitter API Keys?

I used the Python library Tweepy to access Twitter's API and scrape data. You will need to enter your own Twitter API keys for Tweepy to authenticate properly.
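
One possible way to supply the keys without pasting them into a notebook is to read them from environment variables. This is a sketch under assumptions: the variable names, load_keys, and make_api are invented for this example, and the authentication call uses the Tweepy 4.x class OAuth1UserHandler (older Tweepy versions use OAuthHandler with set_access_token instead).

```python
import os

# Hypothetical environment variable names; use whatever names you prefer.
KEY_NAMES = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_keys(env=os.environ):
    """Return the four keys from `env`, or raise if any is missing or empty."""
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError("Missing Twitter API keys: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}


def make_api(keys):
    """Build an authenticated tweepy.API object (assumes Tweepy 4.x)."""
    import tweepy  # imported here so load_keys works without Tweepy installed

    auth = tweepy.OAuth1UserHandler(
        keys["TWITTER_API_KEY"],
        keys["TWITTER_API_SECRET"],
        keys["TWITTER_ACCESS_TOKEN"],
        keys["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    # wait_on_rate_limit makes Tweepy sleep instead of failing on rate limits.
    return tweepy.API(auth, wait_on_rate_limit=True)


# Usage, once the four variables are set in your shell:
# api = make_api(load_keys())
```

Keeping the keys out of the notebook also makes it harder to commit them to a public repo by accident.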

Warning: Generating Twitter API keys isn't a simple 5-minute process. It can take anywhere from minutes to weeks to get a Twitter Developer account approved so you can generate API keys, so have some patience and plan ahead!

To create a Twitter Developer account, visit Twitter Developer and apply for access.

Contributors

dkreitzer
