
pull-twitter-followers

Twitter follower harvester

Pulls follower IDs for a set of Twitter users. Harvesting many accounts, or accounts with many followers, can take a long time. There are plenty of chances for something to fail partway through, which leaves you with incomplete data.

This repo keeps things simple:

  • Each account that you want to harvest gets pushed into a Redis-backed RQ queue.

  • A worker pops a screen name and cursors through that account's follower list via the Twitter API. Once cursoring finishes, the complete list is committed to a local SQLite database as a batch of rows in a single transaction. This way, you never end up with a partial follower list for an account.

  • In the database, each follower is stored as a row in this table:

    CREATE TABLE follower (
    	id INTEGER NOT NULL,
    	screen_name VARCHAR NOT NULL,
    	job_timestamp INTEGER NOT NULL,
    	follower_id INTEGER NOT NULL,
    	PRIMARY KEY (id)
    );

    job_timestamp is the time the job started, so it is the same for every row harvested during that job's cursoring run. This makes it possible to repeatedly snapshot the same account(s) at different points in time.
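The harvesting flow described above can be sketched in Python as follows. This is a minimal sketch, not the repo's actual worker code. fetch_page is a hypothetical stand-in for whatever Twitter client the repo uses, and it is assumed to follow v1.1-style cursoring: start at cursor -1 and stop when next_cursor is 0. Every follower ID is collected in memory first and only then written inside one sqlite3 transaction. An exception mid-harvest therefore leaves no rows for that account. The hypothetical helpers snapshot and followers_gained show how a shared job_timestamp lets you compare two snapshots of the same account.

```python
import sqlite3
import time
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical page fetcher: (screen_name, cursor) -> (follower_ids, next_cursor).
# Assumes v1.1-style cursoring: first call uses -1, next_cursor == 0 means done.
FetchPage = Callable[[str, int], Tuple[List[int], int]]

SCHEMA = """
CREATE TABLE IF NOT EXISTS follower (
    id INTEGER NOT NULL,
    screen_name VARCHAR NOT NULL,
    job_timestamp INTEGER NOT NULL,
    follower_id INTEGER NOT NULL,
    PRIMARY KEY (id)
)
"""


def harvest(screen_name: str, fetch_page: FetchPage, conn: sqlite3.Connection,
            job_timestamp: Optional[int] = None) -> int:
    """Cursor out every follower ID, then write them all in one transaction."""
    if job_timestamp is None:
        job_timestamp = int(time.time())  # shared by every row of this job
    follower_ids: List[int] = []
    cursor = -1
    while cursor != 0:
        ids, cursor = fetch_page(screen_name, cursor)
        follower_ids.extend(ids)
    # The database has not been touched yet: if fetch_page raised above,
    # no rows exist for this job.
    with conn:  # commits on success, rolls back if the insert fails
        conn.executemany(
            "INSERT INTO follower (screen_name, job_timestamp, follower_id) "
            "VALUES (?, ?, ?)",
            [(screen_name, job_timestamp, fid) for fid in follower_ids],
        )
    return len(follower_ids)


def snapshot(conn: sqlite3.Connection, screen_name: str, job_timestamp: int) -> Set[int]:
    """All follower IDs recorded for one account by one job."""
    rows = conn.execute(
        "SELECT follower_id FROM follower WHERE screen_name = ? AND job_timestamp = ?",
        (screen_name, job_timestamp),
    )
    return {row[0] for row in rows}


def followers_gained(conn: sqlite3.Connection, screen_name: str,
                     old_ts: int, new_ts: int) -> List[int]:
    """Follower IDs present in the newer snapshot but not the older one."""
    return sorted(snapshot(conn, screen_name, new_ts) - snapshot(conn, screen_name, old_ts))
```

Holding the whole list in memory before writing is what makes the all-or-nothing guarantee cheap. The trade-off is memory use for accounts with very large follower counts.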

Setup

  1. Install Redis and pipenv.

  2. Clone this repo, then run pipenv install and pipenv shell.

  3. Set your Twitter API credentials as environment variables:

    export TWITTER_TOKEN=XXX
    export TWITTER_SECRET=XXX
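Before queueing any work, it can help to confirm that both variables are actually set. A missing credential would otherwise surface only when a worker picks up its first job. The helper below is hypothetical and not part of the repo. Only the two variable names come from the step above; everything else is an assumption.

```python
import os
from typing import Mapping, Tuple

# Variable names from the Setup step; the helper itself is hypothetical.
REQUIRED_VARS = ("TWITTER_TOKEN", "TWITTER_SECRET")


def load_twitter_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (token, secret), failing early with a clear message if either is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TWITTER_TOKEN"], env["TWITTER_SECRET"]
```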

Usage

  1. Put the screen names of the accounts you want to harvest into a text file, which can sit anywhere.

  2. Run the spool task to queue a job for each screen name: inv spool <txt file>

  3. Start a worker with rq worker. Harvested data is written to a followers.db file in the current directory.
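The spool step can be sketched as below. The sketch assumes the text file holds one screen name per line, since the README does not specify a format. read_screen_names, spool and rq_enqueuer are hypothetical names, not the repo's actual task code. The enqueue callable is injected so the file handling can be exercised without a running Redis. rq_enqueuer binds it to RQ's Queue(connection=Redis()) and Queue.enqueue(func, *args).

```python
from pathlib import Path
from typing import Callable, List


def read_screen_names(path: str) -> List[str]:
    """Assumes one screen name per line; surrounding whitespace and blank lines are dropped."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name:
            names.append(name)
    return names


def spool(path: str, enqueue: Callable[[str], object]) -> int:
    """Queue one job per screen name; returns the number of jobs queued."""
    names = read_screen_names(path)
    for name in names:
        enqueue(name)
    return len(names)


def rq_enqueuer(job_func: Callable[[str], object]) -> Callable[[str], object]:
    """Bind enqueue() to RQ's default queue on a local Redis server."""
    # Imported lazily so the helpers above work without redis/rq installed.
    from redis import Redis
    from rq import Queue

    queue = Queue(connection=Redis())
    return lambda name: queue.enqueue(job_func, name)
```

For example, spool("accounts.txt", rq_enqueuer(harvest_account)), where harvest_account is a hypothetical job function the worker should run for each screen name. RQ workers import job functions by module path, so that function must live in an importable module rather than in __main__.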
