Pull follower IDs for a set of Twitter users. If you want to harvest lots of accounts (or the accounts have lots of followers), this can take a long time, and it's easy for stuff to go wrong along the way, which leaves you with incomplete data.
This repo just does the following:

- Each account that you want to harvest gets pushed into a Redis-backed RQ queue.
- The worker pops a screen name and starts cursoring through the follower list from the Twitter API. When this finishes, the list is committed as a batch of rows to a local SQLite database in a single transaction. This way, you never have an incomplete follower list for an account.
- In the database, each follower looks like:

  ```sql
  CREATE TABLE follower (
      id INTEGER NOT NULL,
      screen_name VARCHAR NOT NULL,
      job_timestamp INTEGER NOT NULL,
      follower_id INTEGER NOT NULL,
      PRIMARY KEY (id)
  );
  ```

  Where `job_timestamp` is the same for all rows harvested during a given job (it's the time the job started). This makes it possible to repeatedly snapshot the same account(s) at different points in time.
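
To make the "collect everything, then commit once" idea concrete, here is a minimal sketch using only the standard library. It is not the code from this repo. `fetch_page`, `collect_follower_ids`, `harvest`, and `fake_fetch_page` are hypothetical names, and the cursor convention (start at `-1`, stop when `next_cursor` is `0`) is an assumption based on how Twitter's v1.1 cursored endpoints behave.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS follower (
    id INTEGER NOT NULL,
    screen_name VARCHAR NOT NULL,
    job_timestamp INTEGER NOT NULL,
    follower_id INTEGER NOT NULL,
    PRIMARY KEY (id)
)
"""


def collect_follower_ids(fetch_page, screen_name):
    """Walk every cursor page and return the full list of follower ids.

    fetch_page(screen_name, cursor) is a hypothetical callable that returns
    a dict with "ids" and "next_cursor" keys.
    """
    ids = []
    cursor = -1  # assumption: -1 asks for the first page
    while cursor != 0:  # assumption: 0 means there are no more pages
        page = fetch_page(screen_name, cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


def harvest(conn, fetch_page, screen_name):
    """Fetch the whole list first, then write it in one transaction."""
    job_timestamp = int(time.time())  # shared by every row from this job
    ids = collect_follower_ids(fetch_page, screen_name)
    rows = [(screen_name, job_timestamp, fid) for fid in ids]
    # "with conn" commits on success and rolls back if an exception is raised,
    # so the table never holds a partial list for this job.
    with conn:
        conn.executemany(
            "INSERT INTO follower (screen_name, job_timestamp, follower_id) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return job_timestamp


def fake_fetch_page(screen_name, cursor):
    """Stand-in for the API: two pages of made-up ids."""
    pages = {
        -1: {"ids": [1, 2, 3], "next_cursor": 10},
        10: {"ids": [4, 5], "next_cursor": 0},
    }
    return pages[cursor]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
harvest(conn, fake_fetch_page, "example_user")
print(conn.execute("SELECT COUNT(*) FROM follower").fetchone()[0])
```

Because nothing is written until every page has been fetched, a failure halfway through the cursor walk leaves the table untouched for that account, and the job can simply be retried.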
- Clone this repo, then run `pipenv install` and `pipenv shell`.
- Set your Twitter API credentials as ENV vars:

  ```shell
  export TWITTER_TOKEN=XXX
  export TWITTER_SECRET=XXX
  ```
- Put the screen names of the accounts you want to harvest into a text file, which can sit anywhere.
- Run the `spool` task to queue a job for each screen name: `inv spool <txt file>`
- Start a worker: `rq worker`. Data flows to a `./followers.db` file.
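
For a rough idea of what the spooling step involves, here is a small sketch. It is not the repo's actual `spool` task. It assumes the text file holds one screen name per line (the file format isn't specified above), and `read_screen_names`, `spool`, and the `enqueue` callable are hypothetical names.

```python
from pathlib import Path


def read_screen_names(path):
    """Return the screen names listed in a text file.

    Assumption: one screen name per line; blank lines are skipped and
    surrounding whitespace is stripped.
    """
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name:
            names.append(name)
    return names


def spool(path, enqueue):
    """Call enqueue(screen_name) once per name and return how many were queued.

    enqueue is a hypothetical callable. With RQ it could be a thin wrapper
    around a Queue's enqueue method that passes the screen name to whatever
    function the worker runs.
    """
    names = read_screen_names(path)
    for name in names:
        enqueue(name)
    return len(names)
```

Passing the enqueue function in as an argument keeps the sketch runnable without Redis. For example, `spool("accounts.txt", print)` would just print each name that would have been queued.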