Git Product home page Git Product logo

scweet's Introduction

A simple and unlimited twitter scraper with python and without authentification.

In the last days, Twitter banned almost every twitter scrapers. This repository represent an alternative legal tool (depending on how many seconds we wait between each scrolling) to scrap tweets between two given dates (start_date and max_date), for a given language and list of words or account name, and saves a csv file containing scraped data. It is also possible to scrape user profile information, including following and followers.

Scweet uses only selenium to scrape data. Authentification is required in the case of followers/following scraping. It is recommended to log in with a new account (if the list of followers is very long, it is possible that your account will be banned). To log in to your account, you need to enter your username and password in env file. You can controle the wait parameter in the get_followers and get_following functions.

The user code allows you to get all user information, including location, join date and lists of followers and following. Check this example.

Note that all these functionalities will be added in the final version of the library.

Requierments :

pip install -r requirements.txt

Results :

Tweets :

The CSV file contains the following features (for each tweet) :

  • 'UserScreenName' :
  • 'UserName' : UserName
  • 'Timestamp' : timestamp of the tweet
  • 'Text' : tweet text
  • 'Emojis' : emojis existing in tweet
  • 'Comments' : number of comments
  • 'Likes' : number of likes
  • 'Retweets' : number of retweets
  • 'Image link' : Link of the image in the tweet (it will be an option to download images soon).
  • 'Tweet URL' : Tweet URL.

Following / Followers :

The get_following and get_followers in user give a list of following and followers fo a given user. Note that the user should start with an uppercase letter.

More features will be added soon, such as "all reaplies of each tweet for a specific twitter account"

Usage :

Notebook example :

You can check the example here.

Terminal :


optional arguments:
  -h, --help            show this help message and exit
  --words WORDS         Words to search. they should be separated by "//" : Cat//Dog.
  --from_account FROM_ACCOUNT
                        Tweets posted by "from_account" account.
  --to_account TO_ACCOUNT
                        Tweets posted in response to "to_account" account.
  --hashtag HASHTAG
                        Tweets containing #hashtag
  --max_date MAX_DATE   max date for search query. example : %Y-%m-%d.
  --start_date START_DATE
                        Start date for search query. example : %Y-%m-%d.
  --interval INTERVAL   Interval days between each start date and end date for
                        search queries. example : 5.
  --navig NAVIG         Navigator to use : chrome or edge.
  --lang LANG           tweets language. Example : "en" for english and "fr"
                        for french.
  --headless HEADLESS   Headless webdrives or not. True or False
  --limit LIMIT         Limit tweets per <interval>
  --display_type DISPLAY_TYPE
                        Display type of twitter page : Latest or Top tweets (
  --resume RESUME       Resume the last scraping work. You need to pass the same arguments (<words>, <start_date>, <max_date>...)```

### To execute the script : 
python scweet.py --words "excellente//car" --to_account "tesla"  --max_date 2020-01-05 --start_date 2020-01-01 --limit 10 --interval 1 --navig chrome --display_type Latest --lang="en" --headless True

scweet's People

Contributors

altimis avatar bengadisoufiane avatar jagoosw avatar okkymabruri avatar perara avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.