Git Product home page Git Product logo

app-store-scraper's Introduction

build PRs Welcome PyPI downloads license code style

   ___                _____ _                   _____
  / _ \              /  ___| |                 /  ___|
 / /_\ \_ __  _ __   \ `--.| |_ ___  _ __ ___  \ `--.  ___ _ __ __ _ _ __   ___ _ __
 |  _  | '_ \| '_ \   `--. \ __/ _ \| '__/ _ \  `--. \/ __| '__/ _` | '_ \ / _ \ '__|
 | | | | |_) | |_) | /\__/ / || (_) | | |  __/ /\__/ / (__| | | (_| | |_) |  __/ |
 \_| |_/ .__/| .__/  \____/ \__\___/|_|  \___| \____/ \___|_|  \__,_| .__/ \___|_|
       | |   | |                                                    | |
       |_|   |_|                                                    |_|

Quickstart

Install:

pip3 install app-store-scraper

Scrape reviews for an app:

from app_store_scraper import AppStore
from pprint import pprint

minecraft = AppStore(country="nz", app_name="minecraft")
minecraft.review(how_many=20)

pprint(minecraft.reviews)
pprint(minecraft.reviews_count)

Scrape reviews for a podcast:

from app_store_scraper import Podcast
from pprint import pprint

sysk = Podcast(country="nz", app_name="stuff you should know")
sysk.review(how_many=20)

pprint(sysk.reviews)
pprint(sysk.reviews_count)

Extra Details

Let's continue from the code example used in Quickstart.

Instantiation

There are two required and one positional parameters:

  • country (required)
  • app_name (required)
    • name of an iOS application to fetch reviews for
    • also used by search_id() method to search for app_id internally
  • app_id (positional)
    • can be passed directly
    • or ignored to be obtained by search_id method internally

Once instantiated, the object can be examined:

>>> minecraft
AppStore(country='nz', app_name='minecraft', app_id=479516143)
>>> print(app)
     Country | nz
        Name | minecraft
          ID | 479516143
         URL | https://apps.apple.com/nz/app/minecraft/id479516143
Review count | 0

Other optional parameters are:

  • log_format
    • passed directly to logging.basicConfig(format=log_format)
    • default is "%(asctime)s [%(levelname)s] %(name)s - %(message)s"
  • log_level
    • passed directly to logging.basicConfig(level=log_level)
    • default is "INFO"
  • log_interval
    • log is produced every 5 seconds (by default) as a "heartbeat" (useful for a long scraping session)
    • default is 5

Fetching Review

The maximum number of reviews fetched per request is 20. To minimise the number of calls, the limit of 20 is hardcoded. This means the review() method will always grab more than the how_many argument supplied with an increment of 20.

>>> minecraft.review(how_many=33)
>>> minecraft.reviews_count
40

If how_many is not provided, review() will terminate after all reviews are fetched.

NOTE the review count seen on the landing page differs from the actual number of reviews fetched. This is simply because only some users who rated the app also leave reviews.

Optional Parameters

  • after
    • a datetime object to filter older reviews
  • sleep
    • an int to specify seconds to sleep between each call

Review Data

The fetched review data are loaded in memory and live inside reviews attribute as a list of dict.

>>> minecraft.reviews
[{'userName': 'someone', 'rating': 5, 'date': datetime.datetime(...

Each review dictionary has the following schema:

{
    "date": datetime.datetime,
    "isEdited": bool,
    "rating": int,
    "review": str,
    "title": str,
    "userName": str
 }

app-store-scraper's People

Contributors

cowboy-bebug avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.