shopify-spy's Introduction

Shopify Spy

Shopify Spy is a simple but powerful Scrapy application for scraping Shopify websites. Its main feature is shopify_spider, a universal Shopify spider. The spider is designed to extract detailed data from any Shopify store, including high-value information like vendor names and inventory levels.

To find Shopify stores to scrape, try searching Google with the search operator site:myshopify.com.

Forking

Shopify Spy is just a project built using the Scrapy framework. To use it, fork and/or clone the repository. Forking is recommended, since you might want to adjust the settings in shopify_spy/settings.py while still being able to fetch updates from the original repository.

Usage

The spider can be used like any Scrapy spider, but you must provide it with a URL. Set your working directory to the project directory and execute one of the following commands.

Scrape a single Shopify store:

scrapy crawl shopify_spider -a url=https://www.example.com/

Scrape multiple Shopify stores at once using a text file with one URL per line:

scrapy crawl shopify_spider -a url_file=resources/urls.txt
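One way to prepare such a file is sketched below. The helper name write_url_file and its cleanup rules (skipping blank entries, dropping duplicates) are assumptions made for illustration and are not part of Shopify Spy; only the one-URL-per-line format and the resources/urls.txt path come from the text above.

```python
from pathlib import Path


def write_url_file(urls, path):
    """Write one URL per line, skipping blanks and duplicates (hypothetical helper)."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    path = Path(path)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

Calling write_url_file(["https://www.example.com/"], "resources/urls.txt") from the project directory should produce a file that can be passed through the url_file argument.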

Specify which items to scrape:

scrapy crawl shopify_spider -a url=https://www.example.com/ -a products=False -a collections=True
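The commands above all follow the same pattern, so they can be assembled from a script. The sketch below builds the argument list only; the function name build_crawl_command is hypothetical, and the spider argument names (url, products, collections) are taken from the examples above.

```python
def build_crawl_command(spider="shopify_spider", **spider_args):
    """Return the scrapy command as an argument list, one -a flag per spider argument."""
    command = ["scrapy", "crawl", spider]
    for name, value in spider_args.items():
        # Scrapy receives each spider argument as a name=value string.
        command += ["-a", f"{name}={value}"]
    return command
```

The resulting list can be handed to subprocess.run, with the working directory set to the project directory.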

Arguments must always be preceded by the -a flag, as is standard for Scrapy. The results will be stored in a JSON Lines file in /resources/shopify_spider.
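Because each line of a JSON lines file is an independent JSON document, the results can be loaded one record at a time. The following is a minimal sketch; the function name read_json_lines is hypothetical, and the exact output file name is not stated here, so the path you pass in is up to you.

```python
import json
from pathlib import Path


def read_json_lines(path):
    """Yield one parsed record per non-empty line of a JSON lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Since the function is a generator, a large results file can be processed without loading it into memory all at once.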

Please refer to the Scrapy documentation for questions about adjusting the settings, more advanced usage, or the Scrapy framework in general.

Limitations

Attempting to scrape a large store may result in a temporary ban. This can be mitigated by configuring AutoThrottle, which is disabled by default.
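As a starting point, AutoThrottle can be enabled in shopify_spy/settings.py with Scrapy's standard settings. The values below are illustrative assumptions, not recommendations from this project; tune them for the store you are scraping.

```python
# shopify_spy/settings.py (fragment); all values are illustrative assumptions
AUTOTHROTTLE_ENABLED = True            # turn on Scrapy's AutoThrottle extension
AUTOTHROTTLE_START_DELAY = 5           # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60            # upper bound on the delay when latency is high
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per remote site
DOWNLOAD_DELAY = 1                     # AutoThrottle will not go below this delay
```

A lower target concurrency and a higher start delay make the crawl slower but gentler on the store.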

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Make sure to update the tests in tests.py and contracts in the spider.

License

MIT

Credits

Icon by Bartama Graphic.

shopify-spy's People

Contributors

ndgigliotti

shopify-spy's Issues

Errors when executing run command

Hello @ndgigliotti! Great tool here and it promises to be something of great value to a project I am working on. However, I am not able to get it to run successfully. I've made a few basic Scrapy spiders but am not too familiar with the details. When I fork and run Shopify Spy with the url of a Shopify site, I get a bunch of errors and I am not sure how to resolve them.

The command I run:

scrapy crawl shopify_spider -a url=https://www.bikeberry.com/

Returns these errors:

judsonlmoore@Judsons-MacBook-Air shopify_spy % scrapy crawl shopify_spider -a url=https://www.bikeberry.com/
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/bin/scrapy", line 8, in <module>
    sys.exit(execute())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/cmdline.py", line 144, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/crawler.py", line 290, in __init__
    super().__init__(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/crawler.py", line 167, in __init__
    self.spider_loader = self._get_spider_loader(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/crawler.py", line 161, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/spiderloader.py", line 67, in from_settings
    return cls(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/spiderloader.py", line 24, in __init__
    self._load_all_spiders()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/spiderloader.py", line 51, in _load_all_spiders
    for module in walk_modules(name):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/utils/misc.py", line 88, in walk_modules
    submod = import_module(fullpath)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/judsonlmoore/Documents/GitHub/shopify-spy/shopify_spy/spiders/shopify.py", line 6, in <module>
    import nested_lookup as nl
ModuleNotFoundError: No module named 'nested_lookup'

Are there any dependencies or other changes I need to make to the project files? Perhaps all of the example.com urls need replacing in tests.py and shopify.py? But then if the URL is updated in those files, why is the URL needed in the run command?

Any hints you've got for me are greatly appreciated!
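The last line of the traceback reports a module that could not be imported, which suggests a missing dependency rather than a problem with the URL. A minimal sketch for checking which top-level modules are importable in the current environment follows. The function name missing_modules is hypothetical, and the names to check (scrapy, nested_lookup) are assumed only from what is visible in the traceback; the project may require others.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running missing_modules(["scrapy", "nested_lookup"]) with the same Python interpreter that runs scrapy lists the modules that still need to be installed.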

How can I see stock level?

Is stock level available for all stores? For the ones I have tried, all I see in the result is 'inventory_management': 'shopify'

Thanks for the cool library and the help!
