Shopify Spy

Shopify Spy is a simple but powerful Scrapy application for scraping Shopify websites. Its main feature is shopify_spider, a universal Shopify spider. The spider is designed to extract detailed data from any Shopify store, including high-value information like vendor names and inventory levels.

To find Shopify stores to scrape, try searching Google with the operator site:myshopify.com.

Forking

Shopify Spy is simply a project built with the Scrapy framework. To use it, fork and/or clone the repository. Forking is recommended because it lets you adjust the settings in shopify_spy/settings.py while still fetching updates from the original repository.

Usage

The spider can be used like any Scrapy spider, but you must provide it with a URL. Set your working directory to the project directory and execute one of the following commands.

Scrape a single Shopify store:

scrapy crawl shopify_spider -a url=https://www.example.com/

Scrape multiple Shopify stores at once using a text file with one URL per line:

scrapy crawl shopify_spider -a url_file=resources/urls.txt

Specify which items to scrape:

scrapy crawl shopify_spider -a url=https://www.example.com/ -a products=False -a collections=True
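The commands above all follow the same pattern, so they can also be assembled from Python when a crawl needs to be launched from a script. The following is a minimal sketch, not part of Shopify Spy: the helper names build_crawl_command and run_crawl are hypothetical, and the only spider arguments assumed are the ones shown above (url, url_file, products, collections).

```python
import subprocess
from typing import List


def build_crawl_command(spider: str = "shopify_spider", **spider_args) -> List[str]:
    """Assemble a `scrapy crawl` argument list (hypothetical helper).

    Each keyword argument becomes one `-a name=value` pair, mirroring
    the command-line examples in this README.
    """
    command = ["scrapy", "crawl", spider]
    for name, value in spider_args.items():
        command += ["-a", f"{name}={value}"]
    return command


def run_crawl(project_dir: str, **spider_args) -> None:
    """Run the crawl with the project directory as the working directory."""
    subprocess.run(build_crawl_command(**spider_args), cwd=project_dir, check=True)


# Example (not executed here):
# run_crawl(".", url="https://www.example.com/", products=False, collections=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.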

Arguments must always be preceded by the -a flag, as is standard for Scrapy. The results will be stored in a JSON Lines file in /resources/shopify_spider.
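A JSON Lines file holds one JSON object per line, so the results can be loaded with the standard library alone. The sketch below is illustrative only: read_jsonl is a hypothetical helper, and the file name in the usage comment is a placeholder, since the actual output file name depends on the project's feed settings.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path) -> Iterator[dict]:
    """Yield one parsed item per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


# Example (placeholder file name, not executed here):
# for item in read_jsonl("resources/shopify_spider/output.jl"):
#     print(item)
```

Reading line by line keeps memory use low, which matters when a large store produces many items.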

Please refer to the Scrapy documentation for questions about adjusting the settings, more advanced usage, or the Scrapy framework in general.

Limitations

Attempting to scrape a large store may result in a temporary ban. This can be mitigated by configuring AutoThrottle, which is disabled by default.
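As a starting point, AutoThrottle can be switched on in shopify_spy/settings.py with Scrapy's standard settings. The values below are illustrative assumptions, not recommendations from the project; tune them for the store being scraped.

```python
# shopify_spy/settings.py (excerpt); values are illustrative assumptions

# Enable Scrapy's AutoThrottle extension (it is off by default).
AUTOTHROTTLE_ENABLED = True

# Initial download delay, in seconds.
AUTOTHROTTLE_START_DELAY = 5.0

# Upper bound on the delay when the server responds slowly, in seconds.
AUTOTHROTTLE_MAX_DELAY = 60.0

# Average number of requests to send in parallel to each remote server.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Set to True to log throttling stats for every response received.
AUTOTHROTTLE_DEBUG = False
```

A lower target concurrency and a higher maximum delay make the crawl slower but gentler on the store.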

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Make sure to update the tests in tests.py and contracts in the spider.

License

MIT

Credits

Icon by Bartama Graphic.

Errors when executing run command

Hello @ndgigliotti! Great tool here and it promises to be something of great value to a project I am working on. However, I am not able to get it to run successfully. I've made a few basic Scrapy spiders but am not too familiar with the details. When I fork and run Shopify Spy with the url of a Shopify site, I get a bunch of errors and I am not sure how to resolve them.

The command I run:

scrapy crawl shopify_spider -a url=https://www.bikeberry.com/

Returns these errors:

judsonlmoore@Judsons-MacBook-Air shopify_spy % scrapy crawl shopify_spider -a url=https://www.bikeberry.com/
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/bin/scrapy", line 8, in <module>
    sys.exit(execute())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/cmdline.py", line 144, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/crawler.py", line 290, in __init__
    super().__init__(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/crawler.py", line 167, in __init__
    self.spider_loader = self._get_spider_loader(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/crawler.py", line 161, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/spiderloader.py", line 67, in from_settings
    return cls(settings)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/spiderloader.py", line 24, in __init__
    self._load_all_spiders()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/spiderloader.py", line 51, in _load_all_spiders
    for module in walk_modules(name):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/scrapy/utils/misc.py", line 88, in walk_modules
    submod = import_module(fullpath)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/judsonlmoore/Documents/GitHub/shopify-spy/shopify_spy/spiders/shopify.py", line 6, in <module>
    import nested_lookup as nl
ModuleNotFoundError: No module named 'nested_lookup'

Are there any dependencies I need to install, or other changes I need to make to the project files? Perhaps all of the example.com URLs need replacing in tests.py and shopify.py? But if the URL is updated in those files, why is the URL needed in the run command?

Any hints you've got for me are greatly appreciated!
