marcus_scrapper's Introduction

Start Here

Installing Needed Packages This particular web scrapy is been powered by "scrapy", an open-source web crawling framework To implement this web scrapper install it into your environment by using the command below

   pip install scrapy

To Scrap the Specific Pages Following web scraping best practices, different spiders have been created to meet the needs of different categories with regards the peculiarities of their web pages and pagination of each category...

The scrapper was used on an online store to scrape prices

Example Tutorial to scrape information from the boots category

1.cd into the project(Getting into the file location of the marcus scrapper in your terminal)

run "scrapy crawl marcus_boots" in the terminal, to crawl through the specified category in
the spider
And also to output in json run "scrapy crawl marcus_boots -o boots.json" Or for csv format run "scrapy crawl marcus_boots -o boots.csv"

P.S. run "scrapy crawl YOUR_SPECIFIED_CATEGORY"

The data that will be crawled includes 1.shoeType(e.g boots) 2.shoeBrand 3.shoeName 4.price(Original price of shoe) 5.salePrice(Discounted price of some specific shoes)

30th of July, 2020 update 2 new files have been added to the project for review(The data scraped by marcus_boots.py was used as an example(boots.csv)) format.py The first is the format.py to format whatever out we get from the crawler (the result is an output.csv file)

Final_Format.ipynb This converts the formatted file into a table that can be used for analysis An example is already showed in the "Final_Format" notebook

Recommend Projects

victorojewale / marcus_scrapper Goto Github PK

marcus_scrapper's Introduction

Start Here

Example Tutorial to scrape information from the boots category

marcus_scrapper's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent