If you have any questions, please contact @Anfernee Chang
- GTalk: [email protected]
- Skype: anfernee-chang
- You MUST carefully read the Product Database Schema v3.5 (last updated 02/07/13).
- The product item is defined in items.py; please follow it.
- Please run your spider and make sure it passes scraper/pipelines/validation.py before sending it.
- Please make sure the spider raises no errors when run with 'scrapy crawl spider' before sending it.
- Any spider sent without these checks will result in penalties!
- Please follow PEP8 style.
- Please use 'parse_product' as the method that parses a single product, and pass no meta to it if you can.
- Please add each node's XPath to the spider's class variable 'xpaths' (a dict). We will use this information to check your spider.
- Please raise ValueError('XXX!') if the page has no data at the XPath for any required field.
- Please use 'copy.deepcopy' or 'ProductItem()' to create a new item for each product variation (colors, etc.).
- Since we use a duplicate filter to keep track of crawled URLs, please use 'dont_filter' carefully.
- To complete the job, we only require the spiders/store.py file from you. Please send it by email.