Configure a Scrapy project with a JSON file that describes the routes used to discover new page links and the extractors applied to page content.
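
To make the idea concrete, here is a minimal sketch of what such a configuration and its interpretation might look like. The schema (`routes`, `link_pattern`, `extractors`) and the helper names are hypothetical, invented for illustration; they are not this project's actual format. Regular expressions stand in for Scrapy selectors only to keep the sketch free of dependencies.

```python
import json
import re

# Hypothetical config schema, for illustration only; the project's real
# JSON layout is not documented here.
CONFIG_JSON = r"""
{
  "routes": [
    {"name": "detail", "link_pattern": "href='(/item/\\d+)'"}
  ],
  "extractors": {
    "detail": {
      "title": "<h1>(.*?)</h1>",
      "price": "class='price'>([0-9.]+)<"
    }
  }
}
"""


def load_config(text):
    """Parse the JSON configuration into a plain dict."""
    return json.loads(text)


def follow_links(config, html):
    """Return (route_name, link) pairs matched by each route's pattern."""
    found = []
    for route in config["routes"]:
        for link in re.findall(route["link_pattern"], html):
            found.append((route["name"], link))
    return found


def extract_item(config, route_name, html):
    """Apply the extractors configured for a route to one page."""
    item = {}
    for field, pattern in config["extractors"][route_name].items():
        match = re.search(pattern, html, re.S)
        item[field] = match.group(1).strip() if match else None
    return item


config = load_config(CONFIG_JSON)
listing_page = "<a href='/item/1'>A</a> <a href='/item/2'>B</a>"
detail_page = "<h1> First item </h1><span class='price'>9.99</span>"

links = follow_links(config, listing_page)
item = extract_item(config, "detail", detail_page)
```

In a real spider, the same two steps would presumably map onto following links and yielding items from a callback, with the regex patterns replaced by CSS or XPath selectors.
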
TODO:
- item pipeline automatic construction
- URL deduplication: for repeated runs, taking content updates into account; support a distributed crawling system?
- scheduling
- JavaScript rendering server
- paging
- HTTP proxy fetcher