The goal of this project is to build a distributed, decentralized crawler that scrapes 1 billion web pages using a couple hundred dollars of commodity AWS hardware. The trick is to use AWS spot instances, which provide compute from Amazon's idle capacity. When demand is low, these spot instances can cost anywhere from 10-60% of the on-demand price.
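To make the budget concrete, here is a rough back-of-the-envelope sketch. It is not part of the project. It assumes "a couple hundred dollars" means about $200, and it uses a hypothetical on-demand price of $0.10 per hour purely for illustration. The function names are invented for this example.

```python
def cost_per_million_pages(budget_usd: float, total_pages: int) -> float:
    """Budget available for every million pages crawled."""
    return budget_usd / (total_pages / 1_000_000)


def spot_price_range(on_demand_hourly_usd: float,
                     low: float = 0.10,
                     high: float = 0.60) -> tuple[float, float]:
    """Spot price bounds, using the 10-60% range quoted above."""
    return on_demand_hourly_usd * low, on_demand_hourly_usd * high


def required_pages_per_hour(budget_usd: float,
                            hourly_usd: float,
                            total_pages: int) -> float:
    """Pages each instance-hour must crawl to finish within budget."""
    instance_hours = budget_usd / hourly_usd
    return total_pages / instance_hours


if __name__ == "__main__":
    BUDGET_USD = 200.0            # assumption: "a couple hundred dollars"
    TOTAL_PAGES = 1_000_000_000   # from the project goal
    ON_DEMAND_HOURLY_USD = 0.10   # hypothetical, not a real AWS price

    cheapest, priciest = spot_price_range(ON_DEMAND_HOURLY_USD)
    print(f"Budget per million pages: ${cost_per_million_pages(BUDGET_USD, TOTAL_PAGES):.2f}")
    for label, price in (("cheapest", cheapest), ("priciest", priciest)):
        rate = required_pages_per_hour(BUDGET_USD, price, TOTAL_PAGES)
        print(f"At the {label} spot price (${price:.3f}/h): {rate:,.0f} pages per instance-hour")
```

Under these assumptions the whole crawl has about 20 cents to spend per million pages, which is why the spot discount matters: the cheaper the instance-hour, the fewer pages each instance has to process per hour to stay within budget.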
This project is forked from nadrane/crawler.