A sample web crawling application.
- Java
- Selenium
- Jsoup
- HttpClient
- Maven
- Eclipse
This project contains three main modules: scheduler, fetcher, parser.
- schedule new jobs
- dispatch job objects
- common crawler
- multithreaded
- parse information from fetcher results
- storage
- Analyze AJAX links and parameters
- Use PhantomJS to execute JavaScript on pages
- Use Selenium to simulate browser interaction
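
The scheduler, fetcher, and parser flow listed above could be wired together roughly as follows. This is a minimal sketch and not the project's actual code: the class and method names (`CrawlerSketch`, `Job`, `FetchResult`, `schedule`, `fetchAll`, `parseTitle`, `crawl`) are hypothetical, the fetch step is passed in as a function so the example runs without network access, and a simple regular expression stands in for Jsoup so that only the Java standard library is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlerSketch {

    /** A unit of work created by the scheduler (hypothetical type). */
    static final class Job {
        final String url;
        Job(String url) { this.url = url; }
    }

    /** Raw output of the fetcher for one job (hypothetical type). */
    static final class FetchResult {
        final String url;
        final String html;
        FetchResult(String url, String html) { this.url = url; this.html = html; }
    }

    // Stand-in for a real HTML parser such as Jsoup: matches the first <title> element.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Scheduler: turn seed URLs into job objects. */
    static List<Job> schedule(List<String> seeds) {
        List<Job> jobs = new ArrayList<>();
        for (String seed : seeds) {
            jobs.add(new Job(seed));
        }
        return jobs;
    }

    /** Fetcher: dispatch each job to a fixed-size thread pool and collect results in job order. */
    static List<FetchResult> fetchAll(List<Job> jobs, Function<String, String> fetch, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<FetchResult>> futures = new ArrayList<>();
            for (Job job : jobs) {
                Callable<FetchResult> task = () -> new FetchResult(job.url, fetch.apply(job.url));
                futures.add(pool.submit(task));
            }
            List<FetchResult> results = new ArrayList<>();
            for (Future<FetchResult> future : futures) {
                results.add(future.get()); // blocks until that job has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Parser: extract the page title, or an empty string if none is found. */
    static String parseTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Run the whole pipeline and store url -> title in an in-memory map. */
    static Map<String, String> crawl(List<String> seeds, Function<String, String> fetch, int threads)
            throws InterruptedException, ExecutionException {
        Map<String, String> store = new LinkedHashMap<>();
        for (FetchResult result : fetchAll(schedule(seeds), fetch, threads)) {
            store.put(result.url, parseTitle(result.html));
        }
        return store;
    }

    public static void main(String[] args) throws Exception {
        // A fake "web" so the sketch runs offline; a real fetcher would issue HTTP requests.
        Map<String, String> fakeWeb = Map.of(
                "http://example.com/a", "<html><title>Page A</title></html>",
                "http://example.com/b", "<html><title>Page B</title></html>");
        Map<String, String> titles = crawl(
                List.of("http://example.com/a", "http://example.com/b"),
                url -> fakeWeb.getOrDefault(url, ""),
                2);
        System.out.println(titles);
    }
}
```

Passing the fetch step as a `Function<String, String>` keeps the pipeline independent of the HTTP layer, so an HttpClient-based or Selenium-based fetcher could presumably be swapped in without touching the scheduler or the parser.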