This script has several steps:
- Crawl several websites (see the Managed Websites list below) with simple HTTP requests (like wget) or with more complex requests that mimic a real human (using the Selenium library).
- Store the results in a Python dict.
- Store them in the Elastic cluster whenever a price has changed (up or down).
- Send alerts by mail, Mattermost, or Discord when a price decreases by more than 20%. For now, Discord is the best and simplest option.
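The steps above can be sketched as a single survey loop. This is a minimal sketch, not the script itself: `fetch_price`, `es_index`, and `notify` are hypothetical placeholders for the crawling, Elastic, and alerting code; only the 20% alert rule mirrors the behaviour described above.

```python
def price_drop_ratio(old_price: float, new_price: float) -> float:
    """Relative decrease: 0.25 means the price dropped by 25%."""
    return (old_price - new_price) / old_price

def should_alert(old_price: float, new_price: float, threshold: float = 0.20) -> bool:
    """Alert only on a decrease of more than `threshold` (20% by default)."""
    return price_drop_ratio(old_price, new_price) > threshold

def survey(pages: dict[str, float], fetch_price, es_index, notify) -> dict[str, float]:
    """pages maps URL -> last known price; returns the updated dict."""
    current = dict(pages)
    for url, last_price in pages.items():
        new_price = fetch_price(url)            # step 1: crawl the page
        current[url] = new_price                # step 2: keep prices in a Python dict
        if new_price != last_price:
            es_index(url, new_price)            # step 3: store the change in Elastic
        if should_alert(last_price, new_price):
            notify(url, last_price, new_price)  # step 4: alert on a >20% drop
    return current
```

Keeping the alert rule in a pure function (`should_alert`) makes the threshold easy to test without any network access.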
Requirements to run:
- [required] Docker service on a dedicated, always-running VM/server
- [required] Elastic cluster to store prices (with its settings added in config.yml)
- [optional] Discord webhook
- [optional] Mastodon account
- [optional] Mail account
- deporvillage.fr
- decathlon.fr
- cyclable.com
- probikeshop.fr
- bikester.fr
- culturevelo.com
- alltricks.fr
- bike24.fr
The main script, which manages the four phases described in the purpose section.
Permits managing all the variables that may change, such as:
- Mastodon
- Discord
- List of web pages to survey
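A config.yml covering these variables could look like the following. This is a hypothetical layout, the real key names and schema may differ (only the "tosurvey" key is mentioned elsewhere in this README), and the URLs and tokens are placeholders:

```yaml
# Hypothetical config.yml layout -- real key names may differ.
elastic:
  host: "https://elastic.example.org:9200"
  index: "prices"
discord:
  webhook_url: "https://discord.com/api/webhooks/..."
mastodon:
  instance: "https://mastodon.example.org"
  access_token: "..."
tosurvey:
  - "https://www.decathlon.fr/p/some-bike"
  - "https://www.alltricks.fr/some-other-bike"
```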
site.py
is the lib that manages crawling using the tags defined in site.yml.
This lib tries to be as generic as possible, but it still needs adapting for each specific website.
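One way such a generic, tag-driven extractor can work is to let site.yml declare a (tag, class) pair per site and pull the price text out of the fetched HTML. This is a stdlib-only sketch of that idea, not the actual site.py: the selector schema is an assumption, and the price parser assumes French-style formatting ("1 299,99 €"):

```python
import re
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Grab the text of the first tag matching (tag, class) -- the kind of
    per-site selector that site.yml could declare (hypothetical schema)."""

    def __init__(self, tag: str, cls: str):
        super().__init__()
        self.tag, self.cls = tag, cls
        self._capture = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if self.text is None and tag == self.tag and dict(attrs).get("class") == self.cls:
            self._capture = True

    def handle_data(self, data):
        if self._capture:
            self.text = data.strip()
            self._capture = False

def parse_price(raw: str) -> float:
    """Normalise a French-style price string such as '1 299,99 €'."""
    return float(re.sub(r"[^\d,]", "", raw).replace(",", "."))

def extract_price(html: str, tag: str, cls: str) -> float:
    parser = TagTextExtractor(tag, cls)
    parser.feed(html)
    return parse_price(parser.text)
```

In practice each shop needs its own (tag, class) entry, which is exactly why the lib cannot be fully generic.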
Permits checking that data is still being injected into the Elastic cluster for every entry of the config.yml
-> "tosurvey" list.
Dockerfile
and docker-compose.yml
permit creating a container with all required packages and the Python libs listed in requirements.txt.
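A typical Dockerfile for this kind of setup could look like the following sketch. The base image, versions, and entrypoint script name are assumptions, not the repository's actual file:

```dockerfile
# Hypothetical sketch -- the real Dockerfile may differ.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# "main.py" is a placeholder for the actual entrypoint script.
CMD ["python", "main.py"]
```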