- MySQL/MariaDB
- Node.js
- Python 2.7
- mysql-python module
- scrapy module
- scrapy-redis module
- requests module
Most of the required tools are already included in the project; install them as follows:
- install redis:
cd redis-4.0.1
and just run make
- install NiuTransServer (to be safe, run the commands below under a new user account):
cd NiuTransServer/service/
sh ./install.sh
source ~/.bashrc
- install the Node.js dependencies:
cd Front
npm install # using cnpm instead is recommended, as it is faster
To set up ES (Elasticsearch), just run:
bash ./es_index.sh delete
bash ./es_index.sh create
You should create a MySQL database named YLS, and then load the MySQL tables:
mysql -u<YOUR-USER-NAME> -p<YOUR-PASSWORD> YLS < db.sql
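The load command above expects the YLS database to exist already. A minimal sketch of creating it first, assuming a hypothetical `DB_USER` variable (defaulting to `root`) and that `db.sql` sits in the current directory; adjust both to your setup:

```shell
# Hypothetical default; override it with your own MySQL account.
DB_USER="${DB_USER:-root}"
DB_NAME="YLS"   # database name used by the project

# SQL that creates the database only if it does not exist yet.
create_sql="CREATE DATABASE IF NOT EXISTS ${DB_NAME};"
echo "$create_sql"

# Run only when the mysql client and db.sql are available.
# -p without a value makes mysql prompt for the password.
if command -v mysql >/dev/null 2>&1 && [ -f db.sql ]; then
    mysql -u"$DB_USER" -p -e "$create_sql" &&
        mysql -u"$DB_USER" -p "$DB_NAME" < db.sql
fi
```

Prompting for the password with a bare `-p` keeps it out of the shell history, at the cost of typing it twice.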
Before starting the whole project, you can customize the hosts, ports, and other settings of Redis, MySQL, NiuTransServer and ES in config.py.
- start elasticsearch: run
bash ./start_es.sh
- start redis: run
bash ./start_redis.sh
- start spider: run
bash ./start_spider.sh
- start NiuTransServer: run
bash ./start_trans.sh
- start front search page: run
bash ./start_search.sh
- start the ES daemon to add data automatically: run
bash ./es_daemon.sh
- enable the timer that clears outdated fingerprints every day at 00:00: use crontab to run the timer.sh script at the frequency you want
- enable incremental crawling of web pages: also use crontab to run run_client.sh at the time you want. Note that you can add new start_urls in recursive_crawler/health_websites_ch.xlsx.
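
The two crontab steps above could look roughly like the sketch below. The project location (`PROJECT_DIR`) and the 03:00 time for the incremental crawl are assumptions chosen for illustration; only the 00:00 timer schedule comes from the description above.

```shell
# Hypothetical project location; change it to where the project lives.
PROJECT_DIR="${PROJECT_DIR:-$HOME/project}"

# Build the two crontab entries (fields: minute hour day-of-month month day-of-week).
cron_entries=$(cat <<EOF
0 0 * * * cd $PROJECT_DIR && bash ./timer.sh
0 3 * * * cd $PROJECT_DIR && bash ./run_client.sh
EOF
)

# Print the entries so they can be reviewed before installing them.
printf '%s\n' "$cron_entries"

# To install, append them to the current user's crontab:
# (crontab -l 2>/dev/null; printf '%s\n' "$cron_entries") | crontab -
```

Printing the entries first and leaving the install line commented out avoids overwriting an existing crontab by accident.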