
pengchengspider's Introduction

YLS


Requirements

  • MySQL/MariaDB
  • Node.js
  • Python 2.7
  • mysql-python module
  • scrapy module
  • scrapy-redis module
  • requests module

Build

Most of the required tools are already bundled with the project.

  • install Redis: cd into redis-4.0.1 and run make
  • install NiuTransServer (to be safe, run the commands below under a new user account):
cd NiuTransServer/service/
sh ./install.sh
source ~/.bashrc
  • install the Node.js dependencies:
cd Front
npm install # cnpm is recommended as a faster alternative

Setup

To set up ES (Elasticsearch), run:

bash ./es_index.sh delete
bash ./es_index.sh create

Create a MySQL database named YLS, then load the MySQL tables:

mysql -u<YOUR-USER-NAME> -p<YOUR-PASSWORD> YLS < db.sql

Before starting the whole project, you can customize the hosts, ports, and other settings for Redis, MySQL, NiuTransServer and ES in config.py.
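
As a rough illustration of the kind of settings this step refers to, here is a minimal sketch of what such a config.py might contain. Every variable name below is hypothetical and may not match the real file. The Redis, MySQL and Elasticsearch ports are those services' standard defaults, and the NiuTransServer port is only a placeholder.

```python
# Hypothetical sketch of config.py; all names and values are assumptions,
# not the project's actual settings. Check the real file before editing.

REDIS_HOST = "127.0.0.1"
REDIS_PORT = 6379            # Redis default port

MYSQL_HOST = "127.0.0.1"
MYSQL_PORT = 3306            # MySQL/MariaDB default port
MYSQL_USER = "<YOUR-USER-NAME>"
MYSQL_PASSWORD = "<YOUR-PASSWORD>"
MYSQL_DB = "YLS"             # database name from the Setup step

ES_HOST = "127.0.0.1"
ES_PORT = 9200               # Elasticsearch default HTTP port

TRANS_HOST = "127.0.0.1"
TRANS_PORT = 8080            # placeholder; use the port your NiuTransServer listens on


def redis_url(host=REDIS_HOST, port=REDIS_PORT):
    """Build a redis:// URL, the form accepted by scrapy-redis's REDIS_URL setting."""
    return "redis://{}:{}".format(host, port)


def es_endpoint(host=ES_HOST, port=ES_PORT):
    """Build the base HTTP endpoint for Elasticsearch."""
    return "http://{}:{}".format(host, port)
```

Keeping host and port as separate values, with small helpers that assemble URLs, makes it easy to point a single service at another machine without touching the rest of the file.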

Run

  • start Elasticsearch: run bash ./start_es.sh
  • start redis: run bash ./start_redis.sh
  • start spider: run bash ./start_spider.sh
  • start NiuTransServer: run bash ./start_trans.sh
  • start front search page: run bash ./start_search.sh
  • start the ES daemon that automatically adds data: run bash ./es_daemon.sh
  • enable the timer that clears outdated fingerprints (for example, every day at 00:00): use crontab to run the /timer.sh script at the frequency you want
  • enable incremental crawling of web pages: also use crontab to run run_client.sh at the time you want. Note that you can add new start_urls in recursive_crawler/health_websites_ch.xlsx.
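
The two crontab-based steps above can be sketched with a small helper that prints entries suitable for pasting into `crontab -e`. This is only an illustration: the install path `/opt/YLS`, the assumption that timer.sh and run_client.sh sit in the project root, and the 02:00 crawl time are all hypothetical, so adjust them to your setup.

```python
# Minimal sketch: generate daily crontab entries for the two scheduled scripts.
# PROJECT_DIR and the 02:00 crawl time are assumptions, not project defaults.

PROJECT_DIR = "/opt/YLS"  # hypothetical checkout location


def cron_line(minute, hour, command):
    """Return a crontab entry that runs `command` every day at hour:minute."""
    return "{} {} * * * {}".format(minute, hour, command)


entries = [
    # Clear outdated fingerprints every day at 00:00.
    cron_line(0, 0, "cd {} && bash ./timer.sh".format(PROJECT_DIR)),
    # Run the incremental crawl every day at 02:00 (arbitrary example time).
    cron_line(0, 2, "cd {} && bash ./run_client.sh".format(PROJECT_DIR)),
]

if __name__ == "__main__":
    print("\n".join(entries))
```

Each entry changes into the project directory first, since cron runs jobs from the user's home directory and the scripts may rely on relative paths.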

Reference

Good Luck!
