ants-go

Open source, RESTful, distributed crawler engine

gitter

Join the chat at https://gitter.im/wcong/ants-go

coming up

  • Persistence
  • Dynamic Master

design of ants-go

ants

I wrote a crawler engine named ants in Python, based on Scrapy. But a dynamic language can sometimes get chaotic, so I started rewriting it in a compiled language.

scrapy

I designed the crawler framework by imitating Scrapy: the downloader, the scraper, and the way users write custom spiders, but in a compiled language.

elasticsearch

I designed the distributed architecture by imitating Elasticsearch. It inspired me to build an engine for distributed crawling.

requirement

go get github.com/PuerkitoBio/goquery
go get github.com/go-sql-driver/mysql

install

go get github.com/wcong/ants-go
go install github.com/wcong/ants-go

run

cd bin
./ants-go

check cluster status

curl 'http://localhost:8200/cluster'

get all spiders

curl 'http://localhost:8200/spiders'

start a spider

curl 'http://localhost:8200/crawl?spider=spiderName'

cluster on one computer

To test a cluster on one computer, run the nodes on different ports in different terminals.

The first node uses the default ports, tcp 8300 and http 8200:

cd bin
./ants-go

The other node sets its tcp port and http port explicitly:

cd bin
./ants-go -tcp 9300 -http 9200

flags

There are some flags you can set; check the help message:

./ants-go -h
./ants-go -help
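For readers new to Go command-line flags, here is a minimal sketch of how flags like `-tcp` and `-http` could be parsed with the standard `flag` package. It is an illustration, not ants-go's actual source: the flag names and defaults come from this README, while `nodeConfig`, `parseFlags`, and the usage strings are assumptions. The standard `flag` package handles both `-h` and `-help` without extra code.

```go
package main

import (
	"flag"
	"fmt"
)

// nodeConfig holds the two ports mentioned in this README.
// (Hypothetical type, not taken from ants-go.)
type nodeConfig struct {
	TCPPort  int
	HTTPPort int
}

// parseFlags parses args (without the program name) into a nodeConfig.
// Defaults are the ports this README names: tcp 8300 and http 8200.
func parseFlags(args []string) (nodeConfig, error) {
	var cfg nodeConfig
	fs := flag.NewFlagSet("ants-go", flag.ContinueOnError)
	fs.IntVar(&cfg.TCPPort, "tcp", 8300, "tcp port")
	fs.IntVar(&cfg.HTTPPort, "http", 8200, "http port")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	// Same arguments as the second node in the cluster example.
	cfg, err := parseFlags([]string{"-tcp", "9300", "-http", "9200"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("tcp=%d http=%d\n", cfg.TCPPort, cfg.HTTPPort) // prints "tcp=9300 http=9200"
}
```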

Customize spider

  1. go to spiders
  2. write your spider, following the example deap_loop_spider.go, or go to the spider page
  3. add your spider to spiderMap, following the example in LoadAllSpiders in load_all_spider.go
  4. install again
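The registration step above can be pictured as adding an entry to a map keyed by spider name. The sketch below only shows that pattern. The `Spider` struct, its fields, `makeExampleSpider`, and `loadAllSpiders` are hypothetical stand-ins, not the real ants-go types or signatures, so follow deap_loop_spider.go and load_all_spider.go for the actual API.

```go
package main

import "fmt"

// Spider is a HYPOTHETICAL stand-in for the project's spider type.
// The real type in ants-go may have different fields.
type Spider struct {
	Name      string
	StartURLs []string
}

// makeExampleSpider is a hypothetical constructor for a custom spider.
func makeExampleSpider() *Spider {
	return &Spider{
		Name:      "example_spider",
		StartURLs: []string{"http://example.com/"},
	}
}

// loadAllSpiders mimics the idea behind LoadAllSpiders: every spider is
// added to a map keyed by its name so it can be looked up at crawl time.
// (Assumed shape; the real function may differ.)
func loadAllSpiders() map[string]*Spider {
	spiderMap := make(map[string]*Spider)
	for _, s := range []*Spider{makeExampleSpider()} {
		spiderMap[s.Name] = s
	}
	return spiderMap
}

func main() {
	spiders := loadAllSpiders()
	if s, ok := spiders["example_spider"]; ok {
		fmt.Println("registered:", s.Name, s.StartURLs)
	}
}
```

Once a spider is registered and the binary is reinstalled, its name is what you would pass as the `spider` parameter of the `/crawl` endpoint.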

