
sinaspider's Introduction

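## Sina_Spider1: "Sina Weibo Crawler Share (can scrape 13 million records per day)" ##
## Sina_Spider2: "Sina Weibo Distributed Crawler Share" ##
## Sina_Spider3: "Sina Weibo Crawler Share (updated December 1, 2016)" ##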

Sina_Spider1 is the single-machine version.

Sina_Spider2 builds on Sina_Spider1 and uses the scrapy_redis module to make the crawler distributed.
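This README does not show the project's own settings, so the following is only a minimal sketch of how a Scrapy project is typically switched over to scrapy_redis. The setting names are the ones scrapy_redis documents; the Redis address is an assumption and should point at whatever Redis instance all crawler machines share.

```python
# settings.py (sketch; values are assumptions, not the project's actual config)

# Let scrapy_redis schedule requests through a shared Redis queue,
# so several crawler processes can pull from the same queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all workers using a Redis-backed fingerprint set.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and the fingerprint set in Redis when a spider stops,
# so a crawl can be resumed instead of starting from scratch.
SCHEDULER_PERSIST = True

# Assumed address of the shared Redis server.
REDIS_URL = "redis://localhost:6379"
```

With settings like these, each machine runs the same spider and the shared Redis instance coordinates which requests have been queued and which have already been seen.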

Sina_Spider3 adds maintenance of a cookie pool and optimizes the seed queue and the deduplication queue.
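To illustrate the ideas named above, here is a small in-memory sketch of a cookie pool and a deduplicating seed queue. It is not the code used in Sina_Spider3 (which is described in its blog post); the class names `CookiePool` and `SeedQueue` and their methods are hypothetical, and a real deployment would likely keep this state in Redis rather than in process memory.

```python
from collections import deque


class CookiePool:
    """Hypothetical cookie pool: hands out cookies round-robin and drops bad ones."""

    def __init__(self, cookies):
        self._cookies = deque(cookies)

    def get(self):
        """Return the next cookie and move it to the back of the rotation."""
        if not self._cookies:
            raise LookupError("cookie pool is empty")
        cookie = self._cookies[0]
        self._cookies.rotate(-1)
        return cookie

    def remove(self, cookie):
        """Drop a cookie that no longer works (for example, a banned account)."""
        try:
            self._cookies.remove(cookie)
        except ValueError:
            pass  # already removed

    def __len__(self):
        return len(self._cookies)


class SeedQueue:
    """Hypothetical seed queue: FIFO of user IDs with a seen-set for deduplication."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def push(self, user_id):
        """Enqueue a user ID unless it was seen before; return True if enqueued."""
        if user_id in self._seen:
            return False
        self._seen.add(user_id)
        self._queue.append(user_id)
        return True

    def pop(self):
        """Return the oldest pending user ID, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

A crawler loop would call `pool.get()` before each request, call `pool.remove(cookie)` when a response indicates the account is blocked, and call `seeds.push(uid)` for every newly discovered user so that each one is crawled only once.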


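For a detailed description of each of the three versions, see its blog post. If you run into a problem, please leave a comment where possible so that others who hit the same issue later can find it. You can also join the QQ discussion group: Weibo Crawler Discussion Group (微博爬虫交流群).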

If you need the data, you can contact me by email ([email protected]).

Contributors: liuxingming, bone-ace, magic282
