scrapy_auto_proxy

A Scrapy-based project that automatically crawls proxies and applies them to requests.

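Enable or disable the two proxy methods in class ProxyMiddleware in /demo/middlewares/middlewares.py according to your needs. (Note: only one proxy method can be active at a time. If you choose the method that brings in the proxy via import, you need your own proxy server, and it must be configured in /settings.py.)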
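
As a rough illustration of the either/or choice described above, the sketch below shows a downloader middleware that sets `request.meta["proxy"]` (the key Scrapy's built-in HttpProxyMiddleware reads) from exactly one of two sources. The class body, the `USE_OWN_PROXY` flag, the `OWN_PROXY` value, and `load_crawled_proxies` are hypothetical names chosen for this example, not the project's actual code.

```python
import random

# Hypothetical settings; in a real project these would live in settings.py.
USE_OWN_PROXY = False
OWN_PROXY = "http://127.0.0.1:8888"  # placeholder address of your own proxy server


def load_crawled_proxies():
    """Hypothetical loader; stands in for reading the JSON file or database."""
    return ["http://10.0.0.1:8080", "http://10.0.0.2:3128"]


class ProxyMiddleware:
    """Sketch of a downloader middleware that uses exactly one proxy source."""

    def __init__(self, use_own_proxy=USE_OWN_PROXY):
        self.use_own_proxy = use_own_proxy

    def process_request(self, request, spider):
        if self.use_own_proxy:
            # Method 1: a fixed proxy server that you operate yourself.
            request.meta["proxy"] = OWN_PROXY
        else:
            # Method 2: a proxy picked at random from the crawled pool.
            request.meta["proxy"] = random.choice(load_crawled_proxies())
        # Returning None tells Scrapy to keep processing the request.
        return None
```

Keeping the switch in a single flag makes it hard to enable both methods by accident.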

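Because only a small share of the free proxies offered by proxy websites actually work, each proxy is checked for availability before it is used. If it is unusable, it is removed from the proxy queue.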
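
The check-then-remove behaviour can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `is_alive`, `pick_working_proxy`, the test URL, and the timeout are all assumptions made for the example.

```python
import random
import urllib.request


def is_alive(proxy, test_url="http://example.com", timeout=5):
    """Return True if a request routed through `proxy` succeeds (assumed check)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Any connection or protocol error counts as "not usable".
        return False


def pick_working_proxy(pool, check=is_alive):
    """Pick random proxies from `pool`, dropping dead ones, until one works.

    `pool` is mutated in place; returns None once every proxy has failed.
    """
    while pool:
        proxy = random.choice(pool)
        if check(proxy):
            return proxy
        pool.remove(proxy)  # unusable: remove it from the queue
    return None
```

Passing the checker in as a parameter lets you swap in a faster or stricter test without touching the selection loop.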

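Use a script to keep the database of crawled proxies up to date. For example, schedule a crontab job, at whatever frequency suits your crawling, that changes to the project directory and runs "scrapy crawl 360_proxy;scrapy crawl xici_proxy".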
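
One possible shape for such a maintenance script is sketched below. The spider names come from the command quoted above; the function name `refresh_proxies` and the injectable `run` parameter are assumptions made for this example.

```python
import subprocess

SPIDERS = ["360_proxy", "xici_proxy"]  # spider names from the command above


def refresh_proxies(project_dir, run=subprocess.run):
    """Run each proxy spider in turn from `project_dir`; return the exit codes.

    Like the `;`-separated shell command, a failing spider does not stop
    the next one from running.
    """
    codes = []
    for name in SPIDERS:
        result = run(["scrapy", "crawl", name], cwd=project_dir)
        codes.append(result.returncode)
    return codes
```

A crontab entry could then call a small wrapper around `refresh_proxies("/path/to/project")`, where the path is a placeholder for your own checkout.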

Crawled proxies can be stored either in a JSON file in the project root or in a database. Change the configuration in /demo/pipelines/db.py and /demo/middlewares/middlewares.py as needed.
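
For the JSON-file option, a pipeline along the following lines would work. This is only a sketch under stated assumptions: the class name `JsonProxyPipeline`, the file name `proxies.json`, and the one-object-per-line layout are invented for illustration and may differ from what /demo/pipelines/db.py actually does. The `open_spider`, `process_item`, and `close_spider` hooks are the standard Scrapy item pipeline methods.

```python
import json


class JsonProxyPipeline:
    """Sketch of an item pipeline that appends each proxy to a JSON Lines file."""

    def __init__(self, path="proxies.json"):
        self.path = path  # hypothetical file name in the project root
        self.file = None

    def open_spider(self, spider):
        # Append mode keeps proxies collected by earlier crawls.
        self.file = open(self.path, "a", encoding="utf-8")

    def process_item(self, item, spider):
        # One JSON object per line, so the file can be read back incrementally.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()
```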

Proxies are selected with the simplest possible rule: a random pick.

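A few very simple example spiders are included:

Crawl the daily picture information from 'one' ('http://wufazhuce.com/')

Crawl English name information from 'http://ename.dict.cn/'

Crawl live-stream information from the major Chinese streaming platforms, including Douyu, Quanmin, Zhanqi, Huomao, Huya, and Bilibili

Crawl book information from the Chinese and English Amazon bookstores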

scrapy_auto_proxy's People

Contributors

xinghanggogogo
