
proxypool's Introduction

proxypool's People

Contributors

germey, jimcurrywang, tommyzihao, zenghongtu

proxypool's Issues

Async Error when extracting too many proxy IPs from a single site

When too many proxy IPs are extracted from a single site, an Async Error is raised. For example, one page has 200+ IPs; if I crawl a few more pages, the availability test fails:

Getting 119.135.185.99:9999 from crawl_xxxx
Getting 60.160.128.10:9797 from crawl_xxxx
Getting 27.38.138.165:8118 from crawl_xxxx
ValidityTester is working
Async Error

proxypool.error.PoolEmptyError: 'The proxy pool is empty'

Each time a proxy is taken, one value is removed from Redis. When proxies are consumed faster than they are stored, the proxies key ends up empty and Redis deletes it; the program then cannot find the proxies key and never stores values into Redis again, so it loops forever printing "Refreshing ip" and "Waiting for adding" and has to be restarted. Could this bug be fixed?
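
One plausible reading of this deadlock, sketched below under stated assumptions (the key name and threshold are illustrative, not the project's exact values): Redis deletes a list or sorted-set key as soon as it becomes empty, so any adder that requires the proxies key to exist before refilling will never run again once consumers drain the pool. Checking the pool length instead, which is simply 0 for a missing key, avoids the hang.

import redis

db = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)
POOL_KEY = 'proxies'      # assumed key name
LOWER_THRESHOLD = 20      # illustrative refill threshold

def pool_size():
    # llen() returns 0 when the key does not exist, so an emptied
    # (and therefore deleted) pool is treated like a brand-new one.
    return db.llen(POOL_KEY)

def needs_refill():
    # Do not gate on db.exists(POOL_KEY); that is exactly what stops
    # refilling once Redis deletes the emptied key.
    return pool_size() < LOWER_THRESHOLD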

Why does the same code fail on macOS or Ubuntu 16.04 but work on Windows 10?

The error message is as follows; could anybody help?

Ip processing running

* Serving Flask app "proxypool.api" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: off
Refreshing ip
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Waiting for adding
PoolAdder is working
Callback crawl_ip181
Process Process-1:
Process Process-2:
    Traceback (most recent call last):
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
    File "/Users/wangyiran/pythonProject/ProxyPool-master/proxypool/schedule.py", line 112, in valid_proxy
    time.sleep(cycle)
    KeyboardInterrupt
    Traceback (most recent call last):
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
    File "/Users/wangyiran/pythonProject/ProxyPool-master/proxypool/schedule.py", line 130, in check_pool
    adder.add_to_queue()
    File "/Users/wangyiran/pythonProject/ProxyPool-master/proxypool/schedule.py", line 87, in add_to_queue
    raw_proxies = self._crawler.get_raw_proxies(callback)
    File "/Users/wangyiran/pythonProject/ProxyPool-master/proxypool/getter.py", line 28, in get_raw_proxies
    for proxy in eval("self.{}()".format(callback)):
    File "/Users/wangyiran/pythonProject/ProxyPool-master/proxypool/getter.py", line 35, in crawl_ip181
    html = get_page(start_url)
    File "/Users/wangyiran/pythonProject/ProxyPool-master/proxypool/utils.py", line 10, in get_page
    ua = UserAgent()
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/fake.py", line 69, in init
    self.load()
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/fake.py", line 78, in load
    verify_ssl=self.verify_ssl,
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/utils.py", line 250, in load_cached
    update(path, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/utils.py", line 245, in update
    write(path, load(use_cache_server=use_cache_server, verify_ssl=verify_ssl))
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/utils.py", line 154, in load
    for item in get_browsers(verify_ssl=verify_ssl):
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/utils.py", line 97, in get_browsers
    html = get(settings.BROWSERS_STATS_PAGE, verify_ssl=verify_ssl)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/fake_useragent/utils.py", line 67, in get
    context=context,
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 956, in send
    self.connect()
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1384, in connect
    super().connect()
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 928, in connect
    (self.host,self.port), self.timeout, self.source_address)
    File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
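
The traceback ends inside fake_useragent, which downloads its browser-statistics data over the network the first time UserAgent() is constructed, so the difference between machines is most likely network access rather than the operating system. A hedged workaround sketch, not the project's code; the fallback string is just an example:

from fake_useragent import UserAgent, FakeUserAgentError

def get_user_agent():
    # Give fake_useragent a fallback UA so a failed download of its data
    # file does not block or crash get_page().
    try:
        return UserAgent(fallback='Mozilla/5.0 (Windows NT 10.0; Win64; x64)').random
    except FakeUserAgentError:
        return 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'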

Error when running

Traceback (most recent call last):
File "D:\python\lib\multiprocessing\process.py", line 297, in _bootstrap
self.run()
File "D:\python\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Administrator\Desktop\爬虫学习\ProxyPool-master\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "C:\Users\Administrator\Desktop\爬虫学习\ProxyPool-master\proxypool\getter.py", line 30, in run
self.redis.add(proxy)
File "C:\Users\Administrator\Desktop\爬虫学习\ProxyPool-master\proxypool\db.py", line 30, in add
return self.db.zadd(REDIS_KEY, score, proxy)
File "D:\python\lib\site-packages\redis\client.py", line 2388, in zadd
for pair in iteritems(mapping):
File "D:\python\lib\site-packages\redis_compat.py", line 110, in iteritems
return iter(x.items())
AttributeError: 'int' object has no attribute 'items'
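
The cause is the redis-py 3.0 API change: zadd() now takes a mapping of {member: score} instead of the positional score and member used in db.py, so the integer score lands where a mapping is expected and .items() fails. A minimal sketch of a compatible add(), with the key name taken from the traceback and the initial score assumed:

import redis

REDIS_KEY = 'proxies'     # key name used by the project
INITIAL_SCORE = 10        # assumed initial score

db = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)

def add(proxy, score=INITIAL_SCORE):
    # redis-py >= 3.0 expects zadd(key, {member: score}).
    if not db.zscore(REDIS_KEY, proxy):
        return db.zadd(REDIS_KEY, {proxy: score})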

Metaclasses are really complicated; is it possible to do this without a metaclass?

For example, reading ClassName.__dict__ directly also lists the class's methods. With your naming convention, the calling code could still get back the corresponding callback functions, right? Please advise; I really don't have the energy to figure out metaclasses.
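
A rough sketch, not the author's implementation, of doing the same thing without a metaclass: the metaclass only precomputes the list of methods whose names start with crawl_ at class-creation time, and the same list can be read from the class __dict__ at runtime. The two crawler methods below are placeholders.

class FreeProxyGetter:
    def crawl_daili66(self):
        yield '127.0.0.1:8888'   # placeholder; the real method parses a page

    def crawl_ip3366(self):
        yield '127.0.0.1:9999'   # placeholder; the real method parses a page

    @property
    def callback_names(self):
        # Same information the metaclass stores: every method name that
        # starts with 'crawl_'.
        return [name for name in self.__class__.__dict__
                if name.startswith('crawl_')]

    def get_raw_proxies(self, callback):
        # Look the callback up by name instead of using eval().
        return list(getattr(self, callback)())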

What does this TypeError mean when running the source code?

File "C:\Users\mcx\Downloads\ProxyPool-master\proxypool\getter.py", line 51, in crawl_kuaidaili
re_ip_adress = ip_adress.findall(html)
TypeError: expected string or bytes-like object
How should I deal with this? Thanks a lot!
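
That TypeError means get_page() returned None because the request failed or was blocked, and None was then passed to findall(). A hedged guard sketch mirroring the crawl_kuaidaili method; the URL and regular expression here are only illustrative, not the project's exact ones:

import re

from proxypool.utils import get_page   # the project's page-fetch helper

def crawl_kuaidaili():
    start_url = 'https://www.kuaidaili.com/free/inha/1/'   # illustrative URL
    html = get_page(start_url)
    if not html:
        # The request failed; skip parsing instead of passing None to findall().
        return
    ip_pattern = re.compile(r'<td data-title="IP">(.*?)</td>')   # illustrative pattern
    for ip in ip_pattern.findall(html):
        yield ip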

ClientProxyConnectionError

I tried to run the code, but it raises aiohttp.client_exceptions.ClientProxyConnectionError.

Besides, the connection failures seem to come from class FreeProxyGetter, in crawl_daili66 and crawl_haoip.

Any advice on how to solve this aiohttp error?
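
ClientProxyConnectionError just means the free proxy under test refused or dropped the connection, which is normal for free proxies, so the tester should catch it per proxy rather than let it propagate. A hedged sketch with an assumed test URL and timeout (ClientProxyConnectionError is a subclass of aiohttp.ClientError, so catching the latter covers it):

import asyncio
import aiohttp

async def test_proxy(proxy):
    # Catch per-proxy failures so one dead proxy does not stop the whole test.
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get('http://httpbin.org/get',
                                   proxy='http://' + proxy,
                                   timeout=aiohttp.ClientTimeout(total=10)) as resp:
                return resp.status == 200
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return False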

No module named 'aiohttp.errors'

I downloaded the version shown in the video (commit 5d363ee), and this line raises an error:
from aiohttp.errors import ProxyConnectionError
ModuleNotFoundError: No module named 'aiohttp.errors'
My local aiohttp version is 3.1.0. Is the version I installed wrong?
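
aiohttp dropped the aiohttp.errors module after the 1.x releases, so the import in the old commit cannot work with aiohttp 3.1.0; in 3.x the closest exception is ClientProxyConnectionError in aiohttp.client_exceptions (also re-exported at the package top level). A sketch of an import that works on either version:

try:
    # Layout used by the old commit (aiohttp 1.x)
    from aiohttp.errors import ProxyConnectionError
except ImportError:
    # aiohttp 3.x: the exception moved and was renamed
    from aiohttp import ClientProxyConnectionError as ProxyConnectionError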

First run fails with AttributeError: 'int' object has no attribute 'items'

Crawling http://www.ip3366.net/?stype=1&page=3
Crawled successfully http://www.ip3366.net/?stype=1&page=3 200
Got proxy 221.126.249.99:8080
Got proxy 115.225.127.235:9999
Got proxy 182.146.253.173:8118
Got proxy 182.88.14.46:8123
Got proxy 103.79.228.230:43138
Got proxy 220.175.182.121:9999
Got proxy 113.13.160.105:9999
Got proxy 110.52.235.160:9999
Got proxy 115.46.96.169:8123
Got proxy 121.31.156.243:8123
Traceback (most recent call last):
File "D:\anacoda\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "D:\anacoda\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "F:\python\ProxyPool-master\ProxyPool-master\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "F:\python\ProxyPool-master\ProxyPool-master\proxypool\getter.py", line 30, in run
self.redis.add(proxy)
File "F:\python\ProxyPool-master\ProxyPool-master\proxypool\db.py", line 30, in add
return self.db.zadd(REDIS_KEY, score, proxy)
File "D:\anacoda\lib\site-packages\redis\client.py", line 2263, in zadd
for pair in iteritems(mapping):
File "D:\anacoda\lib\site-packages\redis_compat.py", line 123, in iteritems
return iter(x.items())

Local connection refused

The program can return and print proxies in the IDE, but when I enter 127.0.0.1:5000/get in the browser it says the site refused to connect. Please advise...
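
A refused connection in the browser usually means nothing is listening on that address: either the Flask API process never started (for example because another process crashed first), or it is bound to a different port than the one in the URL. A minimal sketch, not the project's api module, just to confirm which address the API should answer on; the port must match whatever the project's settings define:

from flask import Flask

app = Flask(__name__)

@app.route('/get')
def get_proxy():
    # The real project pops a proxy from Redis here; a placeholder is
    # returned only to verify the endpoint is reachable.
    return '127.0.0.1:8888'

if __name__ == '__main__':
    # Then open http://127.0.0.1:5000/get in the browser.
    app.run(host='127.0.0.1', port=5000)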

redis.exceptions.ResponseError: WRONGTYPE Operation against a key holding the wrong kind of value

Proxy pool is starting
Process Process-2:
Traceback (most recent call last):
File "C:\Anaconda3\lib\multiprocessing\process.py", line 297, in _bootstrap
self.run()
File "C:\Anaconda3\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "C:\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "C:\proxypool\getter.py", line 23, in run
if not self.is_over_threshold():
File "C:\proxypool\getter.py", line 16, in is_over_threshold
if self.redis.count() >= POOL_UPPER_THRESHOLD:
File "C:\proxypool\db.py", line 83, in count
return self.db.zcard(REDIS_KEY)
File "C:\Anaconda3\lib\site-packages\redis\client.py", line 1701, in zcard
return self.execute_command('ZCARD', name)

* Serving Flask app "proxypool.api" (lazy loading)
File "C:\Anaconda3\lib\site-packages\redis\client.py", line 668, in execute_command
return self.parse_response(connection, command_name, **options)
File "C:\Anaconda3\lib\site-packages\redis\client.py", line 680, in parse_response
response = connection.read_response()
File "C:\Anaconda3\lib\site-packages\redis\connection.py", line 629, in read_response
raise response
redis.exceptions.ResponseError: WRONGTYPE Operation against a key holding the wrong kind of value
Starting to crawl proxies
Getter is starting
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5555/ (Press CTRL+C to quit)
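
WRONGTYPE means the proxies key already exists in Redis as a different data type (for example a list or set left behind by an earlier version of the pool), so sorted-set commands such as ZCARD fail against it. A hedged sketch of inspecting and clearing the stale key before rerunning; the key name is assumed to be 'proxies':

import redis

db = redis.StrictRedis(host='localhost', port=6379)

# A zset-based pool expects b'zset'; anything else indicates a stale key.
print(db.type('proxies'))

# Drop the stale key (the getter will repopulate it), then rerun run.py.
db.delete('proxies')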

Could you help me with this?

Every time I run run.py, the following shows up after a few seconds and the program just stops.

Process Process-2:
Traceback (most recent call last):
File "D:\Program Files (x86)\Python\Python36\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "D:\Program Files (x86)\Python\Python36\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "D:\LEARNING\ProxyPool-master\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "D:\LEARNING\ProxyPool-master\proxypool\getter.py", line 30, in run
self.redis.add(proxy)
File "D:\LEARNING\ProxyPool-master\proxypool\db.py", line 30, in add
return self.db.zadd(REDIS_KEY, score, proxy)
File "D:\Program Files (x86)\Python\Python36\lib\site-packages\redis\client.py", line 2263, in zadd
for pair in iteritems(mapping):
File "D:\Program Files (x86)\Python\Python36\lib\site-packages\redis_compat.py", line 123, in iteritems
return iter(x.items())
AttributeError: 'int' object has no attribute 'items'

It won't run

Traceback (most recent call last):
File "run.py", line 2, in <module>
from proxypool.schedule import Schedule
File "E:\soft\Python\untitled\2\ProxyPool\proxypool\schedule.py", line 11, in <module>
from proxypool.getter import FreeProxyGetter
File "E:\soft\Python\untitled\2\ProxyPool\proxypool\getter.py", line 1, in <module>
from .utils import get_page
File "E:\soft\Python\untitled\2\ProxyPool\proxypool\utils.py", line 5, in <module>
from fake_useragent import UserAgent,FakeUserAgentError
ModuleNotFoundError: No module named 'fake_useragent'
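
This usually just means the dependency is missing from the environment that runs run.py; installing it with pip install fake-useragent (note the hyphen in the package name versus the underscore in the import) should resolve the ModuleNotFoundError.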
