germey / proxypool
Proxy Pool System
License: Apache License 2.0
When I extract too many proxy IPs from a single site, I get an Async Error. For example, one page has 200+ IPs; if I fetch a few more pages, the error is raised while testing availability:
Getting 119.135.185.99:9999 from crawl_xxxx
Getting 60.160.128.10:9797 from crawl_xxxx
Getting 27.38.138.165:8118 from crawl_xxxx
ValidityTester is working
Async Error
I'm not very familiar with Linux commands yet. Should I kill it with pkill? Thanks.
Googling suggests this error means the port is already in use, but lsof shows only redis-server and rdm using that port. Why does the error still occur?
Each time a proxy is fetched, one value is removed from Redis. When fetching outruns storing and the proxies key runs out of members, Redis deletes the key; the program then can't find the proxies key and never stores values into Redis again, so it loops forever printing "Refreshing ip" and "Waiting for adding" and has to be restarted. Could this bug be fixed?
Which mining pool can be used with an iPhone?
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
When I request http://127.0.0.1:5000/get, it shows this.
But when I request http://127.0.0.1:5000, it works fine!
I think a deduplication step is needed.
The error message is as follows; could anybody help me?
Ip processing running
Traceback (most recent call last):
File "D:\python\lib\multiprocessing\process.py", line 297, in _bootstrap
self.run()
File "D:\python\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\Administrator\Desktop\爬虫学习\ProxyPool-master\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "C:\Users\Administrator\Desktop\爬虫学习\ProxyPool-master\proxypool\getter.py", line 30, in run
self.redis.add(proxy)
File "C:\Users\Administrator\Desktop\爬虫学习\ProxyPool-master\proxypool\db.py", line 30, in add
return self.db.zadd(REDIS_KEY, score, proxy)
File "D:\python\lib\site-packages\redis\client.py", line 2388, in zadd
for pair in iteritems(mapping):
File "D:\python\lib\site-packages\redis_compat.py", line 110, in iteritems
return iter(x.items())
AttributeError: 'int' object has no attribute 'items'
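This traceback comes from redis-py 3.x, which changed `zadd` to take a `{member: score}` mapping instead of positional score/member arguments, so the old call `zadd(REDIS_KEY, score, proxy)` hands an int where a dict is expected. A minimal sketch of the fix, using a hypothetical `FakeRedis` stand-in (not the real client) so it runs without a Redis server:

```python
class FakeRedis:
    """Stand-in mimicking redis-py >= 3.0, where zadd takes a mapping dict."""
    def __init__(self):
        self.store = {}

    def zadd(self, name, mapping):
        # redis-py 3.x iterates mapping.items(), which is why passing an
        # int score raised AttributeError: 'int' object has no attribute 'items'
        for member, score in mapping.items():
            self.store.setdefault(name, {})[member] = score
        return len(mapping)

REDIS_KEY = 'proxies'
INITIAL_SCORE = 10  # assumed default, as in typical proxy-pool setups

def add(db, proxy, score=INITIAL_SCORE):
    # old (redis-py < 3.0): db.zadd(REDIS_KEY, score, proxy)
    # new (redis-py >= 3.0): pass a {member: score} mapping
    return db.zadd(REDIS_KEY, {proxy: score})

db = FakeRedis()
add(db, '127.0.0.1:8080')
```

Pinning `redis<3.0` would also work, but updating the call is the forward-compatible choice.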
The port in the code is 5555, not 5000, and the method to get a proxy is http://localhost:5555/random, not http://localhost:5555/get.
I'm a complete beginner.
I deployed it to a public server, but I don't know how to configure its API IP.
Do I set it in the api module?
Hi bro, I'm towbe.
get_page() has no return value when the page's status_code is not 200:
def get_page(url, options={}):
......
if response.status_code == 200:
return response.text
......
For example, calling ClassName.__dict__ directly can also look up the class's methods. With naming like yours, the call sites can return the corresponding callback too, right? Please advise; I really don't have the energy to work out metaclasses.
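The naming question refers to collecting all `crawl_*` methods so the getter can iterate every proxy source automatically. A minimal sketch of how a metaclass can do that at class-creation time; the attribute name `__CrawlFunc__` follows the project's convention, while `crawl_example` is a hypothetical source:

```python
class ProxyMetaclass(type):
    def __new__(mcs, name, bases, attrs):
        # at class-creation time, record every method named crawl_* so
        # callers can iterate all sources without hard-coding each one
        attrs['__CrawlFunc__'] = [k for k, v in attrs.items()
                                  if k.startswith('crawl_') and callable(v)]
        return type.__new__(mcs, name, bases, attrs)

class FreeProxyGetter(metaclass=ProxyMetaclass):
    def crawl_example(self):  # hypothetical proxy source
        yield '127.0.0.1:8080'

getter = FreeProxyGetter()
proxies = [p for name in getter.__CrawlFunc__
           for p in getattr(getter, name)()]
```

Plain `ClassName.__dict__` filtering would work too; the metaclass simply bakes the list in once, when the class is defined, rather than rescanning at every call.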
ip181.com seems to be inaccessible now.
File "C:\Users\mcx\Downloads\ProxyPool-master\proxypool\getter.py", line 51, in crawl_kuaidaili
re_ip_adress = ip_adress.findall(html)
TypeError: expected string or bytes-like object
How should I handle this? Thanks a lot!
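This TypeError means `html` was None, i.e. `get_page()` failed to download the page (see the missing-return issue above) and `findall` was handed None instead of a string. A hedged guard; the regex here is a simplified assumption, not the project's exact pattern:

```python
import re

ip_address = re.compile(r'(\d+\.\d+\.\d+\.\d+):(\d+)')

def extract_proxies(html):
    # get_page() yields None when the request fails; re.findall(None)
    # raises "expected string or bytes-like object", so guard first
    if not html:
        return []
    return ip_address.findall(html)

found = extract_proxies('<td>1.2.3.4:8080</td>')
empty = extract_proxies(None)
```

The same guard belongs in every `crawl_*` method that parses a downloaded page.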
I tried to run the code, but it raises aiohttp.client_exceptions.ClientProxyConnectionError.
It also seems to lose the connection in class FreeProxyGetter's crawl_daili66 and crawl_haoip methods.
Any advice for solving the aiohttp error?
This module doesn't exist; where can I get it?
I downloaded the version from the video (5d363ee), and this line errors:
from aiohttp.errors import ProxyConnectionError
ModuleNotFoundError: No module named 'aiohttp.errors'
My local aiohttp version is 3.1.0. Am I importing from the wrong version?
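Yes: aiohttp removed the `aiohttp.errors` module around 2.0, and in 3.x the proxy exception lives in `aiohttp.client_exceptions` as `ClientProxyConnectionError`. A hedged compatibility import; the final fallback to the builtin `ConnectionError` exists only so this sketch also runs where aiohttp isn't installed at all:

```python
try:
    # aiohttp < 2.0 (as in the old video-era version of this project)
    from aiohttp.errors import ProxyConnectionError
except ImportError:
    try:
        # aiohttp >= 2.0 moved client exceptions here
        from aiohttp.client_exceptions import (
            ClientProxyConnectionError as ProxyConnectionError)
    except ImportError:
        # no aiohttp available: degrade to a builtin for this sketch only
        ProxyConnectionError = ConnectionError
```

Alternatively, pin `aiohttp<2.0` to match the old code, but updating the import is the more durable fix.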
After installing the dependencies, fake-useragent is still missing and needs to be installed manually.
Crawling Failed http://www.kxdaili.com/ipList/1.html#ip
The project seems to be missing fake_useragent.
Crawling http://www.ip3366.net/?stype=1&page=3
Crawled successfully http://www.ip3366.net/?stype=1&page=3 200
Got proxy 221.126.249.99:8080
Got proxy 115.225.127.235:9999
Got proxy 182.146.253.173:8118
Got proxy 182.88.14.46:8123
Got proxy 103.79.228.230:43138
Got proxy 220.175.182.121:9999
Got proxy 113.13.160.105:9999
Got proxy 110.52.235.160:9999
Got proxy 115.46.96.169:8123
Got proxy 121.31.156.243:8123
Traceback (most recent call last):
File "D:\anacoda\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "D:\anacoda\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "F:\python\ProxyPool-master\ProxyPool-master\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "F:\python\ProxyPool-master\ProxyPool-master\proxypool\getter.py", line 30, in run
self.redis.add(proxy)
File "F:\python\ProxyPool-master\ProxyPool-master\proxypool\db.py", line 30, in add
return self.db.zadd(REDIS_KEY, score, proxy)
File "D:\anacoda\lib\site-packages\redis\client.py", line 2263, in zadd
for pair in iteritems(mapping):
File "D:\anacoda\lib\site-packages\redis_compat.py", line 123, in iteritems
return iter(x.items())
I can get and print proxies in the IDE, but when I enter 127.0.0.1:5000/get in a browser it says the site refused the connection. Help, please...
Proxy pool starting
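A browser-side "connection refused" usually means nothing is listening on 127.0.0.1:5000, typically because the API process crashed (as in the tracebacks above) or is bound to a different address or port. A minimal stdlib sketch to confirm that serving and fetching a local `/get` endpoint works at all; the handler is a hypothetical placeholder, not the project's Flask API:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # answer any path (e.g. /get) with a placeholder proxy
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'127.0.0.1:8080')

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

server = HTTPServer(('127.0.0.1', 0), Handler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
body = urllib.request.urlopen(f'http://127.0.0.1:{port}/get').read().decode()
server.shutdown()
```

If this works but the real API doesn't, check that run.py is still alive and which host/port the Flask app binds (binding `0.0.0.0` is needed for access from other machines).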
Process Process-2:
Traceback (most recent call last):
File "C:\Anaconda3\lib\multiprocessing\process.py", line 297, in _bootstrap
self.run()
File "C:\Anaconda3\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "C:\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "C:\proxypool\getter.py", line 23, in run
if not self.is_over_threshold():
File "C:\proxypool\getter.py", line 16, in is_over_threshold
if self.redis.count() >= POOL_UPPER_THRESHOLD:
File "C:\proxypool\db.py", line 83, in count
return self.db.zcard(REDIS_KEY)
File "C:\Anaconda3\lib\site-packages\redis\client.py", line 1701, in zcard
return self.execute_command('ZCARD', name)
Every time I run run.py, the following shows up after a few seconds and the program just stops.
Process Process-2:
Traceback (most recent call last):
File "D:\Program Files (x86)\Python\Python36\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "D:\Program Files (x86)\Python\Python36\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "D:\LEARNING\ProxyPool-master\proxypool\scheduler.py", line 28, in schedule_getter
getter.run()
File "D:\LEARNING\ProxyPool-master\proxypool\getter.py", line 30, in run
self.redis.add(proxy)
File "D:\LEARNING\ProxyPool-master\proxypool\db.py", line 30, in add
return self.db.zadd(REDIS_KEY, score, proxy)
File "D:\Program Files (x86)\Python\Python36\lib\site-packages\redis\client.py", line 2263, in zadd
for pair in iteritems(mapping):
File "D:\Program Files (x86)\Python\Python36\lib\site-packages\redis_compat.py", line 123, in iteritems
return iter(x.items())
AttributeError: 'int' object has no attribute 'items'
Traceback (most recent call last):
File "run.py", line 2, in
from proxypool.schedule import Schedule
File "E:\soft\Python\untitled\2\ProxyPool\proxypool\schedule.py", line 11, in
from proxypool.getter import FreeProxyGetter
File "E:\soft\Python\untitled\2\ProxyPool\proxypool\getter.py", line 1, in
from .utils import get_page
File "E:\soft\Python\untitled\2\ProxyPool\proxypool\utils.py", line 5, in
from fake_useragent import UserAgent,FakeUserAgentError
ModuleNotFoundError: No module named 'fake_useragent'