Comments (7)
Thanks for the feedback. Could you provide the user_id that triggers the error so I can debug it? Thanks.
from weibo-crawler.
I ran into this problem too. The user_id is 1877809031.
from weibo-crawler.
Here is another user_id: 2214855827. The same problem starts appearing once the crawl reaches page 7.
from weibo-crawler.
Another one: 1650713582.
from weibo-crawler.
@469698742 @Nathern001
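I tested these, and all except the last one can be crawled, so my guess is that you were crawling too fast and got rate-limited. The last account reports several thousand posts, yet its profile page shows none, so I suspect the owner has restricted visibility.
There are roughly two ways to solve the problem above: add a cookie, or slow down the crawl. If the problem persists even with a cookie, you have probably been rate-limited; the limit is lifted automatically after a while.
To slow down, modify the following code in the get_pages method: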
if page - page1 == random_pages and page < page_count:
    sleep(random.randint(6, 10))
    page1 = page
    random_pages = random.randint(1, 5)
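This code makes the program pause for a random 6 to 10 seconds after every 1 to 5 pages; these are the defaults. To slow down, you can pause more often, for example every 1 to 3 pages, or pause longer each time, for example 10 to 15 seconds. Adjust the values to your needs: the slower you crawl, the lower the chance of being limited, but the crawl takes longer, so you have to weigh the trade-off yourself.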
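The throttling logic described above can be sketched as a standalone function. This is only an illustration, not the crawler's actual get_pages implementation: crawl_pages, fetch_page, pages_between_pauses, pause_seconds and the injectable sleep argument are hypothetical names introduced here, and the defaults use the slower values suggested above (every 1 to 3 pages, 10 to 15 seconds).

```python
import random
import time


def crawl_pages(page_count, fetch_page, pages_between_pauses=(1, 3),
                pause_seconds=(10, 15), sleep=time.sleep):
    """Fetch pages 1..page_count, pausing at random intervals.

    fetch_page is a hypothetical callback that downloads one page.
    Returns the list of pause lengths (in seconds) that were taken.
    """
    pauses = []
    last_pause_page = 0
    # How many pages to fetch before the next pause.
    random_pages = random.randint(*pages_between_pauses)
    for page in range(1, page_count + 1):
        fetch_page(page)
        # Same condition as the snippet above: pause once enough pages
        # have passed, but never after the final page.
        if page - last_pause_page == random_pages and page < page_count:
            delay = random.randint(*pause_seconds)
            sleep(delay)
            pauses.append(delay)
            last_pause_page = page
            random_pages = random.randint(*pages_between_pauses)
    return pauses
```

Passing the two ranges as parameters lets you tune the speed without editing the loop, and passing sleep=lambda s: None makes it possible to try the logic without actually waiting.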
from weibo-crawler.
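My own need is to crawl user information only, so I set it to 0 days. I found that when a user has no posts, the crawl moves on too quickly, so I commented out self.get_pages() and added a pause. After that I was no longer limited.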
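def start(self):
    """Run the crawler."""
    try:
        for user_id in self.user_id_list:
            self.initialize_info(user_id)
            self.get_user_info()
            # self.get_pages()  # disabled: only user info is needed
            print(u'Finished fetching info')
            print('*' * 100)
            sleep(random.randint(6, 10))  # pause between users to avoid being limited
    except Exception as e:
        # The posted snippet was cut off before its handler; this is a minimal placeholder.
        print('Error: ', e)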
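The same idea, pausing between users rather than between pages, can be sketched outside the crawler class. This is a rough illustration under assumptions: crawl_users and fetch_user_info are hypothetical names, not part of weibo-crawler, and the 6 to 10 second range simply mirrors the value used above.

```python
import random
import time


def crawl_users(user_ids, fetch_user_info, pause_seconds=(6, 10),
                sleep=time.sleep):
    """Fetch profile info for each user, pausing between users.

    fetch_user_info is a hypothetical callback that returns one
    user's profile data. Returns a dict keyed by user_id.
    """
    results = {}
    for index, user_id in enumerate(user_ids):
        results[user_id] = fetch_user_info(user_id)
        # No need to wait after the last user.
        if index < len(user_ids) - 1:
            sleep(random.randint(*pause_seconds))
    return results
```

Because the pause no longer depends on how many pages a user has, users with no posts are throttled the same way as everyone else.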
from weibo-crawler.
@469698742
Thanks for the feedback; this is a very useful reference.
from weibo-crawler.