Comments (8)
I also tried using the tee command to save the terminal output to a TXT file, but the result was as follows:
python weiboSpider.py |tee -a weibozanshuo.txt
Traceback (most recent call last):
File "weiboSpider.py", line 42, in get_username
print(u"用户名: " + self.username)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Traceback (most recent call last):
File "weiboSpider.py", line 64, in get_user_info
print(u"微博数: " + str(self.weibo_num))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
进度: 0%| | 0/1146 [00:00<?, ?it/s]Traceback (most recent call last):
File "weiboSpider.py", line 104, in get_original_weibo
sys.stdout.encoding, "ignore").decode(
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 197, in get_weibo_place
print(u"微博位置: " + weibo_place)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
Traceback (most recent call last):
File "weiboSpider.py", line 207, in get_publish_time
sys.stdout.encoding, "ignore").decode(sys.stdout.encoding)
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 240, in get_publish_tool
sys.stdout.encoding, "ignore").decode(sys.stdout.encoding)
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 289, in get_weibo_info
sys.stdout.encoding, "ignore").decode(sys.stdout.encoding)
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 352, in write_txt
f.write(result.encode(sys.stdout.encoding))
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 381, in main
print(u"用户名: " + wb.username)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
('Error: ', UnicodeEncodeError('ascii', u'\u7528\u6237\u540d: \u54b1\u8bf4', 0, 3, 'ordinal not in range(128)'))
('Error: ', UnicodeEncodeError('ascii', u'\u5fae\u535a\u6570: 11301', 0, 3, 'ordinal not in range(128)'))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
None
('Error: ', UnicodeEncodeError('ascii', u'\u5fae\u535a\u4f4d\u7f6e: \u65e0', 0, 4, 'ordinal not in range(128)'))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', UnicodeEncodeError('ascii', u'\u4fe1\u606f\u6293\u53d6\u5b8c\u6bd5', 0, 6, 'ordinal not in range(128)'))
('Error: ', UnicodeEncodeError('ascii', u'\u7528\u6237\u540d: \u54b1\u8bf4', 0, 3, 'ordinal not in range(128)'))
My system is Ubuntu 18.04, and the system language is English.
from weibospider.
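It looks like the Weibo publishing tool is None, so an error occurs before the file is written and the weibo is not saved. If you don't need the "publishing tool" field, you can also remove the following from write_txt:
+ u"发布工具: " + self.publish_tool[i - 1] + "\n\n"
Could you provide the Weibo ID so I can test it? Thanks.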
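As an alternative to deleting the field, a record can be built so that a missing value never reaches the string concatenation. The following is only a minimal sketch under assumptions: `format_record` and `PLACEHOLDER` are hypothetical names, not part of weiboSpider, and the labels are modelled on the output shown in this thread.

```python
# -*- coding: utf-8 -*-

# Hypothetical fallback label; the scraper's own output uses "无" for missing fields.
PLACEHOLDER = u"无"


def text_or_placeholder(value):
    """Return the value unchanged, or the placeholder when it is None."""
    return PLACEHOLDER if value is None else value


def format_record(content, publish_time, publish_tool):
    """Build one text record; None fields are replaced instead of raising TypeError."""
    return (
        text_or_placeholder(content) + u"\n"
        + u"发布时间: " + text_or_placeholder(publish_time) + u"\n"
        + u"发布工具: " + text_or_placeholder(publish_tool) + u"\n\n"
    )
```

With this shape, a weibo whose publishing tool could not be parsed is still written out, with the placeholder in that field.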
from weibospider.
@dataabc Thanks for your reply. The Weibo ID I want to crawl is 1711243680.
After commenting out the statement you mentioned, I got the following output:
$ python weiboSpider.py | tee -a weibozanshuo1.txt
Traceback (most recent call last):
File "weiboSpider.py", line 42, in get_username
print(u"用户名: " + self.username)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Traceback (most recent call last):
File "weiboSpider.py", line 64, in get_user_info
print(u"微博数: " + str(self.weibo_num))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
进度: 0%| | 0/1147 [00:00<?, ?it/s]Traceback (most recent call last):
File "weiboSpider.py", line 104, in get_original_weibo
sys.stdout.encoding, "ignore").decode(
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 197, in get_weibo_place
print(u"微博位置: " + weibo_place)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
Traceback (most recent call last):
File "weiboSpider.py", line 207, in get_publish_time
sys.stdout.encoding, "ignore").decode(sys.stdout.encoding)
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 240, in get_publish_tool
sys.stdout.encoding, "ignore").decode(sys.stdout.encoding)
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 289, in get_weibo_info
sys.stdout.encoding, "ignore").decode(sys.stdout.encoding)
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 352, in write_txt
f.write(result.encode(sys.stdout.encoding))
TypeError: encode() argument 1 must be string, not None
Traceback (most recent call last):
File "weiboSpider.py", line 381, in main
print(u"用户名: " + wb.username)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
('Error: ', UnicodeEncodeError('ascii', u'\u7528\u6237\u540d: \u54b1\u8bf4', 0, 3, 'ordinal not in range(128)'))
('Error: ', UnicodeEncodeError('ascii', u'\u5fae\u535a\u6570: 11310', 0, 3, 'ordinal not in range(128)'))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
None
('Error: ', UnicodeEncodeError('ascii', u'\u5fae\u535a\u4f4d\u7f6e: \u65e0', 0, 4, 'ordinal not in range(128)'))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', TypeError('encode() argument 1 must be string, not None',))
('Error: ', UnicodeEncodeError('ascii', u'\u4fe1\u606f\u6293\u53d6\u5b8c\u6bd5', 0, 6, 'ordinal not in range(128)'))
('Error: ', UnicodeEncodeError('ascii', u'\u7528\u6237\u540d: \u54b1\u8bf4', 0, 3, 'ordinal not in range(128)'))
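From what I found online, str and unicode objects cannot be concatenated, and the unicode() function should be used instead.
But I'm a beginner and don't know exactly how to change the code.
For now I'm trying the script command to record the crawler's output.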
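The tracebacks above are consistent with a known Python 2 behaviour: when stdout is a pipe (as with `| tee`) rather than a terminal, `sys.stdout.encoding` is None, so `print` falls back to ASCII and `.encode(sys.stdout.encoding, ...)` receives None. A minimal sketch of one possible workaround is to fall back to a default encoding explicitly. `output_encoding` and `write_line` are hypothetical helpers, not part of weiboSpider, and UTF-8 as the default is an assumption. Setting the `PYTHONIOENCODING` environment variable to `utf-8` before running the script is another commonly used workaround.

```python
# -*- coding: utf-8 -*-
import sys


def output_encoding(stream=None, default="utf-8"):
    """Return the stream's encoding, or `default` when it is missing or None."""
    if stream is None:
        stream = sys.stdout
    return getattr(stream, "encoding", None) or default


def write_line(text, stream):
    """Encode `text` explicitly and write the resulting bytes to a byte stream."""
    data = (text + u"\n").encode(output_encoding(stream), "replace")
    stream.write(data)


if __name__ == "__main__":
    # Python 3 exposes the byte stream as sys.stdout.buffer;
    # on Python 2, sys.stdout itself accepts encoded bytes.
    raw = getattr(sys.stdout, "buffer", sys.stdout)
    write_line(u"用户名: 咱说", raw)
```

The same `output_encoding()` call could stand in wherever the script passes `sys.stdout.encoding` directly to `encode()` or `decode()`.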
from weibospider.
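It seems to be related to my computer's settings. Below is the beginning of the .txt file produced by script. The crawled weibos display correctly, but some of the terminal text from before the crawl started appears garbled in the .txt file.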
weibospider�[00m$ python app.py�������������sudo gedit /etc/default/grub &��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K��[K�������python wei�bosp�i��[K��[K��[KSpider.py
用户名: 咱说
微博数: 11310
关注数: 236
粉丝数: 730601
进度: 0%| | 0/1147 [00:00<?, ?it/s]多年前给《笛卡尔的错误》写过一篇书评,也是本人迄今为止未能自我超越的一篇文章,据说影响了不少人。最近发现原文章的微博链接在手机上无法打开,今日稍作补充修订之后重新发表于此。重读这篇书评的时候,我想到两点,第一,最近自己在思考“从高端科普回归基础科普”,其实这篇书评可以视作,以基础科普的写法介绍了一个相当高端的科学观点。它除了篇幅很长之外,阅读门槛其实不高,很多基本概念我都做了解释,只要能顺着文章的逻辑流读下来,一定会有所收获。第二,这篇书评已经不止是一篇书评,它事实上是我把那一段时间所思考的学术问题,借着这本书的启发,进行了一次吐故纳新的整合。除去介绍这本书的核心议题,它还包含着来自其他学者的研究和观点,以及我个人的思考。不过,无论我这篇书评写得多么好,它仍然不能代替读者的思考,更不能代替读者的阅读。循着它的导引去读英文原著吧,去读十年前毛彩凤老师翻译的中文版吧。开卷有益。 心理在哪里
微博位置: 无
微博发布时间: 2018-03-23 21:10
微博发布工具: 微博 weibo.com
点赞数: 623
转发数: 657
评论数: 138
转发理由:你才是个笑话,你现在不搜索资料,猜一下**有多少残障人士?或者你告诉我“没有那么多”是多少?你看不到不代表他们不存在,你看到的少不意味着他们人数少。//@echoedinthewell:其实有尊重和遇到的时候给予方便就可以了,推广是一个笑话,再说,也没有那么多不方便的人
原始用户: 咱说
转发内容: 第一次知道国内的公交车有这设计,可见有关部门对这个功能的宣传是何其少,使用这个功能的残障人士何其少,以至于形同虚设。 原图
微博位置: 无
微博发布时间: 2019-04-22 14:50
微博发布工具: 无
点赞数: 34
转发数: 20
评论数: 24
Fortunately, the exported weibos are not garbled.
from weibospider.
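I think I've found the cause of cannot concatenate 'str' and 'NoneType' objects. I just tested it and found that the error first appeared after crawling more than 100 pages. I then tested several more times and got many None values at page counts under 100, and in one case even the first weibo was None. I suspect that because the crawl was fast and large, the account was restricted by Weibo, so much of the information that should have been crawled came back as None, which caused the error above when the fields were combined.
Suggestion: slow down the crawl, for example by sleeping for a while after every few pages. In the get_weibo_info method, the loop
for page in tqdm(range(1, page_num + 1), desc=u"进度"):
controls the speed. Each iteration crawls one page, so you can add a check, for example:
from time import sleep
......
for page in tqdm(range(1, page_num + 1), desc=u"进度"):
    if page % 5 == 0:
        sleep(3)
    ......
This pauses for 3 seconds every 5 pages. You can test for yourself how many pages to crawl between pauses, or refer to #8.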
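The throttling idea can also be written as a small standalone loop that is easy to try outside the scraper. This is a sketch under assumptions: `crawl_pages` and `fetch_page` are hypothetical names standing in for the per-page work done in get_weibo_info, and here the pause comes after every fifth page rather than before it.

```python
from time import sleep


def crawl_pages(page_num, fetch_page, pause_every=5, pause_seconds=3, sleep_fn=sleep):
    """Call fetch_page(page) for pages 1..page_num, pausing after every `pause_every` pages."""
    results = []
    for page in range(1, page_num + 1):
        results.append(fetch_page(page))
        # Pause between batches, but not after the final page.
        if page % pause_every == 0 and page < page_num:
            sleep_fn(pause_seconds)
    return results
```

Passing `sleep_fn` as a parameter makes it possible to try different pause settings without actually waiting, for example by recording the requested pauses in a list.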
from weibospider.
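Thanks for the reply. However, I don't think my account has been restricted by Weibo, because as long as I don't send the output to a txt file and only crawl in the terminal, no error is raised. So I used script to record the entire crawl output from the terminal and save it as a txt file, and successfully crawled more than ten thousand weibos.
It looks like this error is specific to my setup. I'll keep experimenting on my own, so the issue can be closed.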
from weibospider.
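I tried crawling a hundred or so weibos from a few other users and had no such problem; the .txt file was written successfully. It seems to be an isolated case, so no need to worry.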
from weibospider.