Comments (2)
Thanks for the suggestion, I'll think it over.
from weibo-crawler.
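Great suggestion. Writing to a JSON file is now supported: just add the json option to "write_mode" in config.json. The resulting JSON contains both the user information and the weibo information. On every crawl, the user information and any previously crawled weibos are updated automatically, and weibos that have not been crawled before are added to the list. Taking the 5 weibos posted by 迪丽热巴 from December 27, 2019 through today as an example, the format is as follows: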
{
"user": {
"id": "1669879400",
"screen_name": "Dear-迪丽热巴",
"gender": "f",
"statuses_count": 1085,
"followers_count": 65585238,
"follow_count": 248,
"description": "一只喜欢默默表演的小透明。工作联系[email protected] 🍒",
"profile_url": "https://m.weibo.cn/u/1669879400?uid=1669879400&luicode=10000011&lfid=1005051669879400",
"profile_image_url": "https://tvax1.sinaimg.cn/crop.0.0.996.996.180/63885668ly8fjf57kfmgfj20ro0ro0u7.jpg?KID=imgbed,tva&Expires=1578329741&ssig=3jZYwOBVPM",
"avatar_hd": "https://wx1.sinaimg.cn/orj480/63885668ly8fjf57kfmgfj20ro0ro0u7.jpg",
"urank": 44,
"mbrank": 7,
"verified": true,
"verified_type": 0,
"verified_reason": "嘉行传媒签约演员 "
},
"weibo": [
{
"user_id": 1669879400,
"screen_name": "Dear-迪丽热巴",
"id": 4456240593602073,
"bid": "InB4Df73X",
"text": "#happyNEOyear#都到了2020,还不换点新pose配新装[來] 穿上@adidasneo 迪士尼联名款,让#生来好动#的我们一起玩“新”大发、自拍不重样🤳网页链接 adidasneo的微博视频 ",
"pics": "",
"video_url": "http://f.video.weibocdn.com/001oZ3FTlx07zPTrZB4A01041200jkYm0E010.mp4?label=mp4_720p&template=1280x720.25.0&trans_finger=1f0da16358befad33323e3a1b7f95fc9&Expires=1578322542&ssig=%2F5ZzIlEJnl&KID=unistore,video",
"location": "",
"created_at": "2020-01-02",
"source": "",
"attitudes_count": 262755,
"comments_count": 104770,
"reposts_count": 328001,
"topics": "happyNEOyear,生来好动",
"at_users": "adidasneo"
},
{
"user_id": 1669879400,
"screen_name": "Dear-迪丽热巴",
"id": 4455670746717994,
"bid": "InmfwsbEK",
"text": "#梦圆东方##追梦2020#离开2019~迈向新的一年,2020新年快乐~🎆 ",
"pics": "https://wx1.sinaimg.cn/large/63885668ly1gag7sm3gz5j22jf3t5kjn.jpg,https://wx4.sinaimg.cn/large/63885668ly1gag7sd7705j22jf3t5b2c.jpg,https://wx1.sinaimg.cn/large/63885668ly1gag7s8yy1vj22jf3t5e83.jpg,https://wx2.sinaimg.cn/large/63885668ly1gag7sga78yj22jf3t5npf.jpg,https://wx1.sinaimg.cn/large/63885668ly1gag7sv3uuzj21ke2ckkjo.jpg,https://wx3.sinaimg.cn/large/63885668ly1gag7srmhpjj21o72i81l1.jpg",
"video_url": "",
"location": "",
"created_at": "2019-12-31",
"source": "",
"attitudes_count": 448266,
"comments_count": 162745,
"reposts_count": 863550,
"topics": "梦圆东方,追梦2020",
"at_users": ""
},
{
"user_id": 1669879400,
"screen_name": "Dear-迪丽热巴",
"id": 4455290722175462,
"bid": "IncmA97W6",
"text": "2020就要来啦~你准备好了没?送大家一份新年魔法,祝愿大家新的一年,都能追随内心更加勇敢,做自己想做的事。跟我一起对准@安慕希希腊酸奶 扫2020,许下你的愿望,告诉我你的2020新期待!#2020一起安慕希# Dear-迪丽热巴的微博视频 ",
"pics": "",
"video_url": "http://f.video.weibocdn.com/0039swf4lx07zLH0CQha010412002TqN0E010.mp4?label=mp4_720p&template=1280x720.25.0&trans_finger=1f0da16358befad33323e3a1b7f95fc9&Expires=1578322504&ssig=t3oq%2FUw%2Fot&KID=unistore,video",
"location": "",
"created_at": "2019-12-30",
"source": "",
"attitudes_count": 307451,
"comments_count": 58193,
"reposts_count": 637412,
"topics": "2020一起安慕希",
"at_users": "安慕希希腊酸奶"
},
{
"user_id": 1669879400,
"screen_name": "Dear-迪丽热巴",
"id": 4454572602912349,
"bid": "ImTGkcdDn",
"text": "今天的#星光大赏# ",
"pics": "https://wx3.sinaimg.cn/large/63885668ly1gacppdn1nmj21yi2qp7wk.jpg,https://wx4.sinaimg.cn/large/63885668ly1gacpphkj5gj22ik3t0b2d.jpg,https://wx4.sinaimg.cn/large/63885668ly1gacppb4atej22yo4g04qr.jpg,https://wx2.sinaimg.cn/large/63885668ly1gacpn0eeyij22yo4g04qr.jpg",
"video_url": "",
"location": "",
"created_at": "2019-12-28",
"source": "",
"attitudes_count": 551894,
"comments_count": 182010,
"reposts_count": 1000000,
"topics": "星光大赏",
"at_users": ""
},
{
"user_id": 1669879400,
"screen_name": "Dear-迪丽热巴",
"id": 4454081098040623,
"bid": "ImGTzxJJt",
"text": "我最爱用的娇韵诗双萃精华穿上限量“金”装啦,希望阿丝儿们跟我一起在新的一年更美更年轻,喜笑颜开没有细纹困扰!限定新春礼盒还有祝福悄悄话,大家了解一下~",
"pics": "",
"video_url": "",
"location": "",
"created_at": "2019-12-27",
"source": "",
"attitudes_count": 190840,
"comments_count": 43523,
"reposts_count": 1000000,
"topics": "",
"at_users": "",
"retweet": {
"user_id": 1684832145,
"screen_name": "法国娇韵诗",
"id": 4454028484570123,
"bid": "ImFwIjaTF",
"text": "#点萃成金 年轻焕新# 将源自天然的植物力量,转化为滴滴珍贵如金的双萃精华。这份点萃成金的独到匠心,只为守护娇粉们的美丽而来。点击视频,与@Dear-迪丽热巴 一同邂逅新年限量版黄金双萃,以闪耀开运金,送上新春宠肌臻礼。 跟着迪迪选年货,还有双重新春惊喜,爱丽丝们看这里! 第一重参与微淘活动邀请好友关注娇韵诗天猫旗舰店,就有机会赢取限量款热巴新年礼盒,打开就能聆听仙女迪亲口送出的新春祝福哦!点击网页链接下单晒热巴同款黄金双萃,并且@法国娇韵诗,更有机会获得热巴亲笔签名的礼盒哦! 第二重转评说出新年希望娇韵诗为你解决的肌肤愿望,截止至1/10,小娇将从铁粉中抽取1位娇粉送出限量版热巴定制礼盒,抽取3位娇粉送出热巴明信片1张~ #迪丽热巴代言娇韵诗#养成同款御龄美肌,就从现在开始。法国娇韵诗的微博视频",
"pics": "",
"video_url": "http://f.video.weibocdn.com/003vQjnRlx07zFkxIMjS010412003bNx0E010.mp4?label=mp4_hd&template=852x480.25.0&trans_finger=62b30a3f061b162e421008955c73f536&Expires=1578322522&ssig=P3ozrNA3mv&KID=unistore,video",
"location": "",
"created_at": "2019-12-27",
"source": "微博 weibo.com",
"attitudes_count": 18389,
"comments_count": 3201,
"reposts_count": 1000000,
"topics": "点萃成金 年轻焕新,迪丽热巴代言娇韵诗",
"at_users": "Dear-迪丽热巴,法国娇韵诗"
}
}
]
}
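The update behaviour described above (user info refreshed, previously crawled weibos overwritten, new ones added) can be sketched roughly as follows. This is only an illustration of the idea, not the code in weibo-crawler itself: the function names `merge_crawl` and `write_json` are hypothetical, and it assumes weibos are matched by their "id" field and that new ones go at the end of the list.

```python
import json
from pathlib import Path


def merge_crawl(existing, user, weibos):
    """Return a result dict with `user` replaced and `weibos` upserted by id.

    Assumption: two weibos are "the same" when their "id" values are equal.
    """
    # Index the previously saved weibos by id (dicts keep insertion order).
    by_id = {w["id"]: w for w in existing.get("weibo", [])}
    for w in weibos:
        # Overwrites the old entry if crawled before, otherwise appends.
        by_id[w["id"]] = w
    return {"user": user, "weibo": list(by_id.values())}


def write_json(path, user, weibos):
    """Merge a fresh crawl into the JSON file at `path` (hypothetical helper)."""
    path = Path(path)
    existing = {}
    if path.exists():
        existing = json.loads(path.read_text(encoding="utf-8"))
    merged = merge_crawl(existing, user, weibos)
    # ensure_ascii=False keeps Chinese text readable in the output file.
    path.write_text(
        json.dumps(merged, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return merged
```

Calling `write_json("1669879400.json", user, weibos)` after each crawl would then keep a single file per user up to date; the file name here is just an example.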
from weibo-crawler.