wjllp / douyin_spider
Batch-download favorited Douyin short videos
Hi, when I run this code it no longer seems to fetch any data.
test
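I added two parameters, _signature and dytk, to the URL below. Both can be found in the XHR requests on the web page. The idea is to keep repeating the request in a loop while it fails, and to move on to the next step once it succeeds.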
# -*- coding: utf-8 -*-
import requests

session = requests.session()
# Send a browser User-Agent; this may not be enough on its own.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}
# Name of the file the video URLs are saved to
filename = "urls.txt"
# Running count of entries processed so far
c = 0


def start(userid, count):
    # Maximum number of URLs a single request can return
    maxCount = 35
    # Number of requests needed (rounded up)
    page = (count + maxCount - 1) // maxCount
    # The cursor starts at 0
    max_cursor = 0
    for i in range(0, page):
        print('count is now:', count)
        print('current cursor:', max_cursor)
        # If more videos remain than one request can return, ask for maxCount and reduce count
        if count > maxCount:
            max_cursor = download(userid, maxCount, max_cursor)
            count = count - maxCount
        # Once count has dropped to maxCount or below, ask for count
        else:
            max_cursor = download(userid, count, max_cursor)


# userid: the user whose favorited videos are downloaded. count: number of videos. max_cursor: paging cursor
def download(userid, count, max_cursor):
    global c
    # Replace both placeholders with the values seen in the browser's XHR requests
    url = 'https://www.douyin.com/aweme/v1/aweme/favorite/?user_id='+str(userid)+'&count='+str(count+1)+'&max_cursor='+str(max_cursor)+'&aid=1128&_signature=<NOTE: fill this in!>&dytk=<NOTE: fill this in!>'
    print(url)
    # Send the GET request and keep the response
    resp = session.get(url, headers=headers)
    print(resp)
    print(resp.text)
    # Decode the response body as JSON (no HTML parsing is needed)
    myjson = resp.json()
    # Repeat the same request until the response contains videos
    while len(myjson['aweme_list']) == 0:
        resp = session.get(url, headers=headers)
        print(resp)
        print(resp.text)
        myjson = resp.json()
        print("!")
    # Get the cursor, used to fetch the next page of videos
    max_cursor = myjson['max_cursor']
    # The with block closes the file, so no explicit close() is needed
    with open(filename, "a+") as f:
        for i in range(0, count):
            try:
                # Extract the video URL from the JSON data
                video_url = myjson['aweme_list'][i]['video']['play_addr']['url_list'][0]
                # Write it to the file
                f.write(video_url + "\n")
                print(video_url)
            except (KeyError, IndexError):
                print("Failed to parse JSON entry number", c)
            finally:
                c = c + 1
    # Return the cursor
    return max_cursor


if __name__ == '__main__':
    # First argument: user ID. Second argument: number of videos to download.
    start('YOUR_USER_ID', 300)  # replace the placeholder with the real user ID
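The script above repeats the request for as long as aweme_list comes back empty, so it never stops if the signature or dytk value is rejected. A minimal sketch of the same retry idea with an upper bound is shown here. The helper name fetch_until_nonempty and the parameters max_attempts and delay_seconds are hypothetical and not part of the original script, and fetch is assumed to be any zero-argument callable that returns the decoded JSON dict.

```python
import time


def fetch_until_nonempty(fetch, max_attempts=5, delay_seconds=1.0):
    """Call fetch() until it returns a dict with a non-empty 'aweme_list'.

    fetch is a zero-argument callable returning the decoded JSON dict
    (hypothetical helper; in the script above it would wrap session.get).
    Returns the first non-empty response, or None after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        # A missing key and an empty list are both treated as a failed attempt
        if data.get('aweme_list'):
            return data
        print('attempt', attempt, 'returned no videos')
        # Pause between attempts, but not after the last one
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None
```

In download, this could be called as fetch_until_nonempty(lambda: session.get(url, headers=headers).json()), with a None result treated as "give up on this page". The pause between attempts is a design choice to avoid hammering the server, not something the original post requires.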
Running the command sudo chmod +x download.sh in the Windows command line gives an error.
Is this user ID a mix of letters and digits? The links I get from sharing no longer show a purely numeric ID. How should I handle that?