wjllp / douyin_spider

107.0 6.0 25.0 18 KB

Batch-download your favorited Douyin short videos

Python 89.47% Shell 10.53%

douyin_spider's Introduction

douyin_spider

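Batch-download your favorited Douyin short videos

Requirements

Python 3.*

Usage

Download this project to your computer.
Run pip3 install -r requirements.txt to install the dependencies.
Edit the two arguments of start(args1, args2) in the main method of douyin.py. The first argument is your user id. Note that this is not the Douyin ID shown in the app: to find the user id, open any user's profile page, share it to another app as a link, and read the user id from that link. The second argument is the number of videos you want to download.
Make download.sh executable on your machine by running sudo chmod +x download.sh.
Downloaded files are saved in the video directory under the current directory.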
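The user id step above can be sketched in Python. This is only an illustration: the README does not show what a shared profile link looks like, so the `/share/user/<digits>` shape, the `USER_ID_PATTERN` name, and the `extract_user_id` helper are all assumptions, not part of the project.

```python
import re

# Hypothetical link shape: ".../share/user/<digits>". This is an assumption;
# adjust the pattern to whatever your shared link actually looks like.
USER_ID_PATTERN = re.compile(r"/share/user/(\d+)")


def extract_user_id(share_link):
    """Return the numeric user id found in share_link, or None if absent."""
    match = USER_ID_PATTERN.search(share_link)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up link, used only to show the call.
    print(extract_user_id("https://www.douyin.com/share/user/123456?utm=x"))
```

The returned string is what you would pass as the first argument of start.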

douyin_spider's Issues

Does this have to be run on Linux?

Running sudo chmod +x download.sh in the Windows command prompt gives an error.
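sudo and chmod are Unix commands, so that error is expected in the Windows command prompt. One possible workaround is to do the download step in Python instead of a shell script. The sketch below is not a port of download.sh, whose contents are not shown here. It assumes a text file with one video URL per line and saves each one into a video directory; the `download_all` and `fetch_bytes` names and the numbered .mp4 file names are hypothetical.

```python
import os
from urllib.request import urlopen


def fetch_bytes(url):
    # Default fetcher: a plain HTTP GET using only the standard library.
    with urlopen(url) as resp:
        return resp.read()


def download_all(url_file="urls.txt", out_dir="video", fetch=fetch_bytes):
    """Save every URL listed in url_file into out_dir as 1.mp4, 2.mp4, ..."""
    os.makedirs(out_dir, exist_ok=True)
    with open(url_file, encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not create an empty job.
        urls = [line.strip() for line in f if line.strip()]
    saved = []
    for index, url in enumerate(urls, start=1):
        path = os.path.join(out_dir, f"{index}.mp4")
        with open(path, "wb") as out:
            out.write(fetch(url))
        saved.append(path)
    return saved
```

Because it uses only os and urllib, the same script runs unchanged on Windows, macOS, and Linux. The fetch parameter exists so the function can be tried without network access.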

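I wrote a small change that works in testing for now

I added two parameters, _signature and dytk, to the URL below. Both can be read from the XHR requests in the web page. The idea is to keep retrying in a loop until a request succeeds, then move on to the next step.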
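The retry idea can be sketched with an upper bound, so a permanently failing request does not loop forever. This is a variation on the approach described, not the poster's code: `fetch_until_nonempty`, `max_attempts`, and `delay_seconds` are hypothetical names, and `fetch` stands for any callable that returns the parsed JSON response as a dict with an aweme_list field.

```python
import time


def fetch_until_nonempty(fetch, max_attempts=10, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() until its result has a non-empty 'aweme_list'."""
    for attempt in range(1, max_attempts + 1):
        data = fetch()
        if data.get("aweme_list"):
            return data
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise RuntimeError(f"no videos returned after {max_attempts} attempts")
```

Raising after the last attempt makes a stale _signature or dytk value show up as an error instead of an endless loop.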

# coding: utf-8
import json

import requests

session = requests.session()
# Send a browser-like User-Agent; this may not be enough on its own.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}

# Name of the file the video URLs are saved to
filename = "urls.txt"
c = 0


def start(userid, count):
    # Maximum number of URLs a single request can return
    maxCount = 35
    # Number of requests needed (rounded up)
    page = (count + maxCount - 1) // maxCount
    # The cursor starts at 0
    max_cursor = 0
    for i in range(0, page):
        print('count is now:', count)
        print('current cursor:', max_cursor)
        # If more videos remain than one request can return, ask for maxCount and reduce count
        if count > maxCount:
            max_cursor = download(userid, maxCount, max_cursor)
            count = count - maxCount
        # Once count is no larger than maxCount, pass count itself
        else:
            max_cursor = download(userid, count, max_cursor)


# userid: the user whose favorited videos are fetched. count: number of videos. max_cursor: paging cursor.
def download(userid, count, max_cursor):
    global c
    # Replace both FILL_THIS_IN placeholders with the values taken from the page's XHR requests.
    url = 'https://www.douyin.com/aweme/v1/aweme/favorite/?user_id='+str(userid)+'&count='+str(count+1)+'&max_cursor='+str(max_cursor)+'&aid=1128&_signature=FILL_THIS_IN&dytk=FILL_THIS_IN'
    print(url)
    # Send the GET request and keep the response
    resp = session.get(url, headers=headers)
    print(resp)
    # The body is JSON, so parse it directly
    myjson = json.loads(resp.text)
    # Keep retrying until the response contains videos
    while len(myjson['aweme_list']) == 0:
        resp = session.get(url, headers=headers)
        print(resp)
        myjson = json.loads(resp.text)
        print("!")

    # Read the cursor used to fetch the next page
    max_cursor = myjson['max_cursor']
    # The with block closes the file on exit
    with open(filename, "a+") as f:
        for i in range(0, count):
            try:
                # Pull the video URL out of the JSON data
                video_url = myjson['aweme_list'][i]['video']['play_addr']['url_list'][0]
                # Write it to the file
                f.write(video_url + "\n")
                print(video_url)
            except (KeyError, IndexError):
                print("Failed to parse JSON entry number", c, "...")
            finally:
                c = c + 1

    # Return the cursor
    return max_cursor


if __name__ == '__main__':
    # First argument: user id; second argument: number of videos to download
    start('YOUR_USER_ID', 300)
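The paging arithmetic in start can be checked on its own. The helper below is a hypothetical stand-alone sketch (`plan_batches` is not part of the posted code) that lists how many videos each request would ask for, using the same limit of 35 per request.

```python
def plan_batches(count, max_per_request=35):
    """Split count into per-request sizes of at most max_per_request."""
    batches = []
    remaining = count
    while remaining > 0:
        size = min(remaining, max_per_request)
        batches.append(size)
        remaining -= size
    return batches


if __name__ == "__main__":
    print(plan_batches(300))
```

For 300 videos this gives eight requests of 35 followed by one request of 20, which is nine requests in total and matches the rounded-up page count.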

Does this code no longer work?

To the experts here: when I run this code, it no longer seems to return any data.
