douyin_spider's Introduction

douyin_spider

Batch-download the Douyin short videos you have favorited.

Environment

Python 3.*

Usage

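  • Download this project to your computer.
  • Run pip3 install -r requirements.txt to install the dependencies.
  • Edit the two arguments of start(args1, args2) in the main method of douyin.py. Argument 1 is your user id. Note that this id is not the Douyin ID shown in the app: to find a user id, open any user's profile page and share it to another app as a link; the user id is visible in that link. Argument 2 is the number of videos you want to download.
  • Give download.sh execute permission locally by running sudo chmod +x download.sh
  • Files are saved to the video directory under the current directory.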
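
The last step above, saving files into the video directory, might look roughly like the sketch below. It is not the contents of download.sh, which is not shown here. It assumes a text file with one video URL per line; the helper names (`fetch_bytes`, `read_urls`, `save_videos`) and the numbered `.mp4` file names are hypothetical.

```python
import os
import urllib.request


def fetch_bytes(url):
    """Download one URL and return its body as bytes (standard library only)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def read_urls(list_file):
    """Return the non-empty lines of list_file, one URL per line (assumed format)."""
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]


def save_videos(urls, out_dir="video", fetch=fetch_bytes):
    """Save each URL as out_dir/<index>.mp4 and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, url in enumerate(urls):
        # Hypothetical naming scheme: 0.mp4, 1.mp4, ...
        path = os.path.join(out_dir, f"{index}.mp4")
        with open(path, "wb") as f:
            f.write(fetch(url))
        paths.append(path)
    return paths
```

With a URL list in a hypothetical urls.txt, this would be called as `save_videos(read_urls("urls.txt"))`. Passing `fetch` as a parameter keeps the file handling testable without network access.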

douyin_spider's People

Contributors

guohui-ma, wjllp

douyin_spider's Issues

User id

Is this user id a mix of letters and digits? The shared link no longer shows a purely numeric id. How should this be handled?

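Wrote a little code myself; it works in testing for now

I added two parameters, _signature and dytk, to the URL below. Both can be found in the XHR requests on the web page. The idea is to keep retrying a failed request until it succeeds, then move on to the next step.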
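
Before the full script, here is a minimal sketch of the retry idea with an upper bound on attempts, so that a request which never succeeds does not loop forever. The helper name `retry_until`, its parameters, and the default values are assumptions for illustration, not part of the original script.

```python
import time


def retry_until(fetch, is_ok, max_attempts=5, delay=1.0):
    """Call fetch() until is_ok(result) is true or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_ok(result):
            return result
        # Pause between attempts, but not after the final one
        if attempt < max_attempts:
            time.sleep(delay)
    raise RuntimeError(f"no usable response after {max_attempts} attempts")
```

A caller would pass a function that performs the request and a check such as `lambda data: len(data['aweme_list']) > 0`.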

# coding: utf-8
import json

import requests

session = requests.session()
# Send a browser User-Agent; this may not actually help...
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}

# Name of the file the URLs are saved to
filename = "urls.txt"
c = 0


def start(userid, count):
    # Maximum number of URLs a single request can return
    maxCount = 35
    # Number of requests needed (rounded up)
    page = (count + maxCount - 1) // maxCount
    # The cursor starts at 0
    max_cursor = 0
    for i in range(0, page):
        print('count is now:', count)
        print('current cursor:', max_cursor)
        # If more videos remain than one request can return, request maxCount and reduce count
        if count > maxCount:
            max_cursor = download(userid, maxCount, max_cursor)
            count = count - maxCount
        # Once count is no larger than maxCount, request count itself
        else:
            max_cursor = download(userid, count, max_cursor)


def fetch_json(url):
    # Send the GET request and keep the response
    resp = session.get(url, headers=headers)
    print(resp)
    print(resp.text)
    # Convert the response body from a string to JSON
    return json.loads(resp.text)


# Parameters: userid selects whose favorited videos to download; count is the number to download; max_cursor is the cursor
def download(userid, count, max_cursor):
    global c
    # NOTE: replace both FILL_THIS_IN placeholders with your own _signature and dytk values
    url = ('https://www.douyin.com/aweme/v1/aweme/favorite/?user_id=' + str(userid)
           + '&count=' + str(count + 1) + '&max_cursor=' + str(max_cursor)
           + '&aid=1128&_signature=FILL_THIS_IN&dytk=FILL_THIS_IN')
    print(url)
    myjson = fetch_json(url)
    # Keep retrying until the response actually contains videos
    while len(myjson['aweme_list']) == 0:
        myjson = fetch_json(url)
        print("!")

    # Get the cursor, used to fetch the next page of videos
    max_cursor = myjson['max_cursor']
    # The with block closes the file automatically
    with open(filename, "a+") as f:
        for i in range(0, count):
            try:
                # Extract the video URL from the JSON data
                video_url = myjson['aweme_list'][i]['video']['play_addr']['url_list'][0]
                # Write it to the file
                f.write(video_url + "\n")
                print(video_url)
            except (KeyError, IndexError):
                print("JSON parsing failed on item", c, "...")
            finally:
                c = c + 1

    # Return the cursor
    return max_cursor


if __name__ == '__main__':
    # Argument 1: user id; argument 2: number of videos you want to download
    start('YOUR_USER_ID', 300)  # replace 'YOUR_USER_ID' with your user id
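
The paging arithmetic in start can be checked on its own. The sketch below is a hypothetical helper (`batch_sizes` is not part of the script) that returns how many videos each request would ask for, assuming the same per-request limit of 35.

```python
def batch_sizes(count, max_per_request=35):
    """Split count into per-request sizes, each at most max_per_request."""
    sizes = []
    remaining = count
    while remaining > 0:
        size = min(remaining, max_per_request)
        sizes.append(size)
        remaining -= size
    return sizes


# 300 videos need nine requests: eight of 35 and a final one of 20
print(batch_sizes(300))
```

The length of the returned list equals the rounded-up page count computed in start, so `len(batch_sizes(300))` is 9.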
