
douyin_spider's Issues

I wrote a little something myself; it works in testing for now

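I added two parameters, _signature and dytk, to the URL in the script below. Both can be read from the XHR requests in the web page. The idea is to keep retrying in a loop while the request fails, and to move on to the next step only once it succeeds.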
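
The retry idea described above can be sketched on its own, without any network access. This is only an illustration under stated assumptions: `fetch` is a hypothetical zero-argument callable that returns the decoded JSON as a dict, and `max_attempts` and `delay_seconds` are illustrative additions that the original post does not have (the post retries without any limit).

```python
import time


def fetch_until_nonempty(fetch, max_attempts=5, delay_seconds=0.0):
    """Call fetch() until its payload has a non-empty 'aweme_list'.

    fetch, max_attempts and delay_seconds are hypothetical names used for
    illustration; they are not part of the original script.
    """
    for attempt in range(1, max_attempts + 1):
        payload = fetch()
        # An empty or missing list is treated as a failed attempt
        if payload.get('aweme_list'):
            return payload
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError('no videos returned after %d attempts' % max_attempts)


# Stand-in for a real request: two empty responses, then a usable one
responses = iter([
    {'aweme_list': []},
    {'aweme_list': []},
    {'aweme_list': [{'id': 1}], 'max_cursor': 35},
])
result = fetch_until_nonempty(lambda: next(responses))
print(result['max_cursor'])  # prints 35
```

Capping the number of attempts is a design choice, not something the post requires; it keeps the script from looping forever when the signature is rejected on every request.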

# coding: utf-8
import json
import requests

session = requests.session()
# Add a User-Agent header; it may not actually help...
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'}

# Name of the file the URLs are saved to
filename = "urls.txt"
# Running count of parsed entries, used in the error message
c = 0


def start(userid, count):
    # Maximum number of URLs a single request can return
    maxCount = 35
    # Number of requests needed (rounded up)
    page = (count + maxCount - 1) // maxCount
    # The initial cursor is 0
    max_cursor = 0
    for i in range(0, page):
        print('count is now:', count)
        print('current cursor:', max_cursor)
        # If more videos are wanted than one request can return, pass maxCount and reduce count
        if count > maxCount:
            max_cursor = download(userid, maxCount, max_cursor)
            count = count - maxCount
        # Once count has dropped to maxCount or below, pass count itself
        else:
            max_cursor = download(userid, count, max_cursor)


# Parameters: userid, the user whose favorited videos are downloaded; count, how many to download; max_cursor, the cursor
def download(userid, count, max_cursor):
    global c
    # Pay attention here!!! Replace the two FILL_IN_HERE placeholders with the
    # _signature and dytk values taken from the page's XHR requests.
    url = ('https://www.douyin.com/aweme/v1/aweme/favorite/?user_id=' + str(userid)
           + '&count=' + str(count + 1) + '&max_cursor=' + str(max_cursor)
           + '&aid=1128&_signature=FILL_IN_HERE&dytk=FILL_IN_HERE')
    print(url)
    # Keep retrying until the response contains at least one video
    while True:
        # Send the GET request and keep the response
        resp = session.get(url, headers=headers)
        print(resp)
        print(resp.text)
        # The body is JSON, so parse it directly; passing it through
        # BeautifulSoup first can escape characters such as '&' in the URLs
        myjson = json.loads(resp.text)
        if len(myjson['aweme_list']) > 0:
            break
        print("!")

    # Get the cursor, used to fetch the next page of videos
    max_cursor = myjson['max_cursor']
    # The with block closes the file, so no explicit close() is needed
    with open(filename, "a+") as f:
        for i in range(0, count):
            try:
                # Pull the video URL out of the JSON data
                video_url = myjson['aweme_list'][i]['video']['play_addr']['url_list'][0]
                # Write it to the file
                f.write(video_url + "\n")
                print(video_url)
            except (KeyError, IndexError):
                print("Error while parsing JSON entry number", c, "...")
            finally:
                c = c + 1

    # Return the cursor
    return max_cursor


if __name__ == '__main__':
    # Argument 1: user id; argument 2: number of videos you want to download
    start('USER_ID', 300)  # 'USER_ID' is a placeholder: put the target user's id here
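
The way start() splits the requested total into per-request batches can be checked in isolation. The sketch below is an illustration only: `plan_batches` is a hypothetical helper that does not appear in the original script, and the default of 35 is taken from the script's `maxCount`.

```python
def plan_batches(count, max_count=35):
    """Return the batch size to request on each page (hypothetical helper)."""
    # Ceiling division: how many requests are needed in total
    pages = (count + max_count - 1) // max_count
    batches = []
    remaining = count
    for _ in range(pages):
        # Full batches first, then whatever is left on the last page
        size = min(remaining, max_count)
        batches.append(size)
        remaining -= size
    return batches


print(plan_batches(300))  # prints [35, 35, 35, 35, 35, 35, 35, 35, 20]
```

For 300 videos this gives nine requests: eight full batches of 35 and a final batch of 20.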

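User id

Is this user id a mix of letters and digits? The ids you get from a shared link no longer appear to be purely numeric. How should that be handled?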
