
xiyaowong / spiders


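Python scrapers that return information in a fixed format, support downloading, and expose a simple API via flask. Covers watermark-free Douyin, Pipixia, Kuaishou, NetEase Cloud Music, QQ Music, Migu Music, Lizhi FM audio, Zhihu video, Zuiyou voice and video, Weibo, and more.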

License: MIT License

Python 99.88% Shell 0.12%
qqmusic 163music douyin kuaishou tudou lizhifm zhihu zuiyou music video

spiders's Introduction

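Update

This is an old project that has gone unmaintained for a long time. The code quality and style leave a lot to be desired, though some of the scrapers still work. The current plan is to build a simple parsing API service on the fastAPI framework. It will remain simple and rough, but it is good enough for learning or everyday use.

Just switch to the fastapi branch.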


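  • These are all relatively simple scrapers. Experienced readers should understand them at a glance, and beginners may still find some parts worth a look.

  • Details of the scraper files are here: extractor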
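
To give a rough idea of what "returning information in a fixed format" could look like, here is a minimal sketch. It is not taken from this repository: `dispatch`, `EXTRACTORS` and `fake_bilibili` are hypothetical names, and the result keys are borrowed from the bilibili snippet posted in the issues further down.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical result shape, modelled on the keys used by the bilibili
# snippet in the issues: "imgs", "videos", "videoName", "msg".
Result = Dict[str, object]


def fake_bilibili(url: str) -> Result:
    """Stand-in extractor; a real one would fetch and parse the page."""
    return {"imgs": [], "videos": [], "videoName": "placeholder"}


# Hypothetical registry mapping a host suffix to an extractor function.
EXTRACTORS: Dict[str, Callable[[str], Result]] = {
    "bilibili.com": fake_bilibili,
}


def dispatch(url: str) -> Result:
    """Pick an extractor by host name and return its result dict."""
    host = urlparse(url).hostname or ""
    for suffix, func in EXTRACTORS.items():
        if host == suffix or host.endswith("." + suffix):
            return func(url)
    # Unknown hosts get a dict with only a "msg" key.
    return {"msg": f"no extractor registered for host {host!r}"}
```

Every extractor returning the same dict shape means a thin API layer only has to serialise whatever `dispatch` returns.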


pip3 install -r requirements.txt
python3 extract.py

You may also need to install nodejs.

  • screenshot

    example.gif

  • release

  • Stars ⭐ and forks are welcome

spiders's People

Contributors

xiyaowong


spiders's Issues

kuaishou

It seems your fix that allows downloads for kuaishou has broken the desktop links again. I get the same error: 'title': '快手,记录世界记录你'

Hello, your acfun interface returns m3u. Could you add mp4 as well?

Douyin issue

douyin.py produces all the valid information plus the video and audio links, but the video link cannot be downloaded.

Fix bilibili video download

import re
import requests


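def get(url: str) -> dict:
    """
    Extract the cover image, video URL and title of a bilibili video.

    Returns a dict that may contain "imgs", "videos" and "videoName",
    or "msg" when something goes wrong.
    """
    data = {}
    headers = {
        # Mobile user agent, presumably so that bilibili serves the page the patterns below expect.
        "user-agent":
        "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
        "Referer": "https://www.bilibili.com/",
    }

    # Despite its name, this pattern matches a BV id rather than a legacy av number.
    av_number_pattern = r'(BV[0-9a-zA-Z]*)'
    cover_pattern = r"image: '(.*?)',"
    video_pattern = r"video_url: '(.*?)',"
    title_pattern = r'title":"(.*?)",'

    av = re.findall(av_number_pattern, url)
    if av:
        av = av[0]
    else:
        # Message: "The link may be incorrect, because I could not match the av number"
        data["msg"] = "链接可能不正确,因为我无法匹配到av号"
        return data
    # Rebuild a canonical video page URL from the id.
    url = f"https://www.bilibili.com/video/{av}"

    with requests.get(url, headers=headers, timeout=10) as rep:
        if rep.status_code == 200:
            cover_url = re.findall(cover_pattern, rep.text)
            if cover_url:
                cover_url = cover_url[0]
                # Drop everything from '@' onwards in the cover URL.
                if '@' in cover_url:
                    cover_url = cover_url[:cover_url.index('@')]
                data["imgs"] = ['https:'+cover_url]

            video_url = re.findall(video_pattern, rep.text)
            title_text = re.findall(title_pattern, rep.text)
            if video_url:
                video_url = video_url[0]
                # Replace the akamaized.net host with the bilivideo.com one.
                data["videos"] = ['https:' + video_url.replace('upos-hz-mirrorakam.akamaized.net','upos-sz-mirrorkodo.bilivideo.com')]
            if title_text:
                data["videoName"] = title_text[0]
        else:
            # Message: "Failed to fetch"
            data["msg"] = "获取失败"
        return data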


if __name__ == "__main__":
    print(get(input("url: ")))
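
The two string-handling steps in the snippet above (pulling the BV id out of a link, and trimming the cover URL at `@`) can be tried offline. This is only a sketch: `parse_bv`, `clean_cover` and `sample_page` are made-up names, and the sample text is invented rather than real bilibili output.

```python
import re

BV_PATTERN = r'(BV[0-9a-zA-Z]*)'    # same pattern as av_number_pattern above
COVER_PATTERN = r"image: '(.*?)',"  # same pattern as cover_pattern above


def parse_bv(url: str):
    """Return the first BV id found in url, or None if there is none."""
    found = re.findall(BV_PATTERN, url)
    return found[0] if found else None


def clean_cover(page_text: str):
    """Return the cover URL with any '@...' suffix removed, or None."""
    found = re.findall(COVER_PATTERN, page_text)
    if not found:
        return None
    # Keep only the part before the first '@', then add the scheme.
    cover = found[0].split('@', 1)[0]
    return 'https:' + cover


# Invented sample input, not real bilibili output.
sample_page = "image: '//i0.example.invalid/cover.jpg@400w_300h',"

print(parse_bv("https://www.bilibili.com/video/BV1xx411c7mD?p=1"))
print(clean_cover(sample_page))
```

Keeping these steps as small pure functions makes it easier to spot which pattern has stopped matching when the page layout changes.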

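Xigua Video

Following tradition, let me start by saying it three times.
OP is awesome, OP is awesome, OP is awesome!
I hope Xigua Video can be added. It seems to support watermark-free downloads as well!

Great work, have a like

I have a wicked idea: port this library's Python code to Android so it can be used directly on an Android phone. Would the author support that?

Can short-video links be scraped?

I want to set up some automation. How can I automatically scrape short-video links by keyword and then download the videos?

Pipixia no longer works

Pipixia links can no longer be retrieved. Could the author update it when there is time?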

kuaishou

Traceback (most recent call last):
  File "kuaishou.py", line 53, in <module>
    Rprint(get(input("url: ")))
NameError: name 'Rprint' is not defined

Set it back to pprint and it works perfectly.
