
spider163's Introduction


Scraping NetEase Cloud Music

Scraping popular playlists

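$ python playlist.py 1 10
$ # Scrape the names and links of the playlists on the first ten pages of popular playlists
$ python playlist.py 粤语 1 42
$ # Scrape all playlists in the Cantonese (粤语) category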
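
The two numeric arguments read as a first and last page. The sketch below shows one way a page range could be turned into listing URLs. It is not taken from playlist.py: the function name `page_urls`, the base URL, the query parameter names, and the page size of 35 are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed values: neither the base URL nor the page size comes from this README.
BASE_URL = "https://music.163.com/discover/playlist/"
PAGE_SIZE = 35


def page_urls(start_page, end_page, category=None):
    """Return one listing URL per page in [start_page, end_page] (1-based)."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    urls = []
    for page in range(start_page, end_page + 1):
        params = {
            "order": "hot",
            "limit": PAGE_SIZE,
            "offset": (page - 1) * PAGE_SIZE,  # page 1 starts at offset 0
        }
        if category:
            params["cat"] = category  # urlencode percent-encodes non-ASCII text
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls
```

Calling `page_urls(1, 10)` would give ten URLs, and `page_urls(1, 42, "粤语")` would add the category to each of 42 URLs.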

Scraping the songs in a playlist

$ python music.py playlist 376259016
$ # Scrape the playlist with ID 376259016
$ python music.py database
$ # Batch-scrape the songs in the popular playlists already stored in the database
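
The batch mode has to get from a stored playlist link to a playlist ID like the one passed on the command line. A minimal sketch of that step follows, assuming the stored link carries the ID as an `id` query parameter (for example `/playlist?id=376259016`). The helper name `playlist_id_from_link` is hypothetical, and the stored link format is not confirmed by this README.

```python
from urllib.parse import urlparse, parse_qs


def playlist_id_from_link(link):
    """Return the numeric `id` query parameter of a playlist link, or None."""
    query = parse_qs(urlparse(link).query)
    values = query.get("id", [])
    if values and values[0].isdigit():
        return int(values[0])
    return None  # no id parameter, or not a plain number
```

Links that keep the ID in a URL fragment rather than in the query string would need extra handling that this sketch leaves out.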

Scraping song comments

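$ python comment.py
$ # Automatically scrape comments for the songs already stored, skipping duplicates
$ python comment.py 407450223
$ # Scrape the comments for the song with the given ID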
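
The README does not say how duplicates are detected. One simple approach, shown here purely as an illustration, is to track a key per comment in a set and keep only comments whose key has not been seen. The function `dedupe_comments` and the choice of `(song_id, author, txt)` as the key are assumptions; the field names follow the comment163 table described under the database schema.

```python
def dedupe_comments(comments, seen=None):
    """Return the comments whose (song_id, author, txt) key is not in `seen`.

    `seen` is updated in place, so it can be reused across several batches.
    """
    if seen is None:
        seen = set()
    fresh = []
    for comment in comments:
        key = (comment["song_id"], comment["author"], comment["txt"])
        if key not in seen:
            seen.add(key)
            fresh.append(comment)
    return fresh
```

Passing the same `seen` set to successive calls keeps comments from an earlier batch from being stored a second time.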

Database schema

  • Database name: spider by default
  • Database configuration: set in the db field of spider163.conf
$ mysql> desc playlist163;
$ +-------------+--------------+------+-----+-------------------+----------------+
$ | Field       | Type         | Null | Key | Default           | Extra          |
$ +-------------+--------------+------+-----+-------------------+----------------+
$ | id          | int(11)      | NO   | PRI | NULL              | auto_increment |
$ | title       | varchar(150) | YES  |     |                   |                |
$ | link        | varchar(120) | YES  |     |                   |                |
$ | cnt         | varchar(20)  | YES  |     | 0                 |                |
$ | dsc         | varchar(50)  | YES  |     | all               |                |
$ | create_time | datetime     | YES  |     | CURRENT_TIMESTAMP |                |
$ | over        | varchar(20)  | YES  | MUL | N                 |                |
$ +-------------+--------------+------+-----+-------------------+----------------+
$ 7 rows in set (0.00 sec)
$ mysql> desc music163;
$ +-------------+--------------+------+-----+-------------------+----------------+
$ | Field       | Type         | Null | Key | Default           | Extra          |
$ +-------------+--------------+------+-----+-------------------+----------------+
$ | id          | int(11)      | NO   | PRI | NULL              | auto_increment |
$ | song_id     | int(11)      | YES  |     | NULL              |                |
$ | song_name   | varchar(200) | YES  |     |                   |                |
$ | author      | varchar(350) | YES  |     |                   |                |
$ | over        | varchar(5)   | YES  | MUL | N                 |                |
$ | create_time | datetime     | YES  |     | CURRENT_TIMESTAMP |                |
$ | comment     | int(11)      | YES  |     | 0                 |                |
$ +-------------+--------------+------+-----+-------------------+----------------+
$ 7 rows in set (0.00 sec)
$ mysql> desc comment163;
$ +---------+--------------+------+-----+---------+----------------+
$ | Field   | Type         | Null | Key | Default | Extra          |
$ +---------+--------------+------+-----+---------+----------------+
$ | id      | int(11)      | NO   | PRI | NULL    | auto_increment |
$ | song_id | int(11)      | YES  |     | NULL    |                |
$ | txt     | mediumtext   | YES  |     | NULL    |                |
$ | author  | varchar(100) | YES  |     | 注销    |                |
$ | liked   | int(11)      | YES  | MUL | 0       |                |
$ +---------+--------------+------+-----+---------+----------------+
$ 5 rows in set (0.00 sec)
$ mysql> desc exception;
$ +-------+--------------+------+-----+---------+----------------+
$ | Field | Type         | Null | Key | Default | Extra          |
$ +-------+--------------+------+-----+---------+----------------+
$ | id    | int(11)      | NO   | PRI | NULL    | auto_increment |
$ | eid   | int(11)      | YES  |     | 0       |                |
$ | scene | varchar(300) | YES  |     | NULL    |                |
$ | tb    | varchar(30)  | YES  |     | NULL    |                |
$ +-------+--------------+------+-----+---------+----------------+
$ 4 rows in set (0.00 sec)
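
The indexed `over` column, which defaults to N in both playlist163 and music163, looks like a "processed" flag. The sketch below illustrates that pattern: select rows still marked N, then flip the flag once a row has been handled. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own. The meaning of `over`, the value 'Y', and the helper names `pending_songs`, `mark_done`, and `demo` are assumptions, not taken from the project's code.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database; only three of the
# music163 columns are reproduced. "over" is quoted because it is a keyword
# in some SQL dialects.
SCHEMA = """
CREATE TABLE music163 (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    song_id INTEGER,
    "over" TEXT DEFAULT 'N'
)
"""


def pending_songs(conn, limit=10):
    """Return song_ids whose flag is still 'N' (assumed: not yet processed)."""
    rows = conn.execute(
        """SELECT song_id FROM music163 WHERE "over" = 'N' ORDER BY id LIMIT ?""",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def mark_done(conn, song_id):
    """Set the flag to 'Y' (assumed value) so the song is skipped next time."""
    conn.execute(
        """UPDATE music163 SET "over" = 'Y' WHERE song_id = ?""", (song_id,)
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # 1 and 2 are placeholder song IDs for the demonstration.
    conn.executemany(
        "INSERT INTO music163 (song_id) VALUES (?)", [(407450223,), (1,), (2,)]
    )
    mark_done(conn, 407450223)
    return pending_songs(conn)
```

With a flag like this, an interrupted run can resume where it stopped, because rows already marked done are no longer selected.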

TODO

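  • Add scraping of the personalized recommended playlists on the playlist page
  • Add scraping of the charts ✔️
  • Strict deduplication ✔️
  • Refactor the code structure; there is too much redundant code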

BUG

  • Some playlists cannot be scraped; the problem still needs to be reproduced and tracked down
  • ...

THANKS

  • A big thank-you to NetEase!

Follow the WeChat official account: 程天写代码

guojingcoooool
