
cloudmusic-crawler's Introduction

A new version is coming soon...

Introduction

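I came across an article titled "I analyzed 420,000 characters of lyrics with Python to find out what folk singers are singing about", thought it looked fun, and wanted to implement the idea myself. That is how this project came about.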

Crawler

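The crawler mostly calls existing APIs. For this part, see NetEase-MusicBox, whose author implemented a command-line client for NetEase Cloud Music; I tried it and it works well. I mainly drew on that author's api.py.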

Screenshot3.png

File processing

This part writes all the lyrics into a single file and also writes each artist's lyrics into a separate file, for use in the later analysis.

Screenshot4.png

This run collected roughly 26,000 lines of lyrics.
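The file-processing step could be sketched roughly as follows. This is not the project's actual code: the function name `write_lyrics`, the `(artist, lyric)` input layout, and the output file names are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def write_lyrics(songs, out_dir):
    """Write every lyric to all_lyrics.txt and to one file per artist.

    `songs` is an iterable of (artist, lyric) pairs. The pair layout and
    the file names are hypothetical, chosen only for this sketch. Artist
    names are used as file names directly, so they must not contain
    path separators.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    all_lyrics = []
    by_artist = defaultdict(list)
    for artist, lyric in songs:
        text = lyric.strip()
        all_lyrics.append(text)
        by_artist[artist].append(text)

    # One combined file for the corpus-wide analysis.
    (out / "all_lyrics.txt").write_text(
        "\n".join(all_lyrics) + "\n", encoding="utf-8"
    )

    # One file per artist for the per-artist analysis.
    for artist, lyrics in by_artist.items():
        (out / f"{artist}.txt").write_text(
            "\n".join(lyrics) + "\n", encoding="utf-8"
        )

    return sorted(by_artist)
```

Writing everything as UTF-8 explicitly keeps the output independent of the platform's default encoding, which matters for Chinese text on Windows.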

Text analysis

Word segmentation uses the "jieba" (结巴) Chinese word segmentation library.

I first picked one singer as a representative and analyzed the word frequencies, as shown below:

shisanfigure_2.png

figure_bar01.png

figure_pie01.png
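The counting behind charts like these can be sketched as below. This is a minimal illustration, not the project's text_mining.py: the function name `word_frequencies` and its parameters are assumptions, and the tokenizer is passed in so the sketch runs without third-party packages.

```python
from collections import Counter


def word_frequencies(lines, tokenize, stop_words=frozenset(), top_n=10):
    """Return the `top_n` most common tokens in `lines` with their counts.

    `tokenize` is any callable that maps a string to an iterable of tokens.
    Tokens that are empty after stripping, or listed in `stop_words`, are
    skipped.
    """
    counts = Counter()
    for line in lines:
        for token in tokenize(line):
            token = token.strip()
            if token and token not in stop_words:
                counts[token] += 1
    return counts.most_common(top_n)


# Demo with a whitespace tokenizer so the example has no dependencies.
print(word_frequencies(["a b a", "b a c"], str.split, stop_words={"c"}, top_n=2))
# prints [('a', 3), ('b', 2)]
```

With jieba installed, passing `jieba.lcut` as `tokenize` segments Chinese text into a list of words; the resulting pairs can then be fed to a bar chart, pie chart, or word cloud.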

I also made a word cloud:

shisanfigure_1.png

Then I analyzed all the lyrics together and got the following pie chart:

fm3.png

I also made a word cloud for the full corpus, shown below:

fm0.png

Next steps

  • Sentiment analysis
  • The comments on Cloud Music are wonderful; analyzing them could turn up something interesting

How to use

git clone https://github.com/GreatV/CloudMusic-Crawler.git

cd CloudMusic-Crawler

python3 -m venv venv

source venv/bin/activate

pip install -r requirements.txt

cd NEMCrawler

python NEM_spider.py

python text_mining.py

firefox render.html


cloudmusic-crawler's Issues

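Newcomer here, could you give me some pointers

I could not find the render.html file after running the project
image
I don't know what is going on here. After changing main to __main__ the script runs, but the html file still cannot be found.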

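Very sorry to bother you again

After setting up the virtual environment, the first Python file ran fine. The second file then kept reporting that stop_words.txt contains characters that cannot be decoded, so I converted the file to ANSI encoding and that problem went away. Next it raised another error saying that parallel mode only works on POSIX systems. I wanted to start over, so I deleted the whole project, but after downloading a fresh copy the first file no longer runs either.
image
This is what happens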
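A decoding error like the one reported for stop_words.txt typically appears when a file is read with a different encoding than it was saved in. The sketch below is one possible workaround, not the project's code: the function name `load_stop_words` is hypothetical, and the candidate encodings (UTF-8, then GBK) are a guess, since the issue does not say how the file is encoded.

```python
def load_stop_words(path, encodings=("utf-8-sig", "gbk")):
    """Return the stop words in `path` as a set, trying each encoding in turn.

    The default encodings are an assumption: UTF-8 (with or without a BOM)
    first, then GBK, which is what "ANSI" usually means on Chinese Windows.
    """
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                # One stop word per line; blank lines are ignored.
                return {line.strip() for line in handle if line.strip()}
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"could not decode {path!r} with any of {encodings}")
```

Passing the encoding explicitly avoids depending on the platform default, which differs between Windows and Linux.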
