
cloudmusic-crawler's Introduction

A new version is coming soon.

Introduction

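I came across an article titled "I used Python to analyze 420,000 characters of lyrics to find out what folk singers are singing about", thought it looked fun, and wanted to build something similar myself. That is how this project came about.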

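Crawler

The crawler mostly calls an existing API. For this part, see NetEase-MusicBox, whose author built a command-line client for NetEase Cloud Music; I tried it and it works well. I mainly followed that project's api.py.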

Screenshot3.png

File processing

This step writes all lyrics into a single file, and also writes each artist's lyrics into a separate file, for use in the later analysis.

Screenshot4.png

This run collected roughly 26,000 lines of lyrics.
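The file layout described in this section can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `write_lyrics`, the `(artist, lyric)` input shape, and the file names `all_lyrics.txt` and `<artist>.txt` are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import Path


def write_lyrics(songs, out_dir):
    """Write every lyric to one combined file and to one file per artist.

    `songs` is an iterable of (artist, lyric_text) pairs. That shape is an
    assumption for this sketch, not the project's actual data model.
    Returns the sorted list of artist names that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Group lyrics by artist, preserving the order in which they arrive.
    by_artist = defaultdict(list)
    for artist, lyric in songs:
        by_artist[artist].append(lyric.strip())

    # One file per artist (hypothetical naming scheme: "<artist>.txt").
    # Path separators in names are replaced so each file stays in out_dir.
    for artist, lyrics in by_artist.items():
        safe_name = artist.replace("/", "_")
        target = out_path / f"{safe_name}.txt"
        target.write_text("\n".join(lyrics) + "\n", encoding="utf-8")

    # One combined file holding every lyric (hypothetical name).
    all_lyrics = [lyric for lyrics in by_artist.values() for lyric in lyrics]
    combined = out_path / "all_lyrics.txt"
    combined.write_text("\n".join(all_lyrics) + "\n", encoding="utf-8")

    return sorted(by_artist)
```

Keeping the per-artist files next to the combined file means the later analysis can run on either a single artist or the whole corpus without crawling again.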

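Text analysis

Word segmentation is done with the "Jieba" Chinese word segmentation library.

I first picked one singer as a representative and analyzed the word frequencies, as shown below: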

shisanfigure_2.png

figure_bar01.png

figure_pie01.png
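The counting behind charts like these might look roughly like the sketch below. It is not the project's actual code: the function name `word_frequencies` and the `min_length` filter are assumptions, and the tokenizer is passed in as a parameter so the example runs with only the standard library.

```python
from collections import Counter


def word_frequencies(text, tokenize, min_length=2, top_n=10):
    """Return the `top_n` most common tokens in `text` as (token, count) pairs.

    `tokenize` is any callable that maps a string to a list of tokens.
    Tokens shorter than `min_length` characters (punctuation, single
    particles, whitespace) are dropped; that filter is an assumption of
    this sketch, not something the project is known to do.
    """
    tokens = (token.strip() for token in tokenize(text))
    counts = Counter(token for token in tokens if len(token) >= min_length)
    return counts.most_common(top_n)
```

With Jieba installed, `jieba.lcut` can be passed as `tokenize`, since it takes a string and returns a list of words. For a quick check on space-separated text, `str.split` works as well.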

I also made a word cloud:

shisanfigure_1.png

Then I analyzed all of the lyrics together and got the following pie chart:

fm3.png
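Assuming the pie chart shows the share of the most frequent words (the README does not say exactly what it plots), the slice sizes could be derived from word counts along these lines. The function name `pie_shares` and the "other" bucket are hypothetical choices for this sketch.

```python
def pie_shares(counts, top_n=5, other_label="other"):
    """Turn (word, count) pairs into (label, fraction) pairs for a pie chart.

    The `top_n` largest counts each get their own slice; everything else is
    merged into a single slice labelled `other_label`. Merging the tail is a
    design choice of this sketch, made to keep the chart readable.
    """
    ranked = sorted(counts, key=lambda pair: pair[1], reverse=True)
    total = sum(count for _, count in ranked)
    if total == 0:
        return []

    shares = [(word, count / total) for word, count in ranked[:top_n]]
    rest = sum(count for _, count in ranked[top_n:])
    if rest:
        shares.append((other_label, rest / total))
    return shares
```

The resulting labels and fractions can be handed to any plotting library that draws pie charts.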

I also made a word cloud from them, shown below:

fm0.png

Future work

  • Sentiment analysis
  • The comments on Cloud Music are fascinating; analyzing them might turn up something interesting

How to use

virtualenv newenv

git clone https://github.com/GreatV/CloudMusic-Crawler.git

cd CloudMusic-Crawler

pip install -r requirements.txt

cd NEMCrawler

python NEM_spider.py

python text_mining.py

firefox render.html
