bilibiliWordCloud

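This program scrapes short reviews of Bilibili anime series with a crawler built on the Scrapy framework, segments the scraped reviews into words with the jieba library, and then renders the result with wordcloud. (PS: I wrote this code years ago, stumbled across it recently while tidying up my folders, ran it, and was surprised that it still works. I thought it was fun, so I uploaded it.)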
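The middle of that pipeline (turning review text into word counts) can be sketched as follows. This is a minimal illustration, not the project's actual code: `default_tokenize`, `word_frequencies`, and `min_length` are hypothetical names, and the regex tokenizer is only a stand-in for jieba so that the sketch runs with the standard library alone.

```python
import re
from collections import Counter


def default_tokenize(text):
    """Stand-in tokenizer (assumption): split on runs of word characters.

    Real Chinese text needs a proper segmenter such as jieba.
    """
    return re.findall(r"\w+", text)


def word_frequencies(reviews, tokenize=default_tokenize, min_length=2):
    """Count how often each token appears across all reviews.

    Tokens shorter than min_length are dropped, which filters out
    single characters and stray punctuation.
    """
    counts = Counter()
    for review in reviews:
        for token in tokenize(review):
            token = token.strip()
            if len(token) >= min_length:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    sample = ["great show great ending", "slow start but great ending"]
    for word, n in word_frequencies(sample).most_common(3):
        print(word, n)
```

With jieba installed, passing `tokenize=jieba.lcut` should give proper Chinese segmentation, and the resulting counts can be handed to wordcloud's `WordCloud.generate_from_frequencies`. Rendering Chinese text usually also requires pointing `WordCloud` at a font that contains CJK glyphs.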

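Usage guide

First make sure Docker is installed on your computer. The Docker installation instructions below apply to macOS users only.

$ brew install --cask docker # make sure Homebrew is already installed

Start Docker. You can also find Docker.app among your applications and double-click it to launch it.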

$ open /Applications/Docker.app

If you want to pull and run my prebuilt image directly, make sure your platform is one of the following two.

linux/arm64
linux/amd64
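One way to check this from Python is sketched below. It is an illustration only: `docker_platform` and `is_supported` are hypothetical helpers, and the alias table covers just the common machine names reported by `platform.machine()`. It assumes that Docker runs Linux containers matching the host CPU architecture, which is the usual default.

```python
import platform

# The two platforms the prebuilt image is published for.
SUPPORTED = {"linux/amd64", "linux/arm64"}

# Common machine names mapped to Docker's architecture names (assumed list).
_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a machine name, or None if unknown."""
    machine = (machine or platform.machine()).lower()
    arch = _ALIASES.get(machine)
    return f"linux/{arch}" if arch else None


def is_supported(machine=None):
    """True if the prebuilt image should run natively on this machine."""
    return docker_platform(machine) in SUPPORTED


if __name__ == "__main__":
    print(docker_platform(), is_supported())
```

If `is_supported()` returns False, building the image locally is the safer route.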

Pull and run the image

$ docker pull godmountain/bilibili-wordcloud:latest
$ docker run --name bilibili -e media_id=1586 godmountain/bilibili-wordcloud:latest # media_id selects the show to scrape
.
.
.
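$ docker cp bilibili:/proj/bilibili/output.jpg /your/local/path # remember to change this destination path

Once you open output.jpg locally, you will see the generated word cloud.

For what media_id means, see -> How to find a show's media_id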
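The pull, run, and copy steps can also be assembled in a script. The sketch below only builds the argument lists for the three commands shown in this guide. `build_commands` and its parameters are hypothetical names, and nothing is executed unless you pass each list to `subprocess.run` yourself.

```python
import shlex

IMAGE = "godmountain/bilibili-wordcloud:latest"


def build_commands(media_id, dest, container="bilibili", image=IMAGE):
    """Return the docker pull/run/cp commands as argument lists."""
    media_id = str(media_id)
    if not media_id.isdigit():
        raise ValueError(f"media_id must be numeric, got {media_id!r}")
    return [
        ["docker", "pull", image],
        ["docker", "run", "--name", container, "-e", f"media_id={media_id}", image],
        # The output path inside the container is the one used in this guide.
        ["docker", "cp", f"{container}:/proj/bilibili/output.jpg", dest],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_commands(1586, "./output.jpg"):
        print(shlex.join(cmd))
```

To actually run them, call `subprocess.run(cmd, check=True)` on each list in order. `docker run` here runs in the foreground, so the copy step starts only after the container has finished.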

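How to build your own Docker image locally

Change into the directory that contains the Dockerfile, then use the build command to build your own image.

$ ls # first make sure the current directory contains a Dockerfile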
.
├── Dockerfile
├── LICENSE
├── README.md
├── bilibili
├── images
└── requirements.txt
$ docker build . -t bilibili
$ docker run --name bilibili -e media_id=1586 bilibili

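Wait for the program to finish, then copy the image generated by wordcloud to your local machine to view it.

$ docker cp bilibili:/proj/bilibili/output.jpg /your/local/path # remember to change this destination path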

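How to find a show's media_id

Open Bilibili and find an anime series or film you want to scrape. Anything that has a short-review (短评) tab works. (PS: note that this means short reviews, not comments 🤪.) Then click the 查看全部 ("view all") option.

Copy the digits that follow md in the URL of the page that opens. In this example that is '1586'.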
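Pulling the number out of a copied link can be automated with a small regular expression. The sketch below is illustrative: `extract_media_id` is a hypothetical helper, and the example URL shape (`.../bangumi/media/md1586`) is an assumption about what the link looks like, since the guide states only that the ID follows `md`.

```python
import re

# Matches "/md" followed by one or more digits anywhere in the URL.
_MD_PATTERN = re.compile(r"/md(\d+)")


def extract_media_id(url):
    """Return the digits that follow 'md' in the URL, as a string."""
    match = _MD_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no media_id found in {url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Assumed URL shape, used for illustration only.
    print(extract_media_id("https://www.bilibili.com/bangumi/media/md1586"))
```

The returned string can be passed directly as the `media_id` environment variable in the `docker run` command.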

bilibiliwordcloud's People

Contributors

mgmcn

Forkers

liulime

bilibiliwordcloud's Issues

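Move away from Alpine

Using Alpine does shrink the size of the built image, but it wastes so much time that the GitHub workflow release has not managed to build successfully even once. 😂 This has to change...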
