wangtongxue / doubanspider

This project forked from sunny-313/doubanspider

Implemented with Python 3, the urllib library, the Flask framework, Echarts.js, the WordCloud library, an SQLite database, and other technologies.

CSS 23.65% JavaScript 5.33% HTML 68.67% Python 2.35%

doubanspider's Introduction

Douban Top Data Scraping and Visual Analysis

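This project is written in Python 3 and uses the following modules: BeautifulSoup (scraping), pandas (data processing), Echarts.js (visualization), and the WordCloud library (word clouds). Some of the charts were made with Tableau.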

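- Data acquisition: fetch the Douban pages with urllib, parse them with BeautifulSoup, and extract the content with regular expressions to obtain the Douban movie ranking data.
- Data storage: write the extracted datalist to an Excel spreadsheet with Python's xlwt library.
- Data visualization: analyze the scraped data with Echarts's wide range of charts, and use WordCloud to generate a word cloud shaped by a given image.
- Website: build a site with the Flask framework that can be accessed locally.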
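The extraction step (regular expressions pulling fields out of each movie entry) can be sketched with the standard `re` module alone. This is a minimal sketch under stated assumptions: the HTML below is hypothetical markup for a single entry, and the class names, field names, and patterns are illustrative, not the project's own. In the project, BeautifulSoup first isolates each entry; here we start from one entry's HTML string so the example needs only the standard library.

```python
import re

# Hypothetical HTML for one movie entry; the live Douban markup may differ.
SAMPLE_ITEM = """
<div class="item">
  <a href="https://example.com/subject/1/">
    <img width="100" alt="示例电影" src="https://example.com/poster.jpg">
  </a>
  <span class="title">示例电影</span>
  <span class="title">&nbsp;/&nbsp;Example Movie</span>
  <span class="rating_num" property="v:average">9.0</span>
  <span>123456人评价</span>
  <span class="inq">A one-line summary.</span>
</div>
"""

# One pattern per field; lazy quantifiers and [^"] classes stop at the first closing quote or tag.
FIND_LINK = re.compile(r'<a href="([^"]*)">')
FIND_IMG = re.compile(r'<img[^>]*?src="([^"]*)"')
FIND_TITLE = re.compile(r'<span class="title">(.*?)</span>')
FIND_RATING = re.compile(r'<span class="rating_num"[^>]*>(.*?)</span>')
FIND_COUNT = re.compile(r'<span>(\d+)人评价</span>')
FIND_INQ = re.compile(r'<span class="inq">(.*?)</span>')


def parse_item(item_html: str) -> dict:
    """Extract the README's fields from one entry's HTML (hypothetical field names)."""
    titles = FIND_TITLE.findall(item_html)
    # The foreign title, when present, comes second and is prefixed with "&nbsp;/&nbsp;".
    foreign = titles[1].replace("&nbsp;", "").lstrip("/").strip() if len(titles) > 1 else ""
    inq = FIND_INQ.search(item_html)  # some entries may have no one-line summary
    return {
        "link": FIND_LINK.search(item_html).group(1),
        "img": FIND_IMG.search(item_html).group(1),
        "cn_title": titles[0],
        "foreign_title": foreign,
        "rating": float(FIND_RATING.search(item_html).group(1)),
        "rating_count": int(FIND_COUNT.search(item_html).group(1)),
        "inq": inq.group(1) if inq else "",
    }
```

Calling `parse_item(SAMPLE_ITEM)` returns a dict with one key per field, which is the shape a `datalist` of movies could be built from.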

Data Acquisition

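The fields to scrape are: the movie's detail-page link, the poster image link, the Chinese title, the foreign title, the rating, the number of ratings, a one-line summary, related information, and so on.
There are 250 movies to scrape, 25 per page, across 10 pages. A quick look at the pages shows that the pagination links do not need to be scraped from the bottom of each page; changing a URL parameter is enough.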
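Because paging is controlled by a URL parameter, the ten page URLs can be generated instead of scraped. A minimal sketch with the standard library's `urllib`, assuming the list lives at `https://movie.douban.com/top250` and that the parameter is a `start` offset growing by 25 per page (neither detail is stated in this README). The helper names and the User-Agent string are hypothetical examples of sending browser-like headers.

```python
import urllib.request

# Assumptions (not stated in the README): the base URL and the "start" offset parameter.
BASE_URL = "https://movie.douban.com/top250?start="
PAGE_SIZE = 25  # movies per page, per the README
TOTAL = 250     # movies in the list, per the README


def page_urls(base_url: str = BASE_URL) -> list[str]:
    """Build one URL per page by changing only the offset: 0, 25, ..., 225."""
    return [base_url + str(offset) for offset in range(0, TOTAL, PAGE_SIZE)]


def build_request(url: str) -> urllib.request.Request:
    """Attach a browser-like User-Agent (an arbitrary example string)."""
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    return urllib.request.Request(url, headers=headers)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as UTF-8 (performs a real network request)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

A loop such as `for url in page_urls(): html = fetch(url)` would then download the pages in turn; pausing between requests is kinder to the server.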

Data Analysis

Import the cleaned file into Tableau and build charts for the analysis.
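Before the data goes into Tableau it needs light cleaning. The sketch below assumes each scraped movie is a dict whose hypothetical keys are named after the fields listed under Data Acquisition; it trims whitespace, turns the rating and rating count into numbers, and writes a CSV with the standard `csv` module. CSV is a standard-library stand-in here: the README itself writes an Excel file with xlwt.

```python
import csv
import io

# Hypothetical column names, one per field listed under "Data Acquisition".
FIELDS = ["link", "img", "cn_title", "foreign_title",
          "rating", "rating_count", "inq", "info"]


def clean_row(raw: dict) -> dict:
    """Normalize one scraped record; missing fields become empty strings."""
    row = {k: "" if raw.get(k) is None else str(raw[k]).strip() for k in FIELDS}
    # Foreign titles may arrive as "\xa0/\xa0Title"; drop the separator.
    row["foreign_title"] = row["foreign_title"].replace("\xa0", " ").lstrip("/").strip()
    # Numeric columns become numbers so Tableau can aggregate them.
    row["rating"] = float(row["rating"]) if row["rating"] else None
    row["rating_count"] = int(row["rating_count"].replace(",", "")) if row["rating_count"] else None
    # Collapse newlines and runs of spaces in the free-text "info" field.
    row["info"] = " ".join(row["info"].split())
    return row


def write_csv(rows, stream) -> None:
    """Write cleaned rows, with a header line, to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for raw in rows:
        writer.writerow(clean_row(raw))


def to_csv_text(rows) -> str:
    """Return the CSV as a string (handy for quick checks)."""
    buf = io.StringIO()
    write_csv(rows, buf)
    return buf.getvalue()
```

Writing to a real file would look like `with open("douban_top250.csv", "w", newline="", encoding="utf-8-sig") as f: write_csv(rows, f)` (hypothetical filename). `newline=""` is what the `csv` documentation recommends, and the `utf-8-sig` BOM helps Excel recognize the Chinese text as UTF-8.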

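We can analyze how ** films have developed in recent years and where they stand compared with other countries;
which countries/regions have been producing better films in recent years, and which worse;
how much audience attention differs across film genres; and so on.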
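These analyses are built as Tableau charts in the project. As a rough Python counterpart, the hypothetical helper below groups movies by any key and reports, per group, the number of movies, the total number of ratings (used here as an assumed proxy for audience attention), and the mean rating. The field names `genres`, `region`, `year`, `rating`, and `rating_count` are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable


def group_stats(movies: Iterable[dict],
                keys_of: Callable[[dict], Iterable[Hashable]]) -> Dict[Hashable, dict]:
    """Per group key: movie count, total number of ratings, and mean rating.

    A movie contributes to every key that keys_of returns, so a film tagged
    with two genres is counted under both.
    """
    n = defaultdict(int)
    votes = defaultdict(int)
    rating_sum = defaultdict(float)
    for movie in movies:
        for key in keys_of(movie):
            n[key] += 1
            votes[key] += movie["rating_count"]
            rating_sum[key] += movie["rating"]
    return {key: {"movies": n[key],
                  "votes": votes[key],
                  "mean_rating": rating_sum[key] / n[key]}
            for key in n}
```

`group_stats(movies, lambda m: m["genres"])` compares attention across genres, while `group_stats(movies, lambda m: [(m["region"], m["year"] // 10 * 10)])` tracks each region's mean rating by decade. Passing the grouping as a function keeps one helper for both questions.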
