The pythonspider from liuweixu

Liu Weixu's Python Spider Program

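This repository collects my web-scraping study and practice. Some of the programs come with accompanying notes.

Notes:

If a scraper fails, please email [email protected] or [email protected] so that I receive the message promptly.

If a Jupyter notebook file fails to load, open nbviewer (jupyter.org) and enter the GitHub URL of the notebook file to view its contents.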

BeautifulSoup4 usage
- Notes on scraping the novel 《极品家丁》 from 笔趣阁 (Biquge)
- Scraping gaokao score-distribution tables (一分一段表)
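XPath usage (fairly simple; Google Chrome's "Copy XPath" feature can help directly)
- Scraper for the anime (番剧) pages of the AcFun website
- Fetching a Bilibili video's cover image by its av number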
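
As a rough illustration of XPath-based extraction, here is a minimal sketch. It is not taken from the scrapers in this repository. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset, in place of a full XPath engine such as lxml, and the markup, class name, and `extract_links` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a fetched page.
SAMPLE_PAGE = """
<html>
  <body>
    <div id="list">
      <a class="item" href="/v/1">First title</a>
      <a class="item" href="/v/2">Second title</a>
    </div>
  </body>
</html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every <a class="item"> element."""
    root = ET.fromstring(markup.strip())
    # Relative path expression: search all descendants of the root element.
    nodes = root.findall(".//a[@class='item']")
    return [(node.text, node.get("href")) for node in nodes]


links = extract_links(SAMPLE_PAGE)
print(links)
```

A path copied from the browser usually starts at the document root, so it will likely need to be rewritten as a relative expression like the one above. Real pages are also often not well-formed XML, in which case an HTML-aware parser is the more practical choice.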
PyQuery
- Rewriting the 猫眼 (Maoyan) scraper from Cui Qingcai's 《Python3网络爬虫开发实战》 with PyQuery
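Ajax handling (in the browser's developer tools, filter the Network panel by XHR or JS and find the request URL that matches; the response is usually in JSON format)
- Bilibili new-anime schedule scraper
- Partial scraper for the 半次元 (Banciyuan) weekly ranking
- Pandemic data scraping (international data only), up to December 31, 2020 (data for China is not included)
- Scraping the Ministry of Education's fourth-round evaluation results
- Historical stock data (involves working with dictionaries)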
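
The Ajax workflow described above (find the XHR request URL, then read its JSON response) can be sketched roughly as follows. This is only an illustration under stated assumptions: the endpoint `API_BASE`, the query parameters, and the `data` / `list` / `title` field names are all hypothetical, and the sample body stands in for a real response so the code runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, as it might be copied from the Network panel.
API_BASE = "https://example.com/api/ranking"


def build_url(page, page_size=20):
    """Build the request URL for one page of results (parameter names are assumed)."""
    return f"{API_BASE}?{urlencode({'page': page, 'size': page_size})}"


def parse_titles(body):
    """Extract titles from a JSON response body with an assumed layout."""
    payload = json.loads(body)
    entries = payload.get("data", {}).get("list", [])
    return [entry["title"] for entry in entries]


# Stand-in for the text of a real response.
SAMPLE_BODY = '{"code": 0, "data": {"list": [{"title": "A"}, {"title": "B"}]}}'

url = build_url(1)
titles = parse_titles(SAMPLE_BODY)
print(url)
print(titles)
```

In a real scraper the body would come from an HTTP request to the URL, and the field names would have to be read off the actual response shown in the developer tools.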
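Scrapy
- Scraping Bing's featured images with Scrapy
- Most popular images from Bilibili's 画友 section
- Cover images from the 半次元 weekly ranking
- Downloading anime girl images from 唯1图片
- Scraper for the scrapyd crawler test site
- Downloading images from 火熊网 (this one involves form submission, which I personally consider fairly important)
- Scraping the novel 《极品家丁》 (from 笔趣看)
- Scraping university information from the institution database of 阳光高考网 (combines Scrapy with openpyxl and covers the use of open_spider() and close_spider())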
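
To illustrate the role of `open_spider()` and `close_spider()` mentioned in the last item, here is a minimal sketch of an item pipeline. It is not the repository's implementation: it substitutes the standard library's `csv` module for openpyxl so that it runs without extra dependencies, and the class name, file name, and field names are hypothetical. Scrapy calls `open_spider()` once when the spider starts, `process_item()` for each scraped item, and `close_spider()` once when the spider finishes.

```python
import csv


class CsvExportPipeline:
    """Write each scraped item as one row of a CSV file."""

    def __init__(self, path="universities.csv", fields=("name", "city")):
        # File name and field names are assumptions for illustration.
        self.path = path
        self.fields = list(fields)
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider opens: create the file and header row.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=self.fields, extrasaction="ignore"
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Called for every item: append a row, then pass the item along.
        self.writer.writerow(dict(item))
        return item

    def close_spider(self, spider):
        # Called once when the spider closes: flush and release the file.
        self.file.close()
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting of the Scrapy project. Opening the output once and closing it once keeps per-item work small, which is the same reason a workbook would be created in `open_spider()` and saved in `close_spider()`.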
