crawler

This repo mainly collects the small crawlers I have written over time.

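1. Tieba crawlers

The crawled fields are mainly [post time, post title, author, author URL, reply count, click count, view count] and similar metadata. The body text of posts is not collected.

Completed parts include: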

2. Cryptocurrency crawlers

The crawled data is mainly cryptocurrency price, volume, marketcap, and related information.
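
As a rough illustration of these fields, the sketch below normalizes a JSON response into price, volume, and marketcap values. The payload shape, the key names (`symbol`, `price_usd`, `volume_24h`, `market_cap`), and the `CoinQuote` class are assumptions made for this example; they are not taken from any particular site or from the code in this repo.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CoinQuote:
    """One row of crawled market data (hypothetical structure)."""
    symbol: str
    price: float
    volume: float
    marketcap: float


def parse_quotes(payload: str) -> List[CoinQuote]:
    """Parse a JSON array of coin records into CoinQuote objects.

    The key names below are assumed; adjust them to the API actually used.
    """
    records = json.loads(payload)
    return [
        CoinQuote(
            symbol=rec["symbol"],
            price=float(rec["price_usd"]),
            volume=float(rec["volume_24h"]),
            marketcap=float(rec["market_cap"]),
        )
        for rec in records
    ]


# Made-up sample payload standing in for a real API response.
SAMPLE = (
    '[{"symbol": "BTC", "price_usd": "100.5", '
    '"volume_24h": "2000", "market_cap": "1000000"}]'
)

quotes = parse_quotes(SAMPLE)
print(quotes[0])
```

Converting the numeric strings to `float` at parse time keeps later sorting and aggregation simple, at the cost of raising an error if a field is missing or malformed.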

Completed parts include:

3. Other crawlers

Completed parts include:


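Notes

  • The crawling approach falls into two main categories

    For more complicated sites that rely heavily on JS and AJAX scripts (such as Baidu Index), or sites that offer a data download button: if the API link that serves the JSON data is visible in the background requests, call that API directly; otherwise, use selenium to simulate a browser.

    For pages that consist mostly of static data, use etree + xpath.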

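  • Sites are redesigned often, so the crawlers need periodic updates (Baidu Index, for example, has gone through three versions since 2017)

  • The crawlers were written at different times, so the code style is not consistent. It has not been cleaned up in bulk yet, which is something to improve on

  • So far I have only picked up crawling knowledge in stages, out of interest; I hope to read the relevant books systematically later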
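
For the static-page approach mentioned in the notes above (etree + xpath), a minimal sketch might look like the following. The README does not say which etree implementation the crawlers use, so this example uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup. Real pages are rarely well-formed, so `lxml.etree.HTML(...)` with `.xpath(...)` is the more usual choice. The sample markup, class names, and field names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for a static listing page.
SAMPLE_PAGE = """
<html><body>
  <ul id="thread_list">
    <li class="thread">
      <a class="title" href="/p/1">First post</a>
      <a class="author" href="/home/alice">alice</a>
      <span class="replies">12</span>
    </li>
    <li class="thread">
      <a class="title" href="/p/2">Second post</a>
      <a class="author" href="/home/bob">bob</a>
      <span class="replies">3</span>
    </li>
  </ul>
</body></html>
"""


def parse_threads(markup):
    """Extract title, author, author URL, and reply count from each row."""
    root = ET.fromstring(markup.strip())
    rows = []
    # ElementTree's XPath subset includes attribute predicates like [@class='x'].
    for item in root.findall(".//li[@class='thread']"):
        title = item.find("a[@class='title']")
        author = item.find("a[@class='author']")
        replies = item.find("span[@class='replies']")
        rows.append({
            "title": title.text,
            "author": author.text,
            "author_url": author.get("href"),
            "replies": int(replies.text),
        })
    return rows


for row in parse_threads(SAMPLE_PAGE):
    print(row)
```

Because the selectors are tied to the page layout, they are the part most likely to break when a site is redesigned, so keeping them in one function makes periodic updates easier.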

Contributors

liuzf13
