
crawl

A simple, easy-to-use, small but robust crawler.

QuickStart

In just three steps, you can deploy a crawler that collects all the news from the gocn website.

  • Step 1: generate your own token on GitHub via Settings -> Developer settings -> Personal access tokens -> Generate new token.

    Then set it as an environment variable with export GITHUB_TOKEN=(the token generated in step 1), or change the global Token in the code to your own token:

    var Token = GetValueFromEnv("GITHUB_TOKEN")
    
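  • Step 2: install redis locally and start it on the default port 6379 before launching the crawler, because the program uses redis for deduplication by default. For installation instructions, see the redis installation guide.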

  • Step 3: clone the repository and run the crawler as a background process. Every 6 hours it crawls the day's news and pushes it to GitHub.

    git clone https://github.com/lubanproj/crawl.git
    cd crawl
    go build -v 
    ./crawl &
    

Features

  • Scheduled daily crawling
  • Paginated crawling
  • Data deduplication
  • Pushing results to GitHub
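Paginated crawling and deduplication work together: each page's links are checked against a set of already-seen items, and only new ones are kept. The sketch below is hypothetical. The README says the real crawler stores this set in redis; `MemoryDeduper` is an in-process stand-in following the same "add to set, report whether it was new" pattern as redis `SADD`. The `fetch` callback and the early-stop rule (stop at the first page that yields nothing new) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper reports whether a key (e.g. a news URL) has not been seen before.
type Deduper interface {
	AddIfNew(key string) bool
}

// MemoryDeduper is a hypothetical in-memory stand-in for the redis set.
type MemoryDeduper struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func NewMemoryDeduper() *MemoryDeduper {
	return &MemoryDeduper{seen: make(map[string]struct{})}
}

// AddIfNew records key and returns true only the first time it is seen.
func (d *MemoryDeduper) AddIfNew(key string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[key]; ok {
		return false
	}
	d.seen[key] = struct{}{}
	return true
}

// crawlPages walks pages 1..maxPages through the hypothetical fetch callback
// and collects unseen links in order. It stops at the first page that adds
// nothing new (assumed: later pages only hold older, already-stored news).
func crawlPages(fetch func(page int) []string, d Deduper, maxPages int) []string {
	var fresh []string
	for page := 1; page <= maxPages; page++ {
		added := 0
		for _, link := range fetch(page) {
			if d.AddIfNew(link) {
				fresh = append(fresh, link)
				added++
			}
		}
		if added == 0 {
			break // empty page, or every link was already stored
		}
	}
	return fresh
}

func main() {
	// Fake paginated listing standing in for the real site.
	pages := map[int][]string{
		1: {"https://gocn.vip/topics/10135", "https://gocn.vip/topics/10121"},
		2: {"https://gocn.vip/topics/10119"},
	}
	fetch := func(p int) []string { return pages[p] }
	d := NewMemoryDeduper()
	fmt.Println(crawlPages(fetch, d, 10)) // first run collects all three links
	fmt.Println(crawlPages(fetch, d, 10)) // second run finds nothing new on page 1
}
```

Hiding the store behind a small interface lets a redis-backed implementation replace the in-memory one without touching the paging loop.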

Sample output

The output looks like this:

GoCN Daily News (2020-03-29)

  1. A guide to Go compilers https://www.caffeinatedwonders.com/2019/12/26/tour-of-go-compilers/
  2. Starting from gRPC's retry policy https://gocn.vip/topics/10135
  3. The complete set of LeetCode problems implemented in Go https://github.com/austingebauer/go-leetcode
  4. Distributed systems: advancing through ACID, CAP, and BASE theory https://gocn.vip/topics/10121
  5. The latest features in dubbogo 1.4 https://gocn.vip/topics/10119

GoCN Daily News (2020-03-26)

  1. Converting a struct to a map https://www.liwenzhou.com/posts/Go/struct2map/
  2. Go library of the day: sjson https://segmentfault.com/a/1190000022148617
  3. Understanding Go interfaces through object-oriented design principles https://mp.weixin.qq.com/s/MqQ6b-Z_wvYe9YpNI5LDeA
  4. Simple Travis CI integration for a Go project https://juejin.im/post/5e7592c0518825494a3fadd9
  5. Microservice design patterns https://mp.weixin.qq.com/s/mHHPaYEvon4zFHHDNP8A9A
  6. [Shenzhen] Tencent PCG Technical Operations Department is hiring Go backend developers https://gocn.vip/topics/10108

GoCN Daily News (2020-03-23)

  1. Benchmarking an exact algorithm for the traveling salesman problem with Go https://medium.com/@damien.leroux.pro/benchmark-an-exact-algorithm-solving-the-traveling-salesman-problem-with-go-e502b0ca3d0e
  2. Some advice on collecting, standardizing, and centralizing Golang logs https://segmentfault.com/a/1190000022106356
  3. Three ways to implement timeout exits in Golang https://juejin.im/post/5e774a73e51d4526c70fd0a4
  4. HeapReleased rises in a Go process but RSS does not drop: a memory leak? https://pengrl.com/p/20033
  5. Sharing a WeChat Golang SDK https://gocn.vip/topics/10094

For details, see go_read.
