
crawler's Introduction

A simple web crawler written in Java

Goal

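Crawl every question on Zhihu's recommendations page, together with each question's link and its answers, and print them to the console. A later, more advanced installment will save the crawled results to a database. For the detailed tutorial, see my blog, codingXiaxw's blog, at http://codingxiaxw.cn. The crawler series starts here: http://codingxiaxw.cn/2016/10/20/20-%E7%94%A8Java%E5%86%99%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB%E4%B8%80%E4%B9%8B%E9%A1%B9%E7%9B%AE%E4%BB%8B%E7%BB%8D/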

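The blog covers the project in four articles. The first three are finished, so here is the crawl output you get after working through those three tutorials: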

Article list:

Screenshot of the output

Environment

So far only IntelliJ IDEA and Chrome are needed. Once the material from the fourth article is finished, a MySQL database will also be used.

Project overview

Only three .java files matter. SpaceMessage.java, Demo.java, Main.java, Main2.java and Main3.java are five demos I wrote while experimenting; you only need to read Spider.java, Main4.java and Zhihu.java.

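  • Spider.java: (1) sendGet() fetches the page's HTML source; (2) regexString() returns the parts of that source matched by a regular expression, that is, the content we want.
  • Zhihu.java: wraps the extracted content in an object.
  • Main4.java: passes in the URL of the page to crawl.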
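The division of labour above (fetch, match, wrap, drive) can be sketched roughly as follows. This is not the project's code: the class names `SpiderSketch`, `ZhihuItem` and `Main4Sketch`, the `extract` helper and the `question_link` HTML markup in the regexes are all hypothetical. Only the method names `sendGet()` and `regexString()` come from the description above, and the real Zhihu markup differs and changes over time. The sketch uses only the JDK (`HttpURLConnection`, `java.util.regex`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for Spider.java: fetch a page, then pull out regex matches.
class SpiderSketch {
    // Downloads the page at `url` and returns its HTML source (assumes UTF-8).
    static String sendGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        // Assumption: a browser-like User-Agent avoids some servers rejecting the request.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return sb.toString();
    }

    // Returns capture group 1 of every match of `regex` in `source`, in order.
    static List<String> regexString(String source, String regex) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(source);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }
}

// Hypothetical stand-in for Zhihu.java: one question and its link.
class ZhihuItem {
    final String question;
    final String zhihuUrl;

    ZhihuItem(String question, String zhihuUrl) {
        this.question = question;
        this.zhihuUrl = zhihuUrl;
    }

    @Override
    public String toString() {
        return "Question: " + question + "\nLink: " + zhihuUrl;
    }
}

// Hypothetical stand-in for Main4.java: takes the URL to crawl and prints the results.
class Main4Sketch {
    // Hypothetical markup; adjust both patterns to the page you actually crawl.
    static final String LINK_REGEX = "<a class=\"question_link\" href=\"(.+?)\">";
    static final String TITLE_REGEX = "<a class=\"question_link\" href=\".+?\">(.+?)</a>";

    // Both patterns match the same anchors, so the i-th link pairs with the i-th title.
    static List<ZhihuItem> extract(String html) {
        List<String> links = SpiderSketch.regexString(html, LINK_REGEX);
        List<String> titles = SpiderSketch.regexString(html, TITLE_REGEX);
        List<ZhihuItem> items = new ArrayList<>();
        for (int i = 0; i < Math.min(links.size(), titles.size()); i++) {
            items.add(new ZhihuItem(titles.get(i), links.get(i)));
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: Main4Sketch <url>");
            return;
        }
        for (ZhihuItem item : extract(SpiderSketch.sendGet(args[0]))) {
            System.out.println(item);
        }
    }
}
```

Keeping matching in a separate `regexString()` means the extraction logic can be checked against a saved HTML string without any network access; only `sendGet()` touches the network.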

Usage
