
jdspider's Introduction

JDSpider

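A crawler for JD.com (京东)

Given a search keyword, it automatically crawls information about the matching products. It can also crawl the comments for products you specify.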

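Usage

1. Create a JD_spider instance.
2. Call the crawl method you need directly. There are three methods:

  • crawl: fetches product information;
  • crawl_comment: fetches comments;
  • get_urls: returns the list of URLs from the crawled product information.
    The signatures are as follows: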
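    def crawl(self, serach_name):
        '''
        Crawl the product information list.
        :param serach_name: keyword for the category to crawl, e.g. '苹果手机' (Apple phones); type str
        '''
    def crawl_comment(self, urls, pages=None):
        '''
        Crawl product comments.
        :param urls: URL of a product detail page, e.g. 'https://item.jd.com/11856959514.html';
                    may be one URL or several, passed as a str or a list
        :param pages: number of pages to crawl; the default None crawls all pages
        '''
    def get_urls(self, number=None):
        '''
        Get the list of URLs from the crawled product information.
        :param number: number of URLs to return; the default None returns all of them
        :return: list of product URLs (can be passed to crawl_comment to fetch the matching comments)
        '''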
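
The three methods are meant to be chained: crawl collects products, get_urls extracts their URLs, and crawl_comment consumes those URLs. The sketch below illustrates that call order only. Because the import path of the real class is not given here, it uses a stand-in JD_spider with the same method signatures; the method bodies, the `requested` attribute, and the item IDs 1001 to 1003 are assumptions made for illustration, not the project's implementation.

```python
class JD_spider:
    """Stand-in mirroring the three documented method signatures.

    The bodies are placeholders (assumptions); the real class performs
    network requests and writes its results to files.
    """

    def __init__(self):
        self._urls = []      # URLs collected by crawl()
        self.requested = []  # hypothetical log of (url, pages) pairs

    def crawl(self, serach_name):
        # Placeholder: pretend three products matched the keyword.
        # The item IDs are made up for illustration.
        self._urls = [
            f"https://item.jd.com/{item_id}.html"
            for item_id in (1001, 1002, 1003)
        ]

    def get_urls(self, number=None):
        # None means "return every collected URL".
        if number is None:
            return list(self._urls)
        return self._urls[:number]

    def crawl_comment(self, urls, pages=None):
        # Accept either a single URL string or a list of URLs.
        if isinstance(urls, str):
            urls = [urls]
        for url in urls:
            self.requested.append((url, pages))


spider = JD_spider()
spider.crawl('苹果手机')                  # 1. crawl products for a keyword
top_two = spider.get_urls(number=2)      # 2. take the first two product URLs
spider.crawl_comment(top_two, pages=5)   # 3. crawl 5 comment pages for each
# A single URL string is also accepted; pages=None means all pages.
spider.crawl_comment('https://item.jd.com/11856959514.html')
```

With the real class, the same four calls at the bottom should apply unchanged, since they rely only on the documented signatures.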

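The crawled data is stored as follows:

1. Product information: written directly to a txt file.
2. Comments: stored in the comments folder, grouped by name.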

jdspider's People

Contributors

hahaha108
