
autohome_spider's Introduction

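Autohome Spider

This spider collects user reviews of a given car model from Autohome (汽车之家) in bulk; the results can feed further natural language analysis.

About the Autohome Spider

I originally wrote this spider to find out how well regarded one particular car model was, and later tidied it up into a general-purpose spider. If you are interested in a different model, you only need to replace the vehicle code in the source. The sample code uses the BMW 5 Series and the Mercedes-Benz E-Class for testing and analysis.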
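As a rough sketch of what replacing the vehicle code might look like, the snippet below keeps the codes in one dictionary and builds page URLs from them. The codes, the URL template, and the names `SERIES_CODES`, `URL_TEMPLATE`, and `build_url` are placeholders invented for illustration; they are not taken from this repository or from the site.

```python
# Placeholder codes: substitute the real vehicle codes used by the site.
SERIES_CODES = {
    "bmw_5_series": "SERIES_CODE_A",
    "mercedes_e_class": "SERIES_CODE_B",
}

# Hypothetical URL template; the real page layout may differ.
URL_TEMPLATE = "https://example.invalid/{code}/page-{page}.html"


def build_url(model, page=1):
    """Return the URL of one listing page for the given model key."""
    if model not in SERIES_CODES:
        raise KeyError("unknown model: {}".format(model))
    return URL_TEMPLATE.format(code=SERIES_CODES[model], page=page)


print(build_url("bmw_5_series", page=2))
```

Keeping the codes in one place means that adding a new model is a one-line change instead of an edit to every URL string.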

Requirements

The spider is written in Python and has been tested on Python 3.5. It uses the following packages:

from selenium import webdriver
from bs4 import BeautifulSoup
import json
import time

In addition, the spider drives the browser through the Chrome WebDriver; you can switch to Firefox or PhantomJS as needed.
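One way to make the browser swappable is a small factory function, sketched below. The name `make_driver` and the `SUPPORTED_BROWSERS` tuple are assumptions for illustration, not part of this repository. PhantomJS is left out because recent Selenium releases no longer include a PhantomJS driver; with an older Selenium you could add it back.

```python
SUPPORTED_BROWSERS = ("chrome", "firefox")


def make_driver(browser="chrome"):
    """Return a Selenium WebDriver for the named browser (hypothetical helper)."""
    name = browser.lower()
    if name not in SUPPORTED_BROWSERS:
        raise ValueError("unsupported browser: {}".format(browser))

    # Imported lazily so the name check works even without Selenium installed.
    from selenium import webdriver

    if name == "chrome":
        return webdriver.Chrome()
    return webdriver.Firefox()
```

Calling `make_driver("firefox")` requires Firefox and its driver to be available on the machine; remember to call `driver.quit()` when the crawl is finished.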

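Data Collected

The spider can collect the following data: comments from the forum for the model, owner reviews (口碑) of the model, and the model's scores in each individual rating category.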
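The three kinds of data above could be stored in a uniform record format and written out with the `json` module the spider already imports. The sketch below is only an assumption about how that might look: the field names, the helpers `make_record` and `dump_records`, and the sample values are all made up for illustration.

```python
import json
import time


def make_record(kind, model, text=None, scores=None):
    """Bundle one scraped item into a JSON-serializable dict."""
    return {
        "kind": kind,  # e.g. "forum_comment", "review", or "scores"
        "model": model,
        "text": text,
        "scores": scores or {},
        "fetched_at": int(time.time()),
    }


def dump_records(records):
    # ensure_ascii=False keeps Chinese text readable instead of \u escapes.
    return json.dumps(records, ensure_ascii=False, indent=2)


# Made-up sample values, not real scraped data.
records = [
    make_record("review", "bmw_5_series", text="空间很大"),
    make_record("scores", "bmw_5_series", scores={"comfort": 4.5}),
]
print(dump_records(records))
```

Writing the text unescaped keeps the output easy to inspect by hand before passing it to a natural language analysis step.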

Other

Thanks to @qjing666 for their work and effort on this spider.

autohome_spider's People

Contributors

panda0881
