xiecheng

Crowdsourced Spatio-Temporal Information Aggregation Platform

## Basic Information

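A crawling strategy based on scrapy + selenium. Using Nanjing as an example, it extracts basic hotel information and hotel review data for hotels in Nanjing.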
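The scrapy + selenium combination can be sketched roughly as follows: selenium renders the page in a real browser, and scrapy's `Selector` parses the rendered HTML. This is a minimal sketch, not the project's actual spider. The function names, the `link_css` selector, and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urljoin


def absolute_hotel_links(base_url, hrefs):
    """Turn scraped hrefs into absolute URLs, dropping duplicates in order."""
    seen = set()
    links = []
    for href in hrefs:
        url = urljoin(base_url, href.strip())
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def fetch_rendered_html(url, wait_seconds=10):
    """Load a page in Chrome via selenium and return the rendered HTML."""
    from selenium import webdriver  # imported lazily; needs selenium + ChromeDriver

    driver = webdriver.Chrome()
    try:
        driver.implicitly_wait(wait_seconds)
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def extract_hotel_links(html, base_url, link_css="a::attr(href)"):
    """Parse rendered HTML with scrapy's Selector.

    link_css is a placeholder selector (assumption); the real page
    structure has to be inspected to pick the right one.
    """
    from scrapy.selector import Selector  # imported lazily; needs scrapy

    hrefs = Selector(text=html).css(link_css).getall()
    return absolute_hotel_links(base_url, hrefs)
```

A typical call chain would be `extract_hotel_links(fetch_rendered_html(list_url), list_url)`, where `list_url` is the hotel list page for the target city.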

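## Python Libraries Used

1. scrapy
Many installation guides are available online; download the required dependencies yourself.

2. selenium
Can be installed directly with pip.

Selenium is also a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it.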
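Before running the crawler, it can help to confirm that both libraries are importable in the current environment. The check below is a small sketch using only the standard library; the function name `missing_packages` is an assumption for illustration.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["scrapy", "selenium"])
    if missing:
        print("Missing packages, install them with pip:", ", ".join(missing))
    else:
        print("scrapy and selenium are both available.")
```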

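## Driver Used

1. Chrome driver (ChromeDriver)
Download: http://npm.taobao.org/mirrors/chromedriver

selenium needs this driver to control Chrome. Download the version that matches your system and place it in a folder the system can access directly, for example {PYTHON_HOME}/Scripts.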
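Whether the driver sits in a directly accessible folder can be checked from Python. This sketch assumes the executable is named `chromedriver` (on Windows, `shutil.which` also tries the usual executable extensions); the helper name is an assumption.

```python
import shutil


def find_chromedriver(executable="chromedriver", path=None):
    """Return the full path of the driver if it is on the search path, else None."""
    return shutil.which(executable, path=path)


if __name__ == "__main__":
    location = find_chromedriver()
    if location is None:
        print("chromedriver not found on PATH; copy it to e.g. {PYTHON_HOME}/Scripts")
    else:
        print("chromedriver found at", location)
```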

## Database

### Database name: xiecheng

1. hotellianjie (stores hotel URLs)

DROP TABLE IF EXISTS `hotellianjie`;
CREATE TABLE `hotellianjie` (
  `guid` varchar(255) DEFAULT NULL,
  `lianjie` varchar(255) DEFAULT NULL,
  `city` varchar(30) DEFAULT NULL,
  `comm_num` int(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
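A row for this table could be written with a parameterized statement along these lines. This is a sketch, not the project's pipeline code: it assumes a DB-API cursor that uses the `%s` placeholder style (as pymysql and MySQLdb do), and it assumes the `guid` is generated with `uuid4`, which the source does not specify.

```python
import uuid

INSERT_LINK_SQL = (
    "INSERT INTO hotellianjie (guid, lianjie, city, comm_num) "
    "VALUES (%s, %s, %s, %s)"
)


def save_hotel_link(cursor, lianjie, city, comm_num):
    """Insert one hotel URL and return the generated guid.

    cursor: any DB-API cursor with %s placeholders (assumption).
    """
    guid = str(uuid.uuid4())  # assumed guid scheme, not confirmed by the source
    cursor.execute(INSERT_LINK_SQL, (guid, lianjie, city, int(comm_num)))
    return guid
```

Passing values as parameters instead of formatting them into the SQL string leaves quoting and escaping to the driver.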

2. hotelinfo (basic hotel information)

DROP TABLE IF EXISTS `hotelinfo`;
CREATE TABLE `hotelinfo` (
  `guid` varchar(255) DEFAULT NULL,
  `city` varchar(30) DEFAULT NULL,
  `title` varchar(60) DEFAULT NULL,
  `price` decimal(10,1) DEFAULT NULL,
  `score` int(20) DEFAULT NULL,
  `recommend` varchar(120) DEFAULT NULL,
  `area` varchar(120) DEFAULT NULL,
  `havawifi` varchar(20) DEFAULT NULL,
  `discussNum` int(11) DEFAULT NULL,
  `common_facilities` varchar(500) DEFAULT NULL,
  `activity_facilities` varchar(255) DEFAULT NULL,
  `service_facilities` varchar(255) DEFAULT NULL,
  `room_facilities` varchar(255) DEFAULT NULL,
  `around_facilities` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
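Scraped values usually arrive as text, while `price` is `decimal(10,1)` and `discussNum` is an integer. The helpers below sketch one way to normalize such strings before inserting them. The raw formats they handle (currency symbols, thousands separators, trailing words) are assumptions about what the pages might contain.

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def parse_price(text):
    """Extract the first number from text as a Decimal with one decimal place."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        return None
    return Decimal(match.group()).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def parse_count(text):
    """Extract the first integer from text, e.g. a review count."""
    match = re.search(r"\d+", text.replace(",", ""))
    return int(match.group()) if match else None
```

Returning `None` for unparseable input maps naturally onto the columns' `DEFAULT NULL`.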

3. hotelcommentinfo (stores hotel review data)

DROP TABLE IF EXISTS `hotelcommentinfo`;
CREATE TABLE `hotelcommentinfo` (
  `hotelname` varchar(50) DEFAULT NULL,
  `username` varchar(40) DEFAULT NULL,
  `commentscore` varchar(40) DEFAULT NULL,
  `intime` varchar(40) DEFAULT NULL,
  `tourstyle` varchar(40) DEFAULT NULL,
  `praisenum` int(11) DEFAULT NULL,
  `commenttime` varchar(60) DEFAULT NULL,
  `comment` varchar(1000) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
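Two properties of this table affect how review text should be prepared: `comment` holds at most 1000 characters, and MySQL's `utf8` character set stores at most three bytes per character, so four-byte characters such as emoji cannot be stored in it. The helper below is a sketch of one possible cleanup step; whether the original project does anything like this is not stated in the source.

```python
def fit_comment(text, max_chars=1000):
    """Drop characters outside the Basic Multilingual Plane, then truncate.

    Characters above U+FFFF need four bytes in UTF-8, which MySQL's
    three-byte utf8 charset cannot store.
    """
    bmp_only = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return bmp_only[:max_chars]
```

An alternative is to declare the table with `utf8mb4` instead, which keeps such characters rather than dropping them.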
