Git Product home page Git Product logo

hive-jdbc-storage-handler's Introduction

#Hive Storage Handler for JDBC#

The Hive Storage Handler For JDBC by Qubole, which is a fork of HiveJdbcStorageHandler, helps users read from and write to JDBC databases using Hive, and also enabling them to run SQL queries to analyze data that resides in JDBC tables. Optimizations such as FilterPushDown have also been added.

##Building from Source##

  • Download the code from Github:
  $ git clone https://github.com/qubole/Hive-JDBC-storage-Handler.git
  $ cd Hive-JDBC-storage-Handler
  • Build using Maven (add -DskipTests to build without running tests):
  $ mvn clean install -Phadoop-1
  • The JARs for the storage handler can be found in the target/ folder. Use qubole-hive-JDBC-0.0.4.jar in the hive session (see below).

##Usage##

  • Add the JAR to the Hive session. <path-to-jar> is the path to the above mentioned JAR. For using this with Qubole hive, upload the JAR to an S3 bucket and provide its path.
  ADD JAR <path-to-jar>;
  • Each record in the JDBC corresponds to a row in the Hive table.

  • While creating the Hive table, use

  STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'

##Table Creation## Table can be created in 2 ways:

  • First is, Column mappings can be explicitly given along with the table creation statement.
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT,
  id_double DOUBLE,
  names STRING,
  test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);

  • Second is, no table mappings are specified, SerDe class automatically generates those mappings.
CREATE EXTERNAL TABLE HiveTable
row format serde 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name" = "JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);
NOTE: "mapred.jdbc.hive.lazy.split"= "true" property enables 
       split computation to be done by mappers internally.

##Sample Queries##

HIVE-JDBC Storage Handeler supports alomost all types of possible SQL queries. Some examples of supported queries are:

####Queries to Read from DB ####

* select * from HiveTable;
* Select count(*) from HiveTable;
* select id from HiveTable where id > 50000;
* select names from HiveTable;
* select * from HiveTable where names like ‘D%’;
* SELECT * FROM HiveTable ORDER BY id DESC;

####Group By Queries ####

* select id, sum(id_double) as sum_double from HiveTable group by id;

Join Queries

* select HiveTable_1.*, HiveTable_2.* from HiveTable_1 a join HiveTable_2 b 
  on (a.id = b.id) where a.id > 90000 and b.id > 97000 ;

Queries to insert data into DB

* Insert Into Table HiveTable_1 select * from HiveTable_2;
*Insert Into Table HiveTable_1 select * from HiveTable_2 
 where id > 50000 and upper(names) = 'ANN';

Support For FilterPushDown

 set hive.optimize.ppd = false;

##Contributions##

hive-jdbc-storage-handler's People

Contributors

divyanshu25 avatar shubhamtagra avatar myui avatar hava101 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.