Spark HBase Database

Prototypes a Spark-HBase-Database app without Spark read-write connectors. Specifically, this app uses no Spark connectors to read from or write to HBase or a JDBC store. Although JDBC read-write connector support ships with Spark 2.4, updates are not supported. Hence the challenge: building a truly clusterable Spark app without Spark read-write connectors.

Serialization

Spark task serialization issues are a challenge, to put it mildly. Earlier versions of this app relied too heavily on a task closure accessing external HBase and H2 proxies. The current implementation temporarily creates an HBase proxy and an H2 proxy before any Spark session exists. Only after all pre-Spark session work has completed is a Spark session created. A Dataset is then created from a sequence of pre-scanned HBase row keys. Next, an HBase proxy and an H2 proxy are created within the Spark task closure, with the intention that all HBase and H2 code executes on a Spark worker node. Only pre-Spark session code should execute on the driver.
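The difference between a closure that captures an external proxy and one that creates its own can be sketched without Spark. This is a minimal illustration, not the app's code: StoreProxy, its URL and the helper names are hypothetical, and the check simply Java-serializes the closure, which is roughly what Spark does before shipping a task to a worker.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an HBase or H2 proxy. Like most client handles,
// it is deliberately NOT Serializable.
final class StoreProxy(val url: String) {
  def get(key: String): String = s"$url/$key"
}

object ClosureSerialization {
  // Approximates Spark's pre-flight check: try to Java-serialize the closure.
  def isSerializable(closure: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(closure) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Problem pattern: the proxy is created outside the closure and captured by it,
  // so serializing the closure also tries to serialize the proxy.
  def capturingClosure(): String => String = {
    val proxy = new StoreProxy("hbase://local")
    (key: String) => proxy.get(key)
  }

  // Pattern described above: only a plain String is captured, and the proxy
  // is created inside the closure, where the task actually runs.
  def selfContainedClosure(): String => String = {
    val url = "hbase://local"
    (key: String) => new StoreProxy(url).get(key)
  }

  def main(args: Array[String]): Unit = {
    println(s"captures external proxy: ${isSerializable(capturingClosure())}")
    println(s"creates proxy inside:    ${isSerializable(selfContainedClosure())}")
  }
}
```

Creating the proxy inside the closure trades a little setup cost per task for a closure that carries nothing but simple values.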

Pre-Spark Session

  1. Create HBase and H2 proxies.
  2. Create HBase key-value table.
  3. Put key-value pairs into HBase key-value table.
  4. Scan HBase key-value table for all row keys.
  5. Create H2 key-value table.
  6. HBase and H2 proxies are destroyed by GC.

Spark Session

  1. Create HBase and H2 proxies.
  2. Create Spark session.
  3. Create Dataset from sequence of HBase row keys.
  4. For each row key, get the JSON value via the HBase client.
  5. Convert the JSON value to a Scala object.
  6. Insert Scala object into key-value H2 table.
  7. Update Scala object in H2 key-value table.
  8. HBase and H2 proxies are destroyed by GC.
  9. Spark session is closed.
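The Spark session steps above might look roughly like the following sketch. It assumes spark-sql is on the classpath. Everything else is a hypothetical stand-in rather than the app's code: HBaseProxy and H2Proxy are in-memory stubs, KeyValue is an invented model, and a regex replaces a real JSON library. Following the Serialization section, the proxies are created inside the task closure, once per partition.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical model; the real Scala object is not shown in this README.
final case class KeyValue(key: String, value: Int)

// Hypothetical in-memory stand-ins for the real HBase and H2 proxies.
final class HBaseProxy {
  def getJson(rowKey: String): String = s"""{"key":"$rowKey","value":1}"""
}

final class H2Proxy {
  private val rows = scala.collection.mutable.Map.empty[String, KeyValue]
  def insert(kv: KeyValue): Unit = rows.update(kv.key, kv)
  def update(kv: KeyValue): Unit = rows.update(kv.key, kv)
}

object SparkSessionSketch {
  // Toy parser for the stub's fixed JSON shape; a real app would use a JSON library.
  private val Pattern = """\{"key":"([^"]*)","value":(\d+)\}""".r

  def parse(json: String): Option[KeyValue] = json match {
    case Pattern(k, v) => Some(KeyValue(k, v.toInt))
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    // Assumed to come from the pre-Spark scan of the HBase key-value table.
    val rowKeys = Seq("1", "2", "3")

    val spark = SparkSession.builder().master("local[*]").appName("spark-hbase-sketch").getOrCreate()
    import spark.implicits._

    try {
      val dataset = rowKeys.toDS()
      // RDD.foreachPartition runs the closure on the workers, one call per partition.
      dataset.rdd.foreachPartition { keys =>
        val hbase = new HBaseProxy // created inside the closure, never serialized
        val h2 = new H2Proxy
        keys.foreach { key =>
          parse(hbase.getJson(key)).foreach { kv =>
            h2.insert(kv)
            h2.update(kv.copy(value = kv.value + 1))
          }
        }
      }
    } finally spark.stop()
  }
}
```

Working per partition rather than per row keeps the number of proxy instances proportional to the number of partitions, which matters once the proxies hold real connections.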

Install

Normally I would use Homebrew to install, start and stop HBase. But, in this case, I strongly recommend following this guide: http://hbase.apache.org/book.html#quickstart

If useful, consider adding $HBASE_HOME/bin to your exported $PATH, for example: export PATH="$PATH:$HBASE_HOME/bin"

HBase

  1. hbase/bin$ ./hbase shell

Run

  1. hbase/bin$ ./start-hbase.sh
  2. sbt clean compile run
  3. hbase/bin$ ./stop-hbase.sh

Web

  1. HBase: http://localhost:16010/master-status
  2. Spark: http://localhost:4040

Stop

  1. Control-C

Output

  1. ./target/app.log
