spark.hbase by objektwerks

Spark HBase Database

Prototypes a Spark-HBase-Database app without Spark read-write connectors. Specifically, this app uses no Spark connector to read from or write to HBase or a JDBC store. And while Spark 2.4 ships with JDBC read-write connector support, that connector does not support updates! Hence the challenge: building a truly clusterable Spark app without Spark read-write connectors.
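To make "updates are not supported" concrete, here is a small in-memory model in plain Scala (no Spark) contrasting the two whole-table write behaviors a connector-style writer offers, appending rows or overwriting the table, with the keyed update this app needs. The `Row` type and the `WriteModes` functions are hypothetical illustrations, not Spark or JDBC APIs.

```scala
// Hypothetical model of a key-value table as an ordered list of rows.
final case class Row(key: String, value: Int)

object WriteModes {
  // Append-style write: new rows are added even if their key already exists.
  def append(table: List[Row], rows: List[Row]): List[Row] = table ++ rows

  // Overwrite-style write: the existing table is replaced wholesale.
  def overwrite(table: List[Row], rows: List[Row]): List[Row] = rows

  // What the app needs instead: change the value of an existing key in place.
  def update(table: List[Row], row: Row): List[Row] =
    table.map(r => if (r.key == row.key) row else r)
}
```

With a table of `Row("a", 1)` and `Row("b", 2)`, appending `Row("a", 10)` duplicates key `a`, and overwriting with it drops key `b`; only the keyed update produces the intended table, which is why the app issues its own updates against H2.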

Serialization

Spark task serialization issues are a challenge, to put it mildly. Earlier versions of this app relied too heavily on task closures accessing external HBase and H2 proxies. The current implementation first creates temporary, pre-Spark-session HBase and H2 proxies. Only after all pre-Spark-session work has completed is a Spark session created. A Dataset is then built from a sequence of pre-scanned HBase row keys. HBase and H2 proxies are then created within the Spark task closure, so that all HBase and H2 code executes on Spark worker nodes. Only pre-Spark-session code should execute on the driver.
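The failure mode described above can be reproduced without Spark, since Spark ships task closures using Java serialization by default. The minimal sketch below assumes a hypothetical `StoreProxy` class standing in for an HBase or H2 proxy; like a real client holding live connections, it is not `Serializable`. A closure that captures a driver-side proxy cannot be serialized, while a closure that builds its proxy internally captures nothing and serializes cleanly.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for an HBase or H2 proxy. Real clients hold sockets
// and connection pools, so, like this class, they are not Serializable.
final class StoreProxy(val name: String) {
  def get(key: String): String = s"$name:$key"
}

object ClosureSerialization extends Serializable {
  // Roughly what Spark does before shipping a task: Java-serialize the closure.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Earlier approach: the closure captures a proxy built on the driver.
  def capturingClosure(proxy: StoreProxy): Iterator[String] => List[String] =
    keys => keys.map(k => proxy.get(k)).toList

  // Current approach: the proxy is built inside the closure (on the worker),
  // so only the closure itself, which captures nothing, is serialized.
  val selfContainedClosure: Iterator[String] => List[String] =
    keys => {
      val proxy = new StoreProxy("hbase")
      keys.map(k => proxy.get(k)).toList
    }
}
```

Serializing `capturingClosure(new StoreProxy("h2"))` fails with `NotSerializableException`, the same root cause behind Spark's "Task not serializable" error, whereas `selfContainedClosure` serializes and still returns `List("hbase:a", "hbase:b")` for keys `a` and `b`.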

Pre Spark Session

Create HBase and H2 proxies.
Create HBase key-value table.
Put key-value pairs into HBase key-value table.
Scan HBase key-value table for all row keys.
Create H2 key-value table.
HBase and H2 proxies are destroyed by GC.
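The pre-Spark-session steps can be modeled with an in-memory stand-in. In the sketch below, `FakeHBaseTable` and `PreSparkSession.prepare` are hypothetical names, a `TreeMap` imitates HBase's sorted row keys, and the H2 table creation step is omitted. The point it illustrates is that the driver hands only the scanned row keys to the Spark stage.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical stand-in for the HBase key-value table. A TreeMap keeps row keys
// sorted, mirroring HBase's lexicographic row-key order (for ASCII keys,
// String order matches byte order).
final class FakeHBaseTable {
  private var rows = TreeMap.empty[String, String]
  def put(rowKey: String, json: String): Unit = rows += (rowKey -> json)
  def get(rowKey: String): Option[String] = rows.get(rowKey)
  def scanRowKeys(): List[String] = rows.keys.toList
}

object PreSparkSession {
  // Runs on the driver before any SparkSession exists and returns only the
  // row keys, which is all the Spark stage needs to build its Dataset.
  def prepare(keyValues: Map[String, String]): List[String] = {
    val hbase = new FakeHBaseTable                        // create proxy and table
    keyValues.foreach { case (k, v) => hbase.put(k, v) }  // put key-value pairs
    hbase.scanRowKeys()                                   // scan for all row keys
  } // `hbase` goes out of scope here and becomes eligible for GC
}
```

Calling `prepare` with row keys `"2"`, `"10"` and `"1"` returns `List("1", "10", "2")`: HBase orders row keys by bytes, not numerically, so zero-padded keys are a common choice when numeric order matters.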

Spark Session

Create HBase and H2 proxies.
Create Spark session.
Create Dataset from sequence of HBase row keys.
For each row key, get JSON value via HBase client.
Convert JSON value to Scala object.
Insert Scala object into H2 key-value table.
Update Scala object in H2 key-value table.
HBase and H2 proxies are destroyed by GC.
Spark session is closed.
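The per-key work inside the Spark task can be sketched as follows. This is a hypothetical model, not the app's code: `KeyValue` is an assumed domain object, a regex decoder for JSON shaped like `{"key":"k1","value":1}` stands in for whatever JSON library the app uses, `fetch` stands in for an HBase Get, a local mutable map stands in for the H2 key-value table, and the update (incrementing the value) is invented because the README does not say what is updated.

```scala
import scala.collection.mutable

// Hypothetical domain object; the README does not show the app's real type.
final case class KeyValue(key: String, value: Int)

object SparkStage {
  // Simplified decoder for JSON shaped like {"key":"k1","value":1}.
  private val KeyValueJson = """\{"key":"([^"]*)","value":(-?\d+)\}""".r

  def fromJson(json: String): Option[KeyValue] = json match {
    case KeyValueJson(k, v) => Some(KeyValue(k, v.toInt))
    case _                  => None
  }

  // Per-partition body. In the app, the HBase and H2 proxies would be opened
  // here, inside the task closure; in this sketch `fetch` stands in for an
  // HBase Get and a local mutable map stands in for the H2 table.
  def processPartition(rowKeys: Iterator[String], fetch: String => Option[String]): Map[String, Int] = {
    val h2 = mutable.Map.empty[String, Int]
    rowKeys.foreach { rowKey =>
      for {
        json <- fetch(rowKey)   // get JSON value via "HBase"
        kv   <- fromJson(json)  // convert JSON value to a Scala object
      } {
        h2.put(kv.key, kv.value)          // insert
        h2.update(kv.key, kv.value + 1)   // update (hypothetical change)
      }
    }
    h2.toMap
  }
}
```

Row keys whose value is missing or is not valid JSON are skipped here; how the app handles such rows is not stated in the README.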

Install

Normally I would use Homebrew to install, start, and stop HBase. In this case, however, I strongly recommend following the HBase quickstart guide: http://hbase.apache.org/book.html#quickstart

If useful, consider adding $HBASE_HOME/bin to the PATH exported by your shell profile, e.g. export PATH=$HBASE_HOME/bin:$PATH

HBase

hbase/bin$ ./hbase shell

Run

hbase/bin$ ./start-hbase.sh
sbt clean compile run
hbase/bin$ ./stop-hbase.sh

Web

HBase: http://localhost:16010/master-status
Spark: http://localhost:4040

Stop

Control-C

Output

./target/app.log
