nwatechsummit-2015's Introduction

# Kudu Meetup Spark Streaming Example

For the moment, kudu-spark (from tmalaska/SparkOnKudu) is included in this repo because no published artifacts exist yet.

Follow the instructions to set up the Kudu sandbox, or run against your own cluster: http://getkudu.io/docs/quickstart.html

ssh into VM: ssh [email protected]

Install git: sudo yum install git

Clone repo: git clone https://github.com/silicon-valley-data-science/nwatechsummit-2015.git

Install sbt (see http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Linux.html). Commands from that page (may need refreshing):

curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install sbt
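
Before building, it can help to confirm that the tools installed so far are actually on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, not something shipped with this repo.

```shell
# Print the name of every listed tool that is not on PATH.
# Prints nothing when all of them are found.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the tool cannot be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools git sbt` should print nothing once both installs above have succeeded.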

Build:

cd ~/nwatechsummit-2015
sbt package

Download Spark 1.5.1 for Hadoop 2.6+ from http://spark.apache.org/downloads.html and expand it into your home directory. The direct link below comes from that page and may need refreshing:

cd ~
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
tar xvf spark-1.5.1-bin-hadoop2.6.tgz
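
Since the download link may go stale, a quick check that the archive unpacked into a usable Spark directory can save debugging later. This is a sketch only; `is_spark_home` is a hypothetical helper, and it assumes nothing beyond the standard `bin/spark-submit` launcher that Spark binary distributions ship.

```shell
# Succeeds only if the given directory looks like an unpacked Spark
# distribution, i.e. it contains an executable bin/spark-submit.
is_spark_home() {
    [ -x "$1/bin/spark-submit" ]
}
```

For example, `is_spark_home ~/spark-1.5.1-bin-hadoop2.6 || echo "Spark not unpacked"`.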

Create Kudu Tables:

cd ~/nwatechsummit-2015/bin
./CreateMeetupKuduTable.sh
./CreateMeetupLoadSummaryKuduTable.sh
./CreateMeetupPredictionKuduTable.sh
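
If you prefer to run the three table scripts in one go, a small wrapper that stops at the first failure avoids continuing with a half-created set of tables. A minimal sketch; `run_in_order` is a hypothetical helper and is not part of this repo.

```shell
# Run each argument as a command, in order. Stop at the first failure
# and return its exit status so that later steps are skipped.
run_in_order() {
    for step in "$@"; do
        echo "running: $step"
        "$step" || return $?
    done
}
```

From the bin directory this could be invoked as `run_in_order ./CreateMeetupKuduTable.sh ./CreateMeetupLoadSummaryKuduTable.sh ./CreateMeetupPredictionKuduTable.sh`.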

Set up and start Kafka locally (or use an existing install): http://kafka.apache.org/documentation.html#quickstart

Clone Meetup Stream Kafka Loader: git clone https://github.com/silicon-valley-data-science/strataca-2015.git

Install Maven on the sandbox: http://preilly.me/2013/05/10/how-to-install-maven-on-centos/

Build the Meetup Stream Kafka Loader:

cd ~/strataca-2015/Building-a-Data-Platform/tailer2kafka/
mvn install

Start Curl to File: nohup bin/run_curl_meetup_stream.sh &

Start Tail File to Kafka: nohup bin/run_tailer2kafka.sh &
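
Both commands above start background jobs under nohup, so a failure at startup is easy to miss. One way to check is to record each job's PID (the shell exposes it as `$!` right after the `&`) and probe it later. A sketch under that assumption; `is_running` is a hypothetical helper.

```shell
# Succeeds if a process with the given PID exists and can be signalled.
# Signal 0 performs the existence check without sending anything.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Example:
#   nohup bin/run_tailer2kafka.sh &
#   tailer_pid=$!
#   is_running "$tailer_pid" || echo "tailer2kafka is not running"
```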

Run Streaming Prediction:

cd ~/nwatechsummit-2015/bin
./RunKuduMeetupStreamingPrediction.sh

Run the raw data to Kudu job: ./RunKuduMeetup.sh

Go to http://quickstart.cloudera:8051, click the Tables tab, then click each meetup table (for example kudu_meetup_rsvps) to get the schema for the corresponding Impala table.

Run impala-shell: impala-shell

Paste in the create table statements from above, then query the Kudu meetup tables:

select * from kudu_meetup_rsvps limit 2;
select * from kudu_meetup_rsvps_predictions limit 2;
select * from kudu_meetup_rsvps_load_summary limit 2;
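
To see whether the streaming jobs are still writing, it may be handy to run row counts repeatedly from the shell instead of interactively. The sketch below only generates the queries; `count_queries` is a hypothetical helper, and the commented usage relies on impala-shell's `-q` option for running a single query.

```shell
# Print one row-count query per table name given as an argument.
count_queries() {
    for table in "$@"; do
        printf 'select count(*) from %s;\n' "$table"
    done
}

# Example (requires impala-shell on PATH):
#   count_queries kudu_meetup_rsvps kudu_meetup_rsvps_predictions |
#       while read -r q; do impala-shell -q "$q"; done
```

Counts that grow between runs suggest that data is arriving in the tables.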
