
couchbase-spark-mba's Introduction

Spark - Couchbase Market Basket Analysis

This repository is companion code for a blog post on our site: http://blogs.avalonconsult.com/blog/big-data/combining-operational-and-analytical-big-data-using-couchbase-and-spark-a-market-basket-analysis-example/.

This project is an all-in-one environment that sets up a Vagrant machine with Couchbase and Spark installed. It includes a Spark process that generates basic market basket analytics in the form of product recommendations.

Sample data taken from:

Brijs T., Swinnen G., Vanhoof K., and Wets G. (1999), The use of association rules for product
assortment decisions: a case study, in: Proceedings of the Fifth International Conference on
Knowledge Discovery and Data Mining, San Diego (USA), August 15-18, pp. 254-260. ISBN:
1-58113-143-7.

The dataset can be found under the 'retail' link here: http://fimi.ua.ac.be/data/
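To make "product recommendations" concrete, here is a minimal Python sketch of pairwise market basket analysis. It is not the Spark job in this repo. It assumes the retail file stores one basket per line as space-separated item IDs, and the function names, the confidence measure, and the 0.5 threshold are all illustrative choices.

```python
from collections import Counter
from itertools import combinations


def parse_baskets(lines):
    """Turn lines of space-separated item IDs into a list of item sets."""
    return [set(line.split()) for line in lines if line.strip()]


def recommend(baskets, min_confidence=0.5):
    """Return {item: [(other_item, confidence), ...]} from pair co-occurrence."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # sorted() gives every unordered pair a single canonical key
        pair_counts.update(combinations(sorted(basket), 2))

    recs = {}
    for (a, b), together in pair_counts.items():
        # confidence(src -> dst) = baskets with both / baskets with src
        for src, dst in ((a, b), (b, a)):
            confidence = together / item_counts[src]
            if confidence >= min_confidence:
                recs.setdefault(src, []).append((dst, confidence))

    # Strongest recommendation first; ties broken by item ID
    for src in recs:
        recs[src].sort(key=lambda pair: (-pair[1], pair[0]))
    return recs


# Tiny made-up sample, not taken from the retail dataset
sample = ["1 2 3", "1 2", "2 3", "1 3 4"]
recommendations = recommend(parse_baskets(sample))
```

In this toy sample, item 4 appears only once, alongside items 1 and 3, so both are recommended for it with confidence 1.0. Item 4 is not recommended for item 1, because only one of the three baskets containing item 1 also contains item 4.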

Prerequisites

  1. Install Virtualbox: https://www.virtualbox.org/wiki/Downloads

  2. Install Vagrant: http://www.vagrantup.com/downloads.html

  3. Install necessary Vagrant plugins:

vagrant plugin install vagrant-hostmanager
vagrant plugin install vagrant-cachier

  4. Install Ansible (the command below uses Homebrew on macOS):

brew install ansible
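Before continuing, it can help to confirm that the prerequisites are on your PATH. The following is a small optional sketch, not part of this repo. The executable names `VBoxManage`, `vagrant`, and `ansible` are assumptions about how these tools are installed on your system, and Vagrant plugins are not checked.

```python
import shutil

# Assumed executable names; adjust if your installation differs
REQUIRED_TOOLS = ["VBoxManage", "vagrant", "ansible"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out for testing; in normal use the default `shutil.which` is enough.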

Getting Started

Start by bringing up the Vagrant machine; it is configured to install everything you need to run the analysis.

cd vagrant
vagrant up

Load the sample data by SSHing into the machine and running the tocb.py script.

cd vagrant
vagrant ssh
cd /vagrant
python tocb.py
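The contents of `tocb.py` are not reproduced here. Purely as an illustration of the kind of transformation a loader like it might perform, the sketch below turns basket lines into JSON documents ready to be written to a bucket. The key scheme (`transaction::<line number>`) and the field names are hypothetical, and the actual Couchbase write is left out.

```python
import json


def line_to_document(line_number, line):
    """Build a (key, JSON string) pair for one basket line."""
    items = [int(token) for token in line.split()]
    key = f"transaction::{line_number}"  # hypothetical key scheme
    body = {"type": "transaction", "items": items}  # hypothetical field names
    return key, json.dumps(body, sort_keys=True)


def lines_to_documents(lines):
    """Convert non-empty lines into documents, numbering lines from 1."""
    documents = []
    for line_number, line in enumerate(lines, start=1):
        if line.strip():  # skip blank lines but keep the original numbering
            documents.append(line_to_document(line_number, line))
    return documents


# Made-up input lines in the one-basket-per-line format
documents = lines_to_documents(["39 73 74", "", "32 41"])
```

Each resulting pair could then be passed to whatever upsert call your Couchbase client provides.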

At the top level of the repo, go into the Spark project, build it, then copy the jar file into the Vagrant shared folder:

cd mba
mvn clean package
cp target/mba-1.0-SNAPSHOT.jar ../vagrant/

Once the jar is in place, you can SSH into the Vagrant machine and run the process:

cd vagrant
vagrant ssh
/opt/spark-1.4.1-bin-hadoop2.6/bin/spark-submit --class com.avalonconsult.mba.MBA /vagrant/mba-1.0-SNAPSHOT.jar

You can access the Couchbase UI at http://retail.vagrant:8091 with the credentials couchbase / couchbase (username / password).
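If you would rather check from a script that Couchbase is reachable, the sketch below sends an authenticated request to the `/pools` endpoint of the Couchbase REST API. It assumes the host name `retail.vagrant` resolves on your machine (the vagrant-hostmanager plugin is expected to handle this) and that the credentials above are a username and password; the function names are illustrative.

```python
import base64
import urllib.request


def build_request(host="retail.vagrant", port=8091,
                  user="couchbase", password="couchbase"):
    """Create a GET request for /pools with HTTP basic auth."""
    url = f"http://{host}:{port}/pools"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def couchbase_is_up(request, timeout=5):
    """Return True if the server answers the request with HTTP 200."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False
```

Calling `couchbase_is_up(build_request())` after `vagrant up` has finished should return `True` once the server is running, and `False` if the machine is down or the host name does not resolve.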

Contributors

kruthar
