
maprdb_python_examples's Introduction

Introduction

 .d88888b. 888888        d8888 8888888 
d88P" "Y88b  "88b       d88888   888   
888     888   888      d88P888   888   
888     888   888     d88P 888   888   
888     888   888    d88P  888   888   
888     888   888   d88P   888   888   
Y88b. .d88P   88P  d8888888888   888   
 "Y88888P"    888 d88P     888 8888888     TM
            .d88P                      
          .d88P"                       
         888P"

This repo contains code examples for using python-bindings with JSON and MapR-DB via OJAI (the Open JSON Application Interface). Optionally, the code in the .py files posted here can be used to build a sample application with our partner, Visual Action. The HTML and JavaScript files needed to build the full application are not in this repo, but the complete JSON data flow can be built from these files, and they serve as reference code for getting started with OJAI and MapR-DB in Python.

Prerequisites

System Level Prerequisites

You must have a MapR-DB instance with OJAI running to use this example code. Go to maprdb.io to download the current developer snapshot, which consists of an easy-to-use virtual machine with all the software you need pre-installed. This VM will give you a single-node Hadoop cluster running MapR.

Python3 is Required

If you are using one of the MapR pre-supplied VMs, you may need to install python3. Future versions of the sandbox will contain this preinstalled. To install python3 on the sandbox, follow these steps:

  • As root, run:

yum install zlib-devel
yum install openssl-devel
yum install git

On the MapR sandbox, you can install Python 3.3 from the CentOS 6 Software Collections (SCL) as follows:

yum install centos-release-scl
yum install python33

Enable python33 for your current shell session and install pip and pandas:

scl enable python33 bash
easy_install-3.3 pip
pip install pandas

Install matplotlib:

yum -y install libpng-devel freetype-devel
pip install matplotlib

Install Apache Maven

The python-bindings repo requires some Java dependencies to build, so it needs Maven.

You can also install Maven from SCL:

yum install maven30

Install the MapR-DB Python Package

To use Python with MapR-DB (as this example does), you need the python-bindings package installed.

To install it, clone the repo and build:

scl enable maven30 bash
git clone https://github.com/mapr-demos/python-bindings.git
cd python-bindings
python setup.py install

Edit Variables and Prepare Files

The code examples that load data into MapR-DB do not require the tables to be created in advance, but they will delete any existing tables at the configured paths. To change where the tables are stored, edit load.py and set the variables TABLE_SENSORS_PATH and TABLE_MAINT_PATH. These default to /user/mapr/mdata and /user/mapr/mdata.

The data set used in this application is in this wellsensors repo. A pair of small datasets for the well and maintenance information (1M and 50K lines, respectively) is included in this repo. To generate a larger dataset of your own, use the schema from the above repo with Ted Dunning's excellent log-synth tool, which can generate a set of arbitrary size. Watch out, JSON can get big, fast!

Be sure to uncompress ws.json.gz and maint.json.gz before running the scripts. If you are generating the graphs with makeplots.py, edit the OUTPUT_PATH variable at the top of that file to set the output directory for the graph images.
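If gunzip is not at hand, the decompression step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name gunzip is hypothetical and not part of this repo, and the code assumes the two .gz files sit in the current directory. It avoids pathlib and f-strings so that it also runs on Python 3.3.

```python
import gzip
import os
import shutil


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the output file.

    Hypothetical helper for illustration; not part of this repo.
    """
    if dest is None:
        if not src.endswith(".gz"):
            raise ValueError("expected a .gz file, got %r" % src)
        # Default output name drops the suffix: ws.json.gz -> ws.json
        dest = src[:-len(".gz")]
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Copy in chunks so a large dataset never has to fit in memory.
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    for name in ("ws.json.gz", "maint.json.gz"):
        if os.path.exists(name):
            print("wrote", gunzip(name))
```

The compressed originals are left in place, so the step can be repeated safely.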

Using the Code Examples

The following files correspond to several code examples you can use to get familiar with how to load JSON into MapR-DB using Python.

  • load.py - reads each dataset (ws.json and maint.json) and loads the JSON documents into MapR-DB
  • makeplots.py - reads the data and makes a series of plots using Spark. Run it with spark-submit makeplots.py.
  • summary.py - reads the data and outputs some summary statistics using a Pandas dataframe
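Before running the scripts, it can help to look at a few documents from each dataset. The sketch below is not taken from load.py; it assumes the uncompressed files hold one JSON document per line (an assumption about the file layout, not something this README states), and the function names iter_documents and peek are hypothetical.

```python
import itertools
import json


def iter_documents(path):
    """Yield one parsed document per non-empty line of a JSON-lines file.

    Assumption: each line of the file is a complete JSON document.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except ValueError as err:
                # json raises ValueError (or a subclass of it) on bad input.
                raise ValueError(
                    "bad JSON on line %d of %s: %s" % (line_number, path, err)
                ) from err


def peek(path, limit=3):
    """Return up to `limit` documents from the start of the file."""
    return list(itertools.islice(iter_documents(path), limit))
```

For example, peek("ws.json") returns the first three sensor documents as Python dictionaries without reading the rest of the file.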

These files are meant to be run in sequence: run load.py first, then either run makeplots.py with the spark-submit command or run summary.py with python3 to view basic statistics about the dataset.
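The run order described above can be captured in a small driver script. This is only a sketch: the names build_steps and run_steps are hypothetical, and it assumes load.py is invoked with python3 (the README only says so explicitly for summary.py) and that python3 and spark-submit are on the PATH.

```python
import subprocess

# Assumption: load.py is run with python3, like summary.py.
LOAD = ["python3", "load.py"]
PLOTS = ["spark-submit", "makeplots.py"]
SUMMARY = ["python3", "summary.py"]


def build_steps(make_plots=True, summarize=True):
    """Return the commands to run, always starting with the load step."""
    steps = [LOAD]
    if make_plots:
        steps.append(PLOTS)
    if summarize:
        steps.append(SUMMARY)
    return steps


def run_steps(steps, runner=subprocess.check_call):
    """Run each command in order; check_call raises on a non-zero exit,
    so later steps are skipped if an earlier one fails."""
    for command in steps:
        print("running:", " ".join(command))
        runner(command)
```

Calling run_steps(build_steps(make_plots=False)) from the repo directory would load the data and then print the summary statistics, skipping the Spark plots.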

Additional Resources

Follow the instructions in this video to build the application on your own machine.

Questions? Visit the maprdb.io page for more information and a support forum.

maprdb_python_examples's People

Contributors

namato, vicenteg
