This repository includes Jupyter notebooks that put into practice the tools Spark offers.
Spark is a fast and general cluster computing system for Big Data. It provides high-level
APIs in Scala, Java, Python, and R, and an optimized engine that supports general
computation graphs for data analysis.
Functional programming
This repository includes Jupyter notebooks that put into practice Python's higher-order functions.
Several examples show how convenient and simple higher-order functions (HOFs) can be for analyzing
data, using the citibike.csv file.
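
As a rough illustration of the idea, here is a minimal sketch of filter, map, and reduce applied to trip records. It does not reproduce the notebooks in this repository. The inline sample data and the column names tripduration and usertype are assumptions made for the example; the real citibike.csv may use different columns.

```python
import csv
import io
from functools import reduce

# Hypothetical sample standing in for citibike.csv; the real file's
# column names and contents may differ.
SAMPLE_CSV = """tripduration,usertype
300,Subscriber
1200,Customer
600,Subscriber
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per trip."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)

# filter: keep only trips taken by subscribers
subscriber_trips = filter(lambda row: row["usertype"] == "Subscriber", rows)

# map: extract each trip's duration as an integer number of seconds
durations = list(map(lambda row: int(row["tripduration"]), subscriber_trips))

# reduce: fold the durations into a single total
total_seconds = reduce(lambda acc, d: acc + d, durations, 0)

print(durations)      # [300, 600]
print(total_seconds)  # 900
```

To try this against the real file, load_rows could be given the contents of citibike.csv instead of the sample string, with the column names adjusted to match.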
Install PySpark
Run Jupyter with the following command:
~/anaconda2/CSC_599_WORK$ PYSPARK_DRIVER_PYTHON=`which jupyter` PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser' ~/Downloads/spark/bin/pyspark
Replace ~/Downloads/spark/bin/pyspark with the path to the pyspark executable in your own Spark installation.