- Hadoop (on OS X: `brew install hadoop`)
- Spark (on OS X: `brew install apache-spark`)
- python (2.7)
- virtualenv
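
Before setting up the environment, it can help to confirm that the prerequisites above are actually on your `PATH`. The following is a minimal sketch, not part of this repository: the function name `missing_tools` is hypothetical, and the executable names (`hadoop`, `spark-submit`, `python`, `virtualenv`) are assumptions about what the installs above put on your `PATH`. It is written to run under Python 2.7 as well as Python 3.

```python
from __future__ import print_function

try:
    from shutil import which  # available on Python 3.3+
except ImportError:
    # Python 2.7 fallback with the same "path or None" behaviour
    from distutils.spawn import find_executable as which


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    # Assumed executable names for the prerequisites listed above.
    required = ["hadoop", "spark-submit", "python", "virtualenv"]
    missing = missing_tools(required)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If anything is reported missing, install it (for example with the `brew` commands above) before continuing. The check only looks for executables on `PATH`; it does not verify versions.
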

To set up and run the project:

- Create a virtual environment (first time only): `virtualenv venv`
- Activate the virtual environment: `. ./venv/bin/activate`
- Install the dependencies (first time only): `pip install -r requirements.txt`
- Download the sample data (first time only): `python utils/downloader_service.py`
- Run the example data set processing: `python utils/data_processing.py`