Use Spark on Heroku in a single dyno. Experiment inexpensively with Spark in the Common Runtime.
Production-quality Spark clusters may be deployed into Private Spaces using spark-in-space.
This buildpack provides the following three processes, children of the main web process:

- Nginx proxy for
  * basic password authentication, set via the `SPACE_PROXY_BASIC_AUTH` environment variable
    - format: `username:{PLAIN}password`
- Spark master
  * web UI: https://your-spark-app.herokuapp.com/
  * REST API: https://your-spark-app.herokuapp.com/rest
- one Spark worker
  * https://your-spark-app.herokuapp.com/worker
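As a sketch of setting the proxy credential (the `alice`/`s3cret` values are placeholders, not defaults of this buildpack):

```shell
# Placeholder credentials -- substitute your own
USERNAME="alice"
PASSWORD="s3cret"

# Assemble the credential in the username:{PLAIN}password format
CRED="${USERNAME}:{PLAIN}${PASSWORD}"
echo "$CRED"

# Set it on the app (requires the Heroku CLI; shown commented out here):
# heroku config:set SPACE_PROXY_BASIC_AUTH="$CRED"
```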
🚨 This app should not be scaled beyond a single dyno. (There is no coordination mechanism between multiple instances; workers implicitly use 127.0.0.1:7077 as the Spark master.)
Because Spark Singularity is contained in a single dyno with only port 80 exposed, there are two options for submitting jobs:

- Spark's REST API, proxied at https://your-spark-app.herokuapp.com/rest
- declaring the Spark jobs to submit on start-up, by adding each class name on its own line in `Jobfile`:

  ```
  org.example.SparkWordCounter
  org.example.SparkWordCluster
  ```
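Spark's standalone REST endpoint accepts a JSON `CreateSubmissionRequest`. A sketch of posting one through the proxy (the jar URL, credentials, and Spark version here are placeholder assumptions, not values provided by this buildpack):

```shell
# Build a CreateSubmissionRequest body; jar URL and main class are placeholders
cat > submission.json <<'EOF'
{
  "action": "CreateSubmissionRequest",
  "appResource": "https://example.com/jars/app.jar",
  "clientSparkVersion": "2.1.0",
  "mainClass": "org.example.SparkWordCounter",
  "appArgs": [],
  "environmentVariables": {},
  "sparkProperties": {
    "spark.app.name": "SparkWordCounter",
    "spark.master": "spark://127.0.0.1:7077",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "https://example.com/jars/app.jar"
  }
}
EOF

# POST it via the proxied REST endpoint (uncomment to run against a live app):
# curl -u 'alice:s3cret' -X POST -H 'Content-Type: application/json' \
#      -d @submission.json https://your-spark-app.herokuapp.com/rest/v1/submissions/create

cat submission.json
```

Note the master URL matches the in-dyno address from the warning above.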
Create the app, attach S3 storage, and add the five buildpacks in order:

```shell
heroku create
heroku addons:create bucketeer --as SPARK_S3
heroku buildpacks:add -i 1 https://github.com/heroku/heroku-buildpack-space-proxy.git
heroku buildpacks:add -i 2 heroku/scala
heroku buildpacks:add -i 3 https://github.com/heroku/spark-in-space.git
heroku buildpacks:add -i 4 https://github.com/dpiddy/heroku-buildpack-runit.git
heroku buildpacks:add -i 5 https://github.com/kr/heroku-buildpack-inline.git
```
These processes will run out of memory without the large 14 GB RAM Performance-L dynos:

```shell
# Scale web down, then run the one-off import on a large dyno
heroku scale web=0
heroku run bin/spark-local-job spark.in.space.Import -s Performance-L

# Bring the cluster up on a Performance-L dyno and follow the logs
heroku scale web=1:Performance-L
heroku logs -t

# Once complete, avoid ongoing Performance-L dyno charges
heroku scale web=0:Standard-1x
```
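Once a job has been submitted over REST, its progress can also be checked through the proxy. A sketch, assuming a hypothetical submission id returned by the create call:

```shell
# Hypothetical submission id from the CreateSubmissionResponse
SUBMISSION_ID="driver-20170101000000-0000"
STATUS_URL="https://your-spark-app.herokuapp.com/rest/v1/submissions/status/${SUBMISSION_ID}"
echo "$STATUS_URL"

# Poll it with the proxy credentials (uncomment to run against a live app):
# curl -u 'alice:s3cret' "$STATUS_URL"
```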