Project status: alpha.
A Spark image built by following the official instructions has several problems:
- It does not support s3a:// URLs for application and dependency jars, so an application has to build a custom Docker image just to bundle its jar.
- It is built with Scala 2.11 rather than 2.12. Spark does officially provide a Scala 2.12 distribution, but that distribution doesn't include the Hadoop dependencies.
This image addresses these issues.
- Download and extract a Hadoop binary distribution (any version above 2.8) into the build/ directory, and rename the extracted directory to hadoop.
- Download and extract a Spark binary distribution (one without pre-packaged Hadoop dependencies) into the build/ directory, and rename the extracted directory to spark.
- Build the Spark image:

      docker build -t <tag> -f Dockerfile .
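
The steps above can be scripted roughly as follows. This is a minimal sketch, not part of the project: the helper names `extract_as` and `build_image` are hypothetical, the tarballs are assumed to have been downloaded already, and the file names in the usage comments are placeholders. It also assumes a `tar` that supports `--strip-components` (GNU tar and bsdtar both do).

```shell
#!/usr/bin/env bash
# Sketch only: helper names and file names are illustrative assumptions.

# Extract a .tar.gz into <build_dir>/<name>, dropping the archive's versioned
# top-level directory so the result has the fixed name the steps require.
extract_as() {
  local tarball="$1" name="$2" build_dir="${3:-build}"
  mkdir -p "$build_dir/$name"
  tar -xzf "$tarball" -C "$build_dir/$name" --strip-components=1
}

# Build the image, but only once both hadoop and spark directories exist.
build_image() {
  local tag="$1" build_dir="${2:-build}" d
  for d in hadoop spark; do
    if [ ! -d "$build_dir/$d" ]; then
      echo "missing $build_dir/$d" >&2
      return 1
    fi
  done
  docker build -t "$tag" -f Dockerfile .
}

# Example usage (tarball names are placeholders for whatever was downloaded):
#   extract_as hadoop-<version>.tar.gz hadoop
#   extract_as spark-<version>-bin-without-hadoop.tgz spark
#   build_image <tag>
```

Extracting with `--strip-components=1` avoids a separate rename step, since the versioned directory name inside the archive never reaches the build/ directory.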