Apache Spark in a Docker container, based on java:8. Heavily inspired by gettyimages/docker-spark. The cluster is deployed with Docker Compose and orchestrated with Docker Swarm. The official Cassandra image is used and runs on the same nodes as the Spark slaves.
In scripts/, run:

```shell
./bootstrap.sh
```
This script will set up a key-value store using Consul, a Swarm cluster with a master and a slave, and an overlay network for multi-host networking.
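For reference, the steps above can be sketched with `docker-machine`, following the Docker multi-host networking guide; this is a hypothetical outline of what bootstrap.sh might do, not its actual contents (the machine names, the virtualbox driver, and the network name are assumptions):

```shell
# Hypothetical sketch of bootstrap.sh; machine names and driver are assumptions.

# 1. Key-value store: a machine running Consul to hold cluster state
docker-machine create -d virtualbox mh-keystore
docker $(docker-machine config mh-keystore) run -d \
    -p 8500:8500 progrium/consul -server -bootstrap

# 2. Swarm master, registered against the Consul store
docker-machine create -d virtualbox --swarm --swarm-master \
    --swarm-discovery="consul://$(docker-machine ip mh-keystore):8500" \
    --engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
    --engine-opt="cluster-advertise=eth1:2376" \
    swarm-master

# 3. Swarm slave joining the same discovery backend
docker-machine create -d virtualbox --swarm \
    --swarm-discovery="consul://$(docker-machine ip mh-keystore):8500" \
    --engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
    --engine-opt="cluster-advertise=eth1:2376" \
    swarm-slave

# 4. Overlay network spanning both hosts (network name is an assumption)
eval $(docker-machine env --swarm swarm-master)
docker network create --driver overlay spark-net
```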
Connect to the Swarm master with:

```shell
eval $(docker-machine env --swarm swarm-master)
```
The cluster configuration should be visible by running:

```shell
docker info
```
Deploy the containers on the constrained nodes with:

```shell
docker-compose --x-networking --x-network-driver=overlay up -d
```
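The node constraints themselves live in the Compose file. A hedged sketch of what such service entries might look like, using the legacy Swarm `constraint:` environment-variable syntax (the service names, image names, and node names here are assumptions, not the repository's actual docker-compose.yml):

```yaml
# Hypothetical docker-compose.yml excerpt; names are assumptions.
master:
  image: spark-master
  environment:
    # legacy Swarm scheduling filter: pin this service to the swarm-master node
    - "constraint:node==swarm-master"

worker:
  image: spark-worker
  environment:
    # pin the worker (and its co-located Cassandra) to the slave node
    - "constraint:node==swarm-slave"
```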
In scripts/, run:

```shell
./deploy_cassandra.sh
```
After deployment is complete, you can run cqlsh this way:

```shell
docker run -it --rm --net container:cass1 cassandra:2.2.4 cqlsh
```
A /data folder is mounted as a shared volume outside the container to persist Cassandra data.
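In Compose terms, that bind mount would look something like the following; the host path /data comes from the text, while /var/lib/cassandra is the data directory used by the official Cassandra image (the service name is an assumption):

```yaml
# Hypothetical excerpt; only the /data host path is stated in the text.
cassandra:
  image: cassandra:2.2.4
  volumes:
    # keep Cassandra's data on the host, outside the container lifecycle
    - /data:/var/lib/cassandra
```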
- Multi-host networking: https://docs.docker.com/engine/userguide/networking/get-started-overlay/
- Networking in Compose: https://docs.docker.com/compose/networking/