Docker containers forming a 3-node Hadoop-YARN cluster (not for production).
The present deployment supports 5 services:
- HDFS
- Spark
- ZooKeeper
- Drill
- HBase
You need 3 Linux nodes, each with Docker Engine installed. Resource requirements depend on the use case; I am using 4 GB of RAM and 40 GB of storage on each node for testing purposes.
If the nodes do not share the same subnet, install openssh-server on every node, generate a key pair on each node, and authorize each node's public key on the other nodes.
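A minimal sketch of the key exchange (the user and host names are placeholders; adapt them to your environment):

```bash
# on each node: generate a key pair (accept the default path)
ssh-keygen -t rsa -b 4096

# on each node: authorize this node's public key on the other two nodes
# (replace user and host names with your own)
ssh-copy-id user@node-2
ssh-copy-id user@node-3
```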
Choose one of the nodes to be the leader of the docker swarm.
Enter the leader node and type `docker swarm init`.
The output of this command includes the join command that you should run in the remaining 2 nodes. Copy it and paste it into each of the 2 remaining nodes, so that they join the swarm.
If necessary, check the instructions here: https://docs.docker.com/engine/reference/commandline/swarm_init/.
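The join command printed by `docker swarm init` looks roughly like this (the token and address below are placeholders; use the ones from your own output):

```bash
# run on each of the 2 worker nodes (token and IP:port are placeholders)
docker swarm join --token SWMTKN-1-<token> <leader-ip>:2377

# back on the leader, verify that all 3 nodes have joined
docker node ls
```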
Copy the repository's files and folders to each one of the nodes.
Create an overlay network: run `. overlay.sh` on the leader node.
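The script ships with the repository; as an assumption about what it does, a swarm-scoped overlay network is typically created like this (the network name is a placeholder, check the script for the actual one):

```bash
# hypothetical equivalent of overlay.sh: create an attachable overlay
# network visible to all swarm nodes (network name is a placeholder)
docker network create --driver overlay --attachable cluster-net
```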
Assign an id and a name to every node. Run `. node_id.sh 1` on the master node, `. node_id.sh 2` on the worker-1 node, and `. node_id.sh 3` on the worker-2 node. Then run `. node_name.sh master` on the master node, `. node_name.sh worker-1` on the worker-1 node, and `. node_name.sh worker-2` on the worker-2 node, as grouped below.
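For reference, the same commands grouped per node:

```bash
# master node
. node_id.sh 1
. node_name.sh master

# worker-1 node
. node_id.sh 2
. node_name.sh worker-1

# worker-2 node
. node_id.sh 3
. node_name.sh worker-2
```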
Pull the image from my Docker Hub (https://hub.docker.com/repositories/bsamot10): run `. pull.sh` on every node.
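As an assumption about what `pull.sh` does (the image name and tag are placeholders; check the script for the actual ones):

```bash
# hypothetical contents of pull.sh (image and tag are placeholders)
docker pull bsamot10/<image>:<tag>
```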
Start the containers: run `. run.sh` on every node.
Start the services. Run `. spark-start-services.sh` on the master node to start the HDFS and Spark services on every node. Run `. zookeeper-start-services.sh` on every node to start the ZooKeeper, Drill, and HBase services on every node.
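For reference, the two start commands side by side:

```bash
# on the master node only: starts HDFS and Spark on every node
. spark-start-services.sh

# on every node: starts ZooKeeper, Drill, and HBase
. zookeeper-start-services.sh
```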
Enter the containers to verify that the services are running: run `. shell.sh` on every node, then run `jps` inside the containers.
If everything has gone well, the `jps` command should print all services.
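`jps` prints one line per Java process: a PID followed by the daemon's main class. Which daemons appear depends on the node's role; the output below is illustrative only and not verified against this image (the PIDs and the exact daemon set are assumptions):

```
# hypothetical jps output inside the master container
1201 NameNode
1302 ResourceManager
1403 QuorumPeerMain
1504 HMaster
1605 Drillbit
```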