Docker containers forming a 3-node Hadoop-YARN cluster (not for production).
The present deployment supports 5 services:
- HDFS
- Spark
- ZooKeeper
- Drill
- HBase
You need 3 Linux nodes, each with Docker Engine installed. Resource requirements depend on the use case; I am using 4 GB of RAM and 40 GB of storage on each node for testing purposes.
If the nodes do not share the same subnet, install openssh-server on every node, generate a key pair on each node, and authorize each node's public key on the other nodes.
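A minimal sketch of the key exchange (the user and host names are placeholders; adapt them to your environment):

```bash
# on each node: generate a key pair (accept the default path)
ssh-keygen -t rsa -b 4096

# on each node: authorize this node's public key on the other two nodes
# (replace user and host names with your own)
ssh-copy-id user@node-2
ssh-copy-id user@node-3
```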
Choose one of the nodes to be the leader of the docker swarm.
Enter the leader node and type `docker swarm init`.
The output of this command includes the join command that you should run in the remaining 2 nodes. Copy it and paste it into each of the 2 remaining nodes, so that they join the swarm.
If necessary, check the instructions here: https://docs.docker.com/engine/reference/commandline/swarm_init/.
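The join command printed by `docker swarm init` looks roughly like this (the token and address below are placeholders; use the ones from your own output):

```bash
# run on each of the 2 worker nodes (token and IP:port are placeholders)
docker swarm join --token SWMTKN-1-<token> <leader-ip>:2377

# back on the leader, verify that all 3 nodes have joined
docker node ls
```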
Copy the repository's files and folders to each one of the nodes.
Create an overlay network: run `. overlay.sh` on the leader node.
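The script ships with the repository; as an assumption about what it does, a swarm-scoped overlay network is typically created like this (the network name is a placeholder, check the script for the actual one):

```bash
# hypothetical equivalent of overlay.sh: create an attachable overlay
# network visible to all swarm nodes (network name is a placeholder)
docker network create --driver overlay --attachable cluster-net
```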
Assign an id and a name to every node. Run `. node_id.sh 1` on the master node, `. node_id.sh 2` on the worker-1 node, and `. node_id.sh 3` on the worker-2 node. Then run `. node_name.sh master` on the master node, `. node_name.sh worker-1` on the worker-1 node, and `. node_name.sh worker-2` on the worker-2 node, as grouped below.
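For reference, the same commands grouped per node:

```bash
# master node
. node_id.sh 1
. node_name.sh master

# worker-1 node
. node_id.sh 2
. node_name.sh worker-1

# worker-2 node
. node_id.sh 3
. node_name.sh worker-2
```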
Pull the image from my Docker Hub (https://hub.docker.com/repositories/bsamot10): run `. pull.sh` on every node.
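As an assumption about what `pull.sh` does (the image name and tag are placeholders; check the script for the actual ones):

```bash
# hypothetical contents of pull.sh (image and tag are placeholders)
docker pull bsamot10/<image>:<tag>
```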
Start the containers: run `. run.sh` on every node.
Start the services. Run `. spark-start-services.sh` on the master node to start the HDFS and Spark services on every node. Run `. zookeeper-start-services.sh` on every node to start the ZooKeeper, Drill, and HBase services on every node.
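For reference, the two start commands side by side:

```bash
# on the master node only: starts HDFS and Spark on every node
. spark-start-services.sh

# on every node: starts ZooKeeper, Drill, and HBase
. zookeeper-start-services.sh
```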
Enter the containers to verify that the services are running: run `. shell.sh` on every node, then run `jps` inside the containers.
If everything has gone well, the `jps` command should print all services.
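`jps` prints one line per Java process: a PID followed by the daemon's main class. Which daemons appear depends on the node's role; the output below is illustrative only and not verified against this image (the PIDs and the exact daemon set are assumptions):

```
# hypothetical jps output inside the master container
1201 NameNode
1302 ResourceManager
1403 QuorumPeerMain
1504 HMaster
1605 Drillbit
```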