hadoopCluster

A Docker-based 3-node hadoop-yarn cluster (not for production).

The present deployment supports 5 services:

  1. hdfs
  2. spark
  3. zookeeper
  4. drill
  5. hbase

Requirements

3 Linux nodes, each with the Docker engine installed.

Resource requirements depend on the use case. For testing purposes, I am using 4 GB of RAM and 40 GB of storage on each node.

Step 0

If the nodes do not share the same subnet, install openssh-server on every node, generate a public key on each one, and authorize it on the other nodes.
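
For example, a minimal sketch of the key exchange, assuming Debian/Ubuntu nodes and a user named hadoop (the user name and the remote addresses are placeholders):

    # on each node: install the SSH server and generate a key pair
    sudo apt-get install -y openssh-server
    ssh-keygen -t rsa -b 4096

    # on each node: authorize its public key on the other two nodes
    ssh-copy-id hadoop@<other-node-address>   # repeat for both remote nodes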

Step 1

Choose one of the nodes to be the leader of the docker swarm.

On the leader node, run docker swarm init.

The output of this command includes the docker swarm join command that you should run on the remaining 2 nodes.

Copy that command and run it on each of the 2 remaining nodes, so that they join the swarm.

If necessary, check the instructions here: https://docs.docker.com/engine/reference/commandline/swarm_init/.
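
A sketch of the sequence, assuming the leader's address is 192.168.1.10 (the address and token are placeholders; use the exact join command printed by your own docker swarm init):

    # on the leader node
    docker swarm init --advertise-addr 192.168.1.10

    # on each of the 2 remaining nodes, paste the printed join command, e.g.
    docker swarm join --token <token> 192.168.1.10:2377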

Step 2

Copy the repository's files and folders to each of the nodes.
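
One way to do this, assuming the files are fetched with git (the repository URL below is an assumption based on the project name; adjust it, or copy the files with scp instead):

    # on every node (hypothetical URL)
    git clone https://github.com/bsamot10/hadoopCluster.git
    cd hadoopCluster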

Step 3

Create an overlay network.

Run . overlay.sh on the leader node.
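
The script itself ships with the repository; as a rough sketch, an overlay-network script for a swarm typically boils down to something like this (the network name hadoop-net is an assumption, not necessarily what overlay.sh uses):

    # on the leader node: create an attachable overlay network
    docker network create --driver overlay --attachable hadoop-net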

Step 4

Assign an id and a name to every node.

Run . node_id.sh 1 on the master node, . node_id.sh 2 on the worker-1 node, and . node_id.sh 3 on the worker-2 node.

Run . node_name.sh master on the master node, . node_name.sh worker-1 on the worker-1 node, and . node_name.sh worker-2 on the worker-2 node.
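
For reference, the invocations grouped by node:

    # master node
    . node_id.sh 1
    . node_name.sh master

    # worker-1 node
    . node_id.sh 2
    . node_name.sh worker-1

    # worker-2 node
    . node_id.sh 3
    . node_name.sh worker-2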

Step 5

Pull the image from my Docker Hub account (https://hub.docker.com/repositories/bsamot10).

Run . pull.sh on every node.
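
The scripts are sourced, hence the leading dot. pull.sh presumably wraps a docker pull of the published image; the exact image name and tag are not shown here, so treat the commented line as a placeholder:

    # on every node
    . pull.sh
    # roughly equivalent to something like: docker pull bsamot10/<image>:<tag>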

Step 6

Start the containers.

Run . run.sh on every node.
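
After starting the container, a quick check that it is actually up:

    # on every node
    . run.sh

    # verify that the container is running
    docker ps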

Step 7

Start the services on every node.

Run . spark-start-services.sh on the master node, to start the hdfs and spark services on every node.

Run . zookeeper-start-services.sh on every node, to start the zookeeper, drill and hbase services on every node.
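
The two invocations, grouped by where they are run:

    # master node only: starts hdfs and spark on every node
    . spark-start-services.sh

    # every node: starts zookeeper, drill and hbase
    . zookeeper-start-services.sh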

Step 8

Enter the containers to verify that the services are running.

Run . shell.sh on every node.

Run jps inside the containers.

If everything has gone well, the jps command should list all of the running services.
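
As an illustration only (the exact process list depends on the node's role and on which services you started), the check looks like this:

    # on every node
    . shell.sh   # enter the container
    jps          # list the running JVM processes

    # illustrative processes you might see on the master container (assumption):
    #   NameNode, ResourceManager, QuorumPeerMain, HMaster, Drillbit, Master (Spark)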
