raktim00 / hadoop-hdfs-mr-multi-node-cluster-aws-ansible

Provisioning EC2 Instances & then setting up Hadoop Multi Node Storage (HDFS) & Compute (MR) Cluster on them using Ansible Automation

License: MIT License

Jinja 100.00%

hadoop hadoop-hdfs hadoop-mapreduce aws ec2-instance ansible ansible-role ansible-playbook

Hadoop HDFS & MapReduce Multi Node Cluster Setup on AWS EC2 Instances using Ansible Automation

Let's look at the problem statement:

Create an Ansible role to launch 9 AWS EC2 instances.
Dynamically fetch their IPs & create the inventory, so that the remaining Ansible roles can run on those instances.
Create roles to configure the Hadoop Name Node (Master), Data Node (Worker), Job Tracker Node, Task Tracker Node & Client Node.
Finally, configure the 1st, 2nd & 3rd instances as the Name Node, Job Tracker & Client Node respectively, then configure three of the remaining instances as Data Nodes & the other three as Task Trackers.
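
The four steps above map naturally onto a multi-play playbook. The following is a minimal sketch under stated assumptions, not the repository's actual `setup.yml`: only the "ec2" role and "cred.yml" are named in this README, while the group names (`namenode`, `jobtracker`, `client`, `datanode`, `tasktracker`), the `hadoop_*` role names & the use of `become` are hypothetical placeholders.

```yaml
# Sketch of a setup.yml-style playbook. Only the "ec2" role and cred.yml
# appear in this README; every other role and group name is hypothetical.

# Play 1: runs on the control node, launches the 9 instances and registers
# their IPs as in-memory inventory groups for the plays below.
- name: Provision EC2 instances and build the dynamic inventory
  hosts: localhost
  gather_facts: false
  vars_files:
    - cred.yml              # vault-encrypted: access_key, secret_key
  roles:
    - ec2

# Plays 2-6: one play per Hadoop node type, each targeting a dynamic group.
# Masters are configured before the workers that connect to them.
- name: Configure the Name Node (master)
  hosts: namenode
  become: true              # assumption: package installs need root
  roles:
    - hadoop_namenode       # hypothetical role name

- name: Configure the Data Nodes (workers)
  hosts: datanode
  become: true
  roles:
    - hadoop_datanode       # hypothetical role name

- name: Configure the Job Tracker
  hosts: jobtracker
  become: true
  roles:
    - hadoop_jobtracker     # hypothetical role name

- name: Configure the Task Trackers
  hosts: tasktracker
  become: true
  roles:
    - hadoop_tasktracker    # hypothetical role name

- name: Configure the Client Node
  hosts: client
  become: true
  roles:
    - hadoop_client         # hypothetical role name
```

Splitting the work into one play per group keeps each role small, and the later plays only see hosts that the first play added to the inventory at run time.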

Video Demonstration : https://bit.ly/3tICiLd

How to run this practical on your system:

Install Ansible v2.10 on your local Linux system.
Next, clone this repository & go inside the folder "hadoop-ws". This is our workspace & it contains everything.
In this workspace we need to put two files: "hadoop_instance.pem" & "cred.yml".
Create the "hadoop_instance.pem" key pair in your AWS account, then download the file into your workspace, "hadoop-ws".
Next, run `chmod 400 hadoop_instance.pem` to protect your AWS key pair from other users on your Linux system.
Next, run `ansible-vault create cred.yml`. It asks for a new vault password & then opens an editor (vi, unless the `EDITOR` environment variable says otherwise). Put your AWS access key & secret key there in YAML format.

The contents of this file should look like this:

access_key : ABCDEFGHIJK
secret_key : abcdefghijk12345

Next, go to the "hadoop-ws/roles/ec2/vars/" folder & edit the "main.yml" file. Here you only need to set the "subnet_name" variable to a subnet ID from your AWS account.
Note: I am using the AWS default VPC, so I haven't specified a VPC in my "hadoop-ws/roles/ec2/tasks/main.yml" file. If you want to use a VPC you created yourself, you need to add that extra option there.
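
To make the role of "subnet_name" & the vault credentials concrete, here is a hedged sketch of what launch & inventory tasks could look like under Ansible 2.10. It is not the repository's actual "roles/ec2/tasks/main.yml". It assumes the legacy `amazon.aws.ec2` module (bundled with the Ansible 2.10 package, removed from later releases of the `amazon.aws` collection) together with `ansible.builtin.add_host`. The variables `region_name`, `ami_id`, `instance_flavour` & `cluster_layout`, the key pair name `hadoop_instance`, the group names & the `ec2-user` login are all assumptions; only `access_key`, `secret_key` & `subnet_name` come from the steps above.

```yaml
# Sketch of roles/ec2/tasks/main.yml-style tasks (assumptions labeled inline).
# The legacy amazon.aws.ec2 module needs the "boto" Python library locally.

- name: Launch 9 EC2 instances
  amazon.aws.ec2:
    aws_access_key: "{{ access_key }}"        # from cred.yml
    aws_secret_key: "{{ secret_key }}"        # from cred.yml
    region: "{{ region_name }}"               # hypothetical variable
    image: "{{ ami_id }}"                     # hypothetical variable
    instance_type: "{{ instance_flavour }}"   # hypothetical variable
    key_name: hadoop_instance                 # assumed key pair name
    vpc_subnet_id: "{{ subnet_name }}"        # a subnet belongs to exactly one VPC
    assign_public_ip: true
    count: 9
    wait: true                                # block until the instances are running
    state: present
  register: ec2_result

- name: Add each instance to an in-memory inventory group
  ansible.builtin.add_host:
    name: "{{ item.public_ip }}"
    groups: "{{ cluster_layout[idx] }}"
    ansible_user: ec2-user                    # assumption: depends on the AMI
    ansible_ssh_private_key_file: hadoop_instance.pem
  loop: "{{ ec2_result.instances }}"
  loop_control:
    index_var: idx                            # 0-based position in the loop
  vars:
    # Hypothetical layout: the instance at index i joins group cluster_layout[i]
    cluster_layout:
      - namenode
      - jobtracker
      - client
      - datanode
      - datanode
      - datanode
      - tasktracker
      - tasktracker
      - tasktracker

- name: Wait until SSH is reachable on every instance
  ansible.builtin.wait_for:
    host: "{{ item.public_ip }}"
    port: 22
    state: started
  loop: "{{ ec2_result.instances }}"
```

Hosts added with `add_host` exist only for the duration of the playbook run, which is why no static inventory file needs to be edited by hand.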

Finally, it's time to deploy the whole setup. Run `ansible-playbook setup.yml --ask-vault-pass`, provide your vault (cred.yml) password & see the magic of Ansible.
