awsscripts's Introduction

Scripts for Cloudera deployments in AWS

This repository contains the following artifacts, which simplify provisioning AWS resources for the Cloudera AWS reference architecture:

  1. CloudFormation templates for network context setup (VPC, subnets, etc.)
  2. Python scripts for instance provisioning

The CloudFormation templates set up the network context, which consists of the following:

  1. VPC
  2. Subnets
  3. DNS configuration
  4. Gateways
  5. Security groups
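
As a rough illustration of what such a network context contains, the sketch below builds a minimal CloudFormation template as a Python dictionary and serializes it to JSON. The logical resource names (`ClusterVPC`, `ClusterSubnet`, and so on), the CIDR ranges, and the SSH ingress rule are assumptions made for this example; they are not taken from the templates in the cfntemplates folder.

```python
import json


def build_network_template(vpc_cidr="10.0.0.0/16", subnet_cidr="10.0.1.0/24"):
    """Return a minimal CloudFormation template (as a dict) for a network context.

    All logical names and CIDR defaults are hypothetical, for illustration only.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative network context: VPC, subnet, gateway, security group",
        "Resources": {
            "ClusterVPC": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": vpc_cidr,
                    # DNS configuration for instances inside the VPC
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                },
            },
            "ClusterSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {"VpcId": {"Ref": "ClusterVPC"}, "CidrBlock": subnet_cidr},
            },
            "ClusterGateway": {"Type": "AWS::EC2::InternetGateway"},
            "GatewayAttachment": {
                "Type": "AWS::EC2::VPCGatewayAttachment",
                "Properties": {
                    "VpcId": {"Ref": "ClusterVPC"},
                    "InternetGatewayId": {"Ref": "ClusterGateway"},
                },
            },
            "ClusterSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow SSH into the cluster (example rule)",
                    "VpcId": {"Ref": "ClusterVPC"},
                    "SecurityGroupIngress": [
                        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": "0.0.0.0/0"}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the template as JSON, the format CloudFormation accepts.
    print(json.dumps(build_network_template(), indent=2))
```

A real template for the reference architecture would likely define several subnets, route tables, and tighter security group rules; the sample templates in the repository are the authoritative starting point.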

Once the network context is created, the setup.py script can be used to provision the instances. To have a fully functioning cluster, the following steps need to be performed after the network context is created:

  1. Create master and slave instances
  2. Prepare the instances (disable SELinux, create partitions, resize root volume, etc.)
  3. Install Cloudera Manager Server on master instance
  4. Install Cloudera Manager Agents on slave instances and master instances
  5. Deploy Hadoop using CM

The current implementation of the script performs steps 1 and 2. CM and CDH have to be installed manually at this point. Once steps 1 and 2 are completed, a /tmp/init-complete file is created. After this, you have to restart the instances for some of the configuration changes to take effect. This restart can take some time because the root volume is resized during the restart.
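
A minimal sketch of how one might wait for the marker file before restarting, assuming the check runs on the instance itself. The function name, polling interval, and timeout are hypothetical; only the /tmp/init-complete path comes from the description above.

```python
import os
import time


def wait_for_marker(path="/tmp/init-complete", timeout=600.0, interval=5.0):
    """Poll until `path` exists; return True if found before `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # timeout=0 performs a single check without waiting.
    if wait_for_marker(timeout=0):
        print("Preparation finished; the instance can be restarted.")
    else:
        print("Marker file not found yet.")
```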

Sample CloudFormation templates are in the cfntemplates folder and the provisioning scripts are in the scripts folder. You also have to write a config file; a sample config file is available in the scripts folder.

Setup

To use these scripts, complete the following setup steps.

  1. Install Boto

     pip install boto
    
  2. Set the following environment variables with your AWS credentials. Add them to your ~/.bashrc so they persist across sessions.

     export AWS_ACCESS_KEY=<my_access_key>
     export AWS_SECRET_KEY=<my_secret_key>
    
  3. Set up the configs for the scripts. These are in the scripts/config file.
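
Before running the scripts, it can help to confirm that the credentials from step 2 are visible to Python. The check below is a small sketch and not part of the repository; the helper name `missing_credentials` is hypothetical.

```python
import os

# Variable names as given in the setup steps above.
REQUIRED_VARS = ("AWS_ACCESS_KEY", "AWS_SECRET_KEY")


def missing_credentials(environ=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("AWS credentials found in the environment.")
```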

Example usage

$ ./setup.py -h
usage: setup.py [-h] -c CONFIG action

positional arguments:
  action                Possible options: create_network_context,
                        read_network_context, create_slaves, create_masters,
                        list_slaves, list_masters, create_db

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path of config file

$ ./setup.py -c <config_file> create_network_context

$ ./setup.py -c <config_file> create_slaves

$ ./setup.py -c <config_file> create_masters

$ ./setup.py -c <config_file> list_slaves

$ ./setup.py -c <config_file> list_masters
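
The help text above is consistent with a standard argparse interface. The following sketch shows a parser that would accept the same arguments. It is reconstructed from the help output only and is an assumption about how setup.py might be written, not a copy of it; in particular, restricting `action` with `choices` is an addition made here for validation.

```python
import argparse

# Actions listed in the help output above.
ACTIONS = [
    "create_network_context",
    "read_network_context",
    "create_slaves",
    "create_masters",
    "list_slaves",
    "list_masters",
    "create_db",
]


def build_parser():
    """Build a parser matching the usage line: setup.py [-h] -c CONFIG action."""
    parser = argparse.ArgumentParser(prog="setup.py")
    # -c has no brackets in the usage line, so it is treated as required.
    parser.add_argument("-c", "--config", required=True, help="Path of config file")
    # Assumption: unknown actions are rejected up front via `choices`.
    parser.add_argument(
        "action",
        choices=ACTIONS,
        metavar="action",
        help="Possible options: " + ", ".join(ACTIONS),
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-c", "scripts/config", "list_masters"])
    print(args.action, args.config)
```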

Steps to spin up a cluster

  1. Set up configs

  2. Create network context (VPC, subnets, security groups)

     ./setup.py -c <config_file> create_network_context
    
  3. Create master instance

     ./setup.py -c <config_file> create_masters
    
  4. Create slave instances and point them to the master instance. The master instance's IP address is the output of the create_masters command.

     ./setup.py -c <config_file> -s <master_ip_address> create_slaves
    

Once the instances are provisioned, the setup script is executed on them. It creates the mount points and partitions, installs Cloudera Manager server and agents, resizes the root volumes and, once all steps are done, reboots the instances. When the instances come back up, you can access the Cloudera Manager UI at http://<master_public_ip_address>:7180
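
Because the reboot can take a while, one simple way to tell when Cloudera Manager is reachable is to poll port 7180 on the master. This is a sketch using only the standard library; the function names, attempt count, and delays are hypothetical, and it assumes the security group allows access to port 7180 from wherever the check runs.

```python
import socket
import time


def port_open(host, port=7180, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def wait_for_cm(host, port=7180, attempts=60, delay=10.0):
    """Retry `port_open` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if port_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_cm("<master_public_ip_address>")` with the address printed by create_masters returns True once the port accepts connections. An open port only shows that something is listening; the UI itself may need a little longer to finish starting.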

Contributors

amansk