Git Product home page Git Product logo

chef-bach's Introduction

Overview

This is a set of Chef cookbooks to bring up Hadoop and Kafka clusters. In addition, there are a number of additional services provided with these cookbooks - such as DNS, metrics, and monitoring - see below for a partial list of services provided by these cookbooks.

Hadoop

Each Hadoop head node is Hadoop component specific. The roles are intended to be run so that they can be layered in a highly-available manner. E.g. multiple BCPC-Hadoop-Head-* machines will correctly build a MySQL, Zookeeper, HDFS JournalNode, etc. cluster and deploy the named component as well. Further, for components which support HA, the intention is one can simply add the role to multiple machines and the right thing will be done to support HA (except in the case of HDFS).

To setup HDFS HA, please follow the following model from your Bootstrap VM:

  • Install the cluster once with a non-HA HDFS:
    • with a BCPC-Hadoop-Head-Namenode-NoHA role
    • with the following node variable [:bcpc][:hadoop][:hdfs][:HA] = false
    • ensure at least three machines are installed with BCPC-Hadoop-Head roles
    • ensure at least one machine is a datanode
    • run cluster-assign-roles.sh <Environment> Hadoop successfully
  • Re-configure the cluster with an HA HDFS:
    • change the BCPC-Hadoop-Head-Namenode-NoHA machine's role to BCPC-Hadoop-Head-Namenode
    • set the following node variable [:bcpc][:hadoop][:hdfs][:HA] = true on all nodes (e.g. in the environment)
    • run cluster-assign-roles.sh <Environment> Hadoop successfully

Setup

These recipes are currently intended for building a BACH cluster on top of Ubuntu 12.04 servers using Chef 11. When setting this up in VMs, be sure to add a few dedicated disks (for HDFS data nodes) aside from boot volume. In addition, it's expected that you have three separate NICs per machine, with the following as defaults (and recommendations for VM settings):

  • eth0 - management traffic (host-only NIC in VM)
  • eth1 - reserved traffic (host-only NIC in VM)
  • eth2 - compute traffic (host-only NIC in VM)

You should look at the various settings in cookbooks/bcpc/attributes/default.rb and tweak accordingly for your setup (by adding them to an environment file).

Cluster Bootstrap

The provided scripts which sets up a Chef and Cobbler server via Vagrant permits imaging of the cluster via PXE.

Once the Chef server is set up, you can bootstrap any number of nodes to get them registered with the Chef server for your environment - see the next section for enrolling the nodes.

Make a cluster

To build a new BACH cluster, you have to start with building head nodes first. (This assumes that you have already completed the bootstrap process and have a Chef server available.) Since the recipes will automatically generate all passwords and keys for this new cluster, the nodes must temporarily become admin's in the chef server, so that the recipes can write the generated info to a databag. The databag will be called configs and the databag item will be the same name as the environment (Test-Laptop in this example). You only need to leave the node as an admin for the first chef-client run. You can also manually create the databag & item (as per the example in data_bags/configs/Example.json) and manually upload it if you'd rather not bother with the whole admin thing for the first run.

To assign machines a role, one can update the cluster.txt file and ensure all necessary information is provided as per cluster-readme.txt.

Using the script tests/automated_install.sh, one can run through what is the expected "happy-path" install for a single machine running (by default) four VirtualBox VMs. This simple install supports only changing DNS, proxy and VM resource settings. (This is the basis of our automated build tests.)

Other Deployment Flavors

In addition to the "happy-path" integration test using automated_install.sh there are ways to deploy on OpenStack or to bare-metal hosts. Lastly, for those using test-kitchen there are various test-kitchen suites one can run as well.

A view of the various full-cluster deployment types: Flow Chart of BACH Deployment Flavors -- VBox, OpenStack, Vagrant Bootstrap/Baremetal, Baremetal Only

Using a BACH cluster

Once the nodes are configured and bootstrapped, BACH services will be accessible via the floating IP. (For the Test-Laptop environment, it is 10.0.100.5.)

For example, you can go to https://10.0.100.5:8888 for the Graphite web interface. To find the automatically-generated service credentials, look in the data bag for your environment.

ubuntu@bcpc-bootstrap:~$ knife data bag show configs Test-Laptop | grep mysql-root-password
mysql-root-password:       abcdefgh

For example, to check on HDFS:

ubuntu@bcpc-vm1:~$ HADOOP_USER_NAME=hdfs hdfs dfsadmin -report
Configured Capacity: 40781217792 (37.98 GB)
Present Capacity: 40114298221 (37.36 GB)
DFS Remaining: 39727463789 (37.00 GB)
DFS Used: 386834432 (368.91 MB)
DFS Used%: 0.96%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (1):

Name: 192.168.100.13:50010 (f-bcpc-vm3.bcpc.example.com)
Hostname: bcpc-vm3.bcpc.example.com
Decommission Status : Normal
Configured Capacity: 40781217792 (37.98 GB)
DFS Used: 386834432 (368.91 MB)
Non DFS Used: 666919571 (636.02 MB)
DFS Remaining: 39727463789 (37.00 GB)
DFS Used%: 0.95%
DFS Remaining%: 97.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 12
Last contact: Fri Aug 14 21:08:23 EDT 2015

BACH Services

BACH currently relies upon a number of open-source packages:

Thanks to all of these communities for producing this software!

Contributing

To propose changes to Chef-BACH, please fork the GitHub repo and issue a pull request with your proposed change. We are endeavoring to follow the following standards:

Libraries:

All Chef Code:

Static Code Analysis:

Otherwise generally follow bbatsov/ruby-style-guide

The GitHub workflow we follow is captured below: Flow Chart of GitHub Workflow from Issue or PR created to PR in progress and Code Review to Issue or PR resolved

chef-bach's People

Contributors

cbaenziger avatar amithkanand avatar pchandra avatar bijugs avatar mihalis68 avatar jerenkrantz avatar http-418 avatar kiiranh avatar pu239ppy avatar caiush avatar ekund avatar drraywang avatar chrislongo avatar rrichardson avatar indexzero avatar dspadea avatar amscanne avatar leochen4891 avatar dbahir avatar ericvw avatar sjain1991 avatar apaprocki avatar asears avatar aceofsteel avatar hkothari avatar kpfleming avatar kamidzi avatar

Watchers

James Cloos avatar Wayne avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.