Need Help?: Issues Tracking | [email protected]
Contributing: Contribution Guide
License: Apache 2.0
Ansible Hadoop is a set of playbooks that helps you deploy a new Hadoop (CDH4) and Spark cluster on a CentOS 6 or RHEL 6 environment using Ansible.
The playbooks can:
- Deploy a fully functional Hadoop cluster with High Availability (HA) and automatic failover.
- Deploy additional nodes to scale the cluster (datanodes and Spark workers).

The playbooks install the following components:
- Hadoop CDH4
- Zookeeper
- Journalnode
- HDFS
- Apache Spark
- Elasticsearch (optional)
- Ganglia (optional)
Requirements:

- Ansible 1.6+
- CentOS 6.5+ or RHEL 6 servers
Before deploying, edit the following files:

- hosts: set the hosts and service groups.
- group_vars/all: change or add configuration parameters (e.g. HDFS paths, Spark ports, etc.).
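For reference, the hosts inventory might look something like the sketch below. The group and host names here are illustrative assumptions (only master and master2 appear elsewhere in this README), not the repository's actual inventory:

```ini
; Hypothetical inventory sketch -- group and host names are assumptions.
[namenodes]
master
master2

[datanodes]
data1
data2

[sparkworkers]
data1
data2
```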
site_name: mycluster # The name of your cluster
with_elasticsearch: True/False # If true, deploy an Elasticsearch cluster.
update_iptables: True/False # If True, change iptables file to add ip_range.
update_hosts: True/False # If True, set the hosts file to every host in the cluster.
install_oracle_jdk: True/False # If True, download and Install Oracle JDK from oracle server.
with_ganglia: True/False # If True, deploy ganglia to monitor your cluster health.
spark:
version: 1.2.1 # Set the version of Spark you want to deploy
elasticsearch:
version: 1.4.1 # Set the version of Elasticsearch you want to deploy
# (**only if with_elasticsearch is True**)
To run with Ansible:

./deploy

To install only a single service (e.g. just ZooKeeper), pass the corresponding tag as an argument.

Available tags:
- elasticsearch
- hadoop
- ntp
- zookeeper
- slaves
- spark
- ganglia
For example, to install only ZooKeeper:

./deploy zookeeper
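For context, ./deploy presumably wraps ansible-playbook. A minimal sketch of such a wrapper, assuming the playbook entry point is named site.yml (an assumption, not stated in this README):

```shell
# Hypothetical sketch of a ./deploy wrapper: forward an optional tag
# to ansible-playbook. The playbook name (site.yml) is an assumption.
TAG="$1"
CMD="ansible-playbook -i hosts site.yml"
if [ -n "$TAG" ]; then
    CMD="$CMD --tags $TAG"
fi
# Print the command that would be run (replace echo with eval to execute).
echo "$CMD"
```

With no argument this runs the full playbook; with an argument such as zookeeper, only tasks tagged zookeeper are applied.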
Don't forget to open the ports on the hosts if you want to access your cluster remotely:

- HDFS (active): master:50070
- HDFS (standby): master2:50070
- Spark Master (active): master:8080
- Spark Master (standby): master2:8080
- Elasticsearch: eshost:9200
- Ganglia: monitor:80
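If update_iptables is False and you manage the firewall by hand, rules along these lines would open the UI ports listed above. This is a sketch that only prints the rules (port numbers taken from this README; run them yourself, restricted to your ip_range, rather than opening the ports to everyone):

```shell
# Generate iptables rules for the cluster web UI ports (sketch only).
# 50070 = HDFS, 8080 = Spark Master, 9200 = Elasticsearch, 80 = Ganglia.
RULES=$(for PORT in 50070 8080 9200 80; do
    echo "iptables -A INPUT -p tcp --dport $PORT -j ACCEPT"
done)
echo "$RULES"
```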
To restart all services, run:

./restart
To restart only some services, run:
./restart serviceName
Services that can be restarted:
- zookeepers
- journalnodes
- elasticsearch
- namenodes
- datanodes
- sparkmasters
- sparkworkers