Learning BOSH - Geode Cluster Release

This is a personal repository that I use to acquire some basic BOSH knowledge. It should not be treated as an official reference or as a source of truth; I use it for learning and testing purposes, and I'm making it public because others may find it useful for the same goal. Keep in mind that, apart from the quotes extracted from the official documentation, there could be errors in my line of thought and in how I implement the release itself.

The steps, based on Creating a Release, show how to create a BOSH release from scratch and how to deploy it to a local BOSH-Lite environment for further testing. The main idea is to have a small Apache Geode Cluster up and running. We are using BOSH-Lite on VirtualBox for the sake of simplicity, but the release could be easily deployed to any of the supported IaaS providers since, after all, that's one of the magical things that BOSH can do for us.

For a complete reference about BOSH terminology, please refer to the Official BOSH Documentation.

BOSH-Lite

The full (and official) set of instructions to install BOSH-Lite on VirtualBox can be found in the bosh-deployment repository.

First things first: below is the folder structure we're going to use going forward. It's not required, just a personal preference to keep things organized, and I'll refer to it quite often within this guide.

workspace
├── config
│   └── (...)
├── downloads
│   └── (...)
└── git
    ├── (...)
    └── (...)

When you are getting started with something new, it is sometimes useful to have an automated script that deletes everything and starts from scratch (especially if you don't want to memorize all of the required steps), so I've created a simple one to do so:

#!/bin/bash
set -x

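WORKSPACE_DIRECTORY=/workspace
BOSH_DEPLOYMENT_ROOT_DIRECTORY=$WORKSPACE_DIRECTORY/git/bosh-deployment

# Route traffic for the BOSH-Lite container network through the Director VM
# (macOS syntax; on Linux use: sudo ip route add 10.244.0.0/16 via 192.168.50.6)
sudo route add -net 10.244.0.0/16 192.168.50.6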

# Delete Environment
bosh delete-env \
  $BOSH_DEPLOYMENT_ROOT_DIRECTORY/bosh.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/virtualbox/cpi.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/virtualbox/outbound-network.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/bosh-lite.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/bosh-lite-runc.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/jumpbox-user.yml \
  --state $WORKSPACE_DIRECTORY/config/state.json \
  --vars-store $WORKSPACE_DIRECTORY/config/creds.yml \
  -v internal_ip=192.168.50.6 \
  -v internal_gw=192.168.50.1 \
  -v internal_cidr=192.168.50.0/24 \
  -v outbound_network_name=NatNetwork \
  -v director_name="Bosh Lite Director"

# Clean Previous Settings
rm -Rf ~/.bosh
rm -Rf ~/VirtualBox\ VMs/
ssh-keygen -R 192.168.50.6
rm -Rf $BOSH_DEPLOYMENT_ROOT_DIRECTORY
rm -f $WORKSPACE_DIRECTORY/config/creds.yml
rm -f $WORKSPACE_DIRECTORY/config/state.json

# Install Bosh Lite
git clone https://github.com/cloudfoundry/bosh-deployment $BOSH_DEPLOYMENT_ROOT_DIRECTORY

bosh create-env \
  $BOSH_DEPLOYMENT_ROOT_DIRECTORY/bosh.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/virtualbox/cpi.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/virtualbox/outbound-network.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/bosh-lite.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/bosh-lite-runc.yml \
  -o $BOSH_DEPLOYMENT_ROOT_DIRECTORY/jumpbox-user.yml \
  --state $WORKSPACE_DIRECTORY/config/state.json \
  --vars-store $WORKSPACE_DIRECTORY/config/creds.yml \
  -v internal_ip=192.168.50.6 \
  -v internal_gw=192.168.50.1 \
  -v internal_cidr=192.168.50.0/24 \
  -v outbound_network_name=NatNetwork \
  -v director_name="Bosh Lite Director"

# Alias and Certs
bosh int $WORKSPACE_DIRECTORY/config/creds.yml --path /jumpbox_ssh/private_key > ~/.ssh/bosh-virtualbox.key
chmod 600 ~/.ssh/bosh-virtualbox.key
bosh -e 192.168.50.6 --ca-cert <(bosh int $WORKSPACE_DIRECTORY/config/creds.yml --path /director_ssl/ca) alias-env bosh-lite

# Upload Stemcell & Cloud Config
export BOSH_CLIENT=admin
export BOSH_CLIENT_SECRET=`bosh int $WORKSPACE_DIRECTORY/config/creds.yml --path /admin_password`
bosh -e bosh-lite update-cloud-config $BOSH_DEPLOYMENT_ROOT_DIRECTORY/warden/cloud-config.yml
bosh -e bosh-lite upload-stemcell https://bosh.io/d/stemcells/bosh-warden-boshlite-ubuntu-trusty-go_agent

Remember that you need to pause your VirtualBox VM whenever your machine goes to sleep or gets rebooted; otherwise the VM will be halted by the OS and you will need to install everything again from scratch.

The following script can be used to manage the VM without manually accessing the VirtualBox UI:

#!/bin/bash
set -x

WORKSPACE_DIRECTORY=/workspace
BOSH_LITE_VM_ID=$(python -c "import json,sys; print(json.load(sys.stdin)['current_vm_cid'])" < $WORKSPACE_DIRECTORY/config/state.json)

case $1 in
  ssh)
    ssh -i ~/.ssh/bosh-virtualbox.key jumpbox@192.168.50.6
    ;;
  pause)
    echo "Pausing Bosh_Lite VM with ID $BOSH_LITE_VM_ID..."
    VBoxManage controlvm $BOSH_LITE_VM_ID savestate
    echo "Pausing Bosh_Lite VM with ID $BOSH_LITE_VM_ID... Done!."
    ;;
  resume)
    echo "Resuming Bosh_Lite VM with ID $BOSH_LITE_VM_ID..."
    VBoxManage startvm $BOSH_LITE_VM_ID --type=headless
    echo "Resuming Bosh_Lite VM with ID $BOSH_LITE_VM_ID... Done!"
    ;;
  *)
    echo "Usage: bosh_vm {ssh|pause|resume}" ;;
esac

Getting Started

We now have a BOSH-Lite environment installed on our local VirtualBox that we can play with, so it's time to get started.

The first step when creating a release is to initialize the release directory itself, which can be achieved by using the BOSH CLI init-release command: $ bosh init-release --git --dir=geode-bosh-release. Once that's done, we should have the following directory structure and we're ready to start with the actual work:

workspace
├── config
│   ├── creds.yml
│   └── state.json
├── downloads
└── git
    ├── bosh-deployment
    │   └── (...)
    └── geode-bosh-release
        ├── config
        │   ├── blobs.yml
        │   └── final.yml
        ├── jobs
        ├── packages
        └── src

With the release root directory created, we now want to build our dependency graph and start defining which jobs and packages we need. For this particular example we know that we'll have a small Apache Geode Cluster, composed of locators and servers. Apache Geode is written in Java, so we also know that we need to have Oracle Java installed on every single VM for the members to work properly. We are also going to build Apache Geode from its source code instead of using the available binaries, and to be able to do so we'll need to have Gradle installed on the compilation VMs as well.

Once the deploy finishes we'll want to verify that the cluster and its components are up and running, so we are going to configure a BOSH errand and use the Apache Geode GFSH Tool to implement some smoke tests on the cluster.

Below are some BOSH concepts, extracted from the official documentation:

A release job represents a specific chunk of work that the release performs. Jobs describe pieces of the service or application you are releasing.

A package is a component of a BOSH release that contains a packaging spec file and a packaging script. Each package also references source code or pre-compiled software that you store in the src directory of a BOSH release directory. Packages provide source code and dependencies to jobs.

An errand is a short-lived job that an operator can run multiple times after the deploy finishes. Examples: smoke tests, comprehensive test suites, CF service broker binding and unbinding.

Putting it all together, we have three jobs and three packages:

Packages

Package	Name	Compile Dependencies	Runtime Dependencies
Java	java	NONE	NONE
Gradle	gradle	NONE	java
Apache Geode	geode	gradle, java	java

Jobs

Job	Name	Runtime Dependencies
Locator	locator	java, geode
Server	server	java, geode
Smoke Tests	smoke-tests	java, geode

Building the Packages

A package is a component of a BOSH release that contains a packaging spec file and a packaging script. Each package also references source code or pre-compiled software that you store in the src directory of a BOSH release.

BOSH comes with a handy command to create the default skeleton for each package, generate-package, so we should execute it for each one of our packages under the release directory:

$ bosh generate-package java
$ bosh generate-package gradle
$ bosh generate-package geode
$ tree
.
├── config
│   ├── blobs.yml
│   └── final.yml
├── jobs
├── packages
│   ├── geode
│   │   ├── packaging
│   │   └── spec
│   ├── gradle
│   │   ├── packaging
│   │   └── spec
│   └── java
│       ├── packaging
│       └── spec
└── src

We're building a dev release for testing purposes only, so we can safely use a local blobstore to store our blobs instead of an external one. Keep in mind that this approach doesn't work when creating a final release; for that scenario we would need to upload the blobs to an external blobstore such as S3.

That said, we need to download the required files that our packages are going to use into the downloads folder, configure our local blobstore, add the blobs and inform BOSH where the blobs are through the BOSH CLI add-blob command. For this release we're going to use apache-geode-src-1.1.1.tar.gz, gradle-3.5-bin.zip and jdk-8u131-linux-x64.tar.gz.

$ tree -L 2
.
├── config
│   ├── creds.yml
│   ├── custom-cloud-config.yml
│   └── state.json
├── downloads
│   ├── apache-geode-src-1.1.1.tar.gz
│   ├── gradle-3.5-bin.zip
│   └── jdk-8u131-linux-x64.tar.gz
└── git
    ├── bosh-deployment
    └── geode-bosh-release

The blobstore can be configured through the final.yml file, and to inform BOSH where the blobs are located we must issue the BOSH CLI add-blob command:

config/final.yml

---
blobstore:
  provider: local
  options:
    blobstore_path: /tmp/bosh/blobs
name: geode-bosh

$ export WORKSPACE_DIRECTORY=/workspace
$ bosh add-blob $WORKSPACE_DIRECTORY/downloads/gradle-3.5-bin.zip gradle/gradle-3.5-bin.zip
$ bosh add-blob $WORKSPACE_DIRECTORY/downloads/jdk-8u131-linux-x64.tar.gz java/jdk-8u131-linux-x64.tar.gz
$ bosh add-blob $WORKSPACE_DIRECTORY/downloads/apache-geode-src-1.1.1.tar.gz geode/apache-geode-src-1.1.1.tar.gz

Now that the blobs have been added to the blobstore, it's time to define the content of each package through the spec file, along with the instructions to install it using the packaging script. The first two (Java and Gradle) are straightforward, so we'll start with them; the last one requires some extra work.

Java

We need to update the spec file with the relevant information and implement the packaging script, which will be used by BOSH to install the package on the VMs. The implementation just needs to extract the contents of the Java binary distribution into the installation folder assigned by BOSH, which can be referenced from the script through the BOSH_INSTALL_TARGET variable.

packages/java/spec

---
name: java
dependencies: []
files:
- java/jdk-8u131-linux-x64.tar.gz # http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz

packages/java/packaging

set -e -x
echo "Installing Java..."
tar xzf java/jdk-8u131-linux-x64.tar.gz -C ${BOSH_INSTALL_TARGET} --strip-components=1
echo "Installing Java... Done!"
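
The extraction step can be tried outside BOSH before creating the release. The following is only a minimal sketch: it assumes a tiny stand-in archive instead of the real JDK tarball and a temporary directory in place of the folder that BOSH assigns. The fake-jdk name and the scratch paths are hypothetical and exist only for illustration.

```shell
#!/bin/bash
set -e

# Hypothetical scratch area standing in for the BOSH compilation VM.
SCRATCH=$(mktemp -d)
export BOSH_INSTALL_TARGET="$SCRATCH/install"
mkdir -p "$BOSH_INSTALL_TARGET" "$SCRATCH/src/fake-jdk/bin"

# Build a tiny archive with a single top-level folder, like the JDK tarball.
echo "placeholder" > "$SCRATCH/src/fake-jdk/bin/java"
tar czf "$SCRATCH/fake-jdk.tar.gz" -C "$SCRATCH/src" fake-jdk

# Same extraction as the packaging script: drop the top-level folder.
tar xzf "$SCRATCH/fake-jdk.tar.gz" -C "$BOSH_INSTALL_TARGET" --strip-components=1

# The bin folder should now sit directly under the install target.
ls "$BOSH_INSTALL_TARGET/bin"
```

The point of --strip-components=1 is that the archive's top-level folder is discarded, so bin/java ends up directly under the install target rather than one level deeper.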

Gradle

Again, as with the previous package, we have to fill in the details within the spec file and implement the packaging script to uncompress the content of the binary file into the assigned installation folder: BOSH_INSTALL_TARGET.

packages/gradle/spec

---
name: gradle
dependencies: []
files:
- gradle/gradle-3.5-bin.zip # https://services.gradle.org/distributions/gradle-3.5-bin.zip

packages/gradle/packaging

set -e -x
echo "Installing Gradle..."
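# The Gradle distribution is a zip archive, which tar cannot extract.
# unzip is assumed to be available on the compilation VM.
unzip -q gradle/gradle-3.5-bin.zip
cp -a gradle-3.5/* ${BOSH_INSTALL_TARGET}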
echo "Installing Gradle... Done!"

Apache Geode

Binary downloads for Apache Geode are only provided for the convenience of the users and are not official releases, so we need to actually download and build the source code from scratch. It would be way easier to just download and uncompress the binary distribution like we did before for the other two packages, of course, but doing so would prevent us from learning how to actually write some custom scripts to compile and deploy packages instead of just uncompressing already compiled code.

Considering that Apache Geode is entirely written in Java and that it also provides a clear way to build the framework from the source code, this can also serve as a good example when dealing with our own applications. In the normal case where the package is an actual application written by us, however, the source code would be placed into the src folder instead of being treated as a blob (we're not doing that here because the source code is huge).

The blob with the source code is already downloaded into our local blobstore, so we just need to fill in the details within the spec file (note that this package has two compile time dependencies) and, just like before, implement the packaging script. The implementation for this script is slightly different: we need to uncompress the source code into a temporary folder, build Apache Geode and move the result to the folder assigned by BOSH: BOSH_INSTALL_TARGET.

packages/geode/spec

---
name: geode
dependencies:
- java
- gradle
files:
- geode/apache-geode-src-1.1.1.tar.gz #http://ftp.heanet.ie/mirrors/www.apache.org/dist/geode/1.1.1/apache-geode-src-1.1.1.tar.gz

packages/geode/packaging

set -e -x
echo "Installing Apache Geode..."
export JAVA_HOME=/var/vcap/packages/java
export PATH=$PATH:$JAVA_HOME/bin:/var/vcap/packages/gradle/bin
tar xzf geode/apache-geode-src-1.1.1.tar.gz
pushd apache-geode-src-1.1.1
 ./gradlew build -Dskip.tests=true
 cp -a geode-assembly/build/install/apache-geode/* ${BOSH_INSTALL_TARGET}
popd
echo "Installing Apache Geode... Done!"

Implementing the Jobs

A job represents a specific chunk of work that the release performs. It typically includes metadata that specifies available configuration options, ERB configuration files, a Monit file that describes how to start, stop and monitor processes, start and stop scripts for each process and additional hook scripts.

We previously identified two main jobs within our release (locator and server), and one special type of job (errand) that will run after the deploy finishes to verify that our primary jobs are working as expected in terms of functionality (Monit already takes care of verifying the job is up and running).

The Locator is a Geode process that tells new, connecting members where running members are located and provides load balancing for server use. The Server, on the other hand, is a process that runs as a long-lived, configurable member of a client/server system; it is used primarily for hosting long-lived data regions and running standard Geode processes.

Considering that the focus is to show how to create a BOSH release, and not how to configure and use Apache Geode, we won't go deep into the details about the configuration and functionalities offered by this framework. Our cluster will be up and accepting client connections once we're done anyway. It'll use Geode PDX Serialization to avoid deploying custom model jars to the servers, it will be persistent (data will remain there after restarts) and it'll have two Data Regions: one partitioned and one replicated.

That said, it's time for us to create and configure the three jobs that are part of our release: the Locator, the Cache Server and the Smoke-Tests. BOSH comes with a handy command to create the default skeleton for each job, generate-job, so we should execute it for each one of our jobs under the release directory:

$ bosh generate-job locator
$ bosh generate-job server
$ bosh generate-job smoke-tests
$ tree
.
├── config
│   ├── blobs.yml
│   └── final.yml
├── jobs
│   ├── locator
│   │   ├── monit
│   │   ├── spec
│   │   └── templates
│   ├── server
│   │   ├── monit
│   │   ├── spec
│   │   └── templates
│   └── smoke-tests
│       ├── monit
│       ├── spec
│       └── templates
├── packages
│   ├── geode
│   │   ├── packaging
│   │   └── spec
│   ├── gradle
│   │   ├── packaging
│   │   └── spec
│   └── java
│       ├── packaging
│       └── spec
└── src

By default BOSH creates two files and one directory under each job's root folder:

The spec file defines job metadata (like properties, dependencies, and files used by the job).
The monit file is used to monitor, start and stop the job itself through Monit. There are several advanced options to monitor processes using Monit; for the sake of simplicity we're going to use the most basic one.
The templates directory contains the files that are used to configure, control, and manage the job.

Locator

The locator is a Geode process that tells new, connecting members where running members are located and provides load balancing for server use.

We need to tell BOSH how to actively monitor the locator process through the monit file and how to manage its state through the ctl script. The locator has tons of configuration options that can be used to modify its behavior, but for the sake of simplicity we'll just set specific values for the most common ones and allow the operator to manually configure some of them using the deployment manifest, which passes the instance-specific information to the VM through the agent.

Sometimes it's useful to do some work before or after the job is started, and for these scenarios where we need to hook custom logic we can make use of the job-lifecycle scripts. Having that in mind, we'll define our own post-start script, which allows the job to execute any additional commands against a machine and/or persistent data before considering the release job as successfully started.

jobs/locator/spec

The Locator process requires Java and Apache Geode to be installed on the VM, so we use the packages section to let BOSH know about it. The templates section specifies which ERB templates the job has, along with the path where each resulting file will be placed on the destination VM.

It's recommended that every member belonging to a Geode cluster knows about all of the available locators within that cluster, so we need to pass the list of configured locators when starting every single Geode process. To do this in a dynamic, ordered and manageable fashion we'll use BOSH links. To tell BOSH that the job needs to import some information we use the consumes statement, and to let BOSH know that the job exports some information we use provides. BOSH links automatically export some instance methods and properties by default; to also export our own properties we must declare them through the properties statement under the provides section.

As previously stated, there are tons of properties that can be used to modify the locator behavior and configuration. For the sake of simplicity we're going to allow the operator to modify only some of them through the properties section, and we'll use defaults for the rest. For a complete list of available properties and what they do, please refer to the Apache Geode Reference.

---
name: locator

templates:
  ctl.erb: bin/ctl
  post-start.erb: bin/post-start
  locator.properties.erb: config/locator.properties

packages:
- java
- geode

consumes:
- name: locator
  type: locator

provides:
- name: locator
  type: locator
  properties:
  - http.port
  - locator.port

properties:
  jmx.port:
    description: The port at which the JMX Manager will listen to for client connections.
    default: 1099
  http.port:
    description: Port at which the embedded HTTP service listens on.
    default: 7070
  locator.port:
    description: Port the locator will listen on.
    default: 10334
  log.level:
    description: Log level that will be used by the locator.
    default: config

jobs/locator/monit

The monit file, in its most basic form, specifies the process ID (pid) file for the job and defines which commands should be run to start and stop the process, along with the Monit group the process belongs to.

check process locator
  with pidfile /var/vcap/store/locator/vf.gf.locator.pid
  start program "/var/vcap/jobs/locator/bin/ctl start"
  stop program "/var/vcap/jobs/locator/bin/ctl stop"
  group vcap
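
Roughly speaking, Monit reads the pid stored in that file and checks that the process still exists. When debugging on the VM it can help to run the same kind of check by hand. The following is a minimal sketch, not what Monit actually executes: is_running is a hypothetical helper, and the demo uses the current shell's pid instead of a real locator.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the pid stored in the given file is alive.
is_running() {
  local pidfile=$1
  [ -f "$pidfile" ] || return 1
  local pid
  pid=$(cat "$pidfile")
  # kill -0 sends no signal; it only checks that the process exists.
  kill -0 "$pid" 2>/dev/null
}

# Demo with the current shell's pid instead of a real locator.
DEMO_PIDFILE=$(mktemp)
echo $$ > "$DEMO_PIDFILE"
if is_running "$DEMO_PIDFILE"; then echo "running"; else echo "not running"; fi
```

On a deployed instance the helper would be pointed at /var/vcap/store/locator/vf.gf.locator.pid, the same pidfile declared in the monit file.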

jobs/locator/templates/ctl.erb

This template is used by Monit to start and stop the process on the VM when needed. BOSH defines several standard folders and options to implement it, but the actual implementation is strongly tied to the job itself, along with the commands and packages required to manage it.

Keep in mind that each template file is evaluated with ERB before being sent to each instance. As an example, this template will be processed by ERB and placed on the VM under the /var/vcap/jobs/locator/bin/ folder (as specified by our spec file). Templates have access to merged job property values, built by merging default property values with the operator-specified values in the deployment manifest, which gives us a lot of power and flexibility. As a simple example, this allows us to use the unique ID assigned to the instance through <%= "#{spec.id}" %>, its address through <%= "#{spec.address}" %>, custom job properties through <%= p('propertyName') %>, and much more.

Remember when I said that every single Geode process requires the full list of locators at startup? Well, this is the perfect use case for BOSH links + ERB templates.

#!/bin/bash
set -e -x

WORK_DIR=/var/vcap/store/locator
LOG_DIR=/var/vcap/sys/log/locator
CONF_DIR=/var/vcap/jobs/locator/config
LOCATOR_NAME=locator_<%= "#{spec.id}" %>
LOCATOR_BIND_ADDRESS=<%= "#{spec.address}" %>

export JAVA_HOME=/var/vcap/packages/java
export GEODE_HOME=/var/vcap/packages/geode
export PATH=$PATH:$JAVA_HOME/bin:$GEODE_HOME/bin

mkdir -p $LOG_DIR $WORK_DIR
touch $LOG_DIR/ctl.std{out,err}.log
touch $LOG_DIR/post-start.std{out,err}.log
chown -R vcap:vcap $LOG_DIR $WORK_DIR

exec > >(tee --append "$LOG_DIR"/ctl.stdout.log )
exec 2> >(tee --append "$LOG_DIR"/ctl.stderr.log)

case $1 in

  start)
    echo [`date '+%F %T'`]: "Starting locator $LOCATOR_NAME[<%= p('locator.port') %>]..."

    gfsh start locator \
      --force=true --connect=false \
      --log-level=<%= p('log.level') %> \
      --enable-cluster-configuration=true \
      --initial-heap=256m --max-heap=256m \
      --dir=$WORK_DIR --name=$LOCATOR_NAME \
      --properties-file=$CONF_DIR/locator.properties \
      --port=<%= p('locator.port') %> --bind-address=$LOCATOR_BIND_ADDRESS --mcast-port=0 \
      --locators=<%= link('locator').instances.map { |l| "#{l.address}[#{p('locator.port')}]"}.join(",") %> \
      --J=-Dgemfire.http-service-port=<%= p('http.port') %> --J=-Dgemfire.http-service-bind-address=$LOCATOR_BIND_ADDRESS \
      --J=-Dgemfire.jmx-manager=true --J=-Dgemfire.jmx-manager-start=true --J=-Dgemfire.jmx-manager-port=<%= p('jmx.port') %> --J=-Djava.rmi.server.hostname=$LOCATOR_BIND_ADDRESS

    echo [`date '+%F %T'`]: "Starting locator $LOCATOR_NAME[<%= p('locator.port') %>]... Done!"
    ;;

  stop)
    echo [`date '+%F %T'`]: "Stopping locator $LOCATOR_NAME[<%= p('locator.port') %>]..."
    gfsh stop locator --dir=$WORK_DIR
    echo [`date '+%F %T'`]: "Stopping locator $LOCATOR_NAME[<%= p('locator.port') %>]... Done!"
    ;;

  *)
    echo "Usage: ctl {start|stop}" ;;
esac
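
To make the link expression in the --locators option easier to picture, here is a small sketch of the string it is expected to render: one host[port] entry per locator instance, joined by commas. The addresses are hypothetical, build_locators is an illustrative helper that is not part of the release, and the port is the default declared in the locator spec.

```shell
#!/bin/bash
# Hypothetical instance addresses; in a real deployment they come from the link.
LOCATOR_ADDRESSES=(10.244.0.2 10.244.0.3)
LOCATOR_PORT=10334   # default declared in the locator spec

# Join every address as host[port], separated by commas.
build_locators() {
  local port=$1; shift
  local result="" address
  for address in "$@"; do
    result="${result:+$result,}${address}[${port}]"
  done
  echo "$result"
}

build_locators "$LOCATOR_PORT" "${LOCATOR_ADDRESSES[@]}"
# prints: 10.244.0.2[10334],10.244.0.3[10334]
```

Because the list is derived from the link, adding or removing locator instances in the deployment manifest changes the rendered value without touching the template.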

jobs/locator/templates/locator.properties.erb

This file is used to further configure the Geode Locator. It's empty in our release, but we keep it anyway as an easy way to configure the internals of the Locator without requiring a modification to the ctl script or the spec file.

### Geode Properties using default values ###

jobs/locator/templates/post-start.erb

Another ERB template that will be called, according to the job-lifecycle execution order, after Monit starts our Locator job. The content shouldn't take much of our attention since it's related mainly to Apache Geode and not to BOSH itself. It basically verifies that the Locator is receiving connections by using the GFSH Tool, and that the Pulse Web Application running embedded into the Locator is also accepting requests.

#!/bin/bash
set -e

export JAVA_HOME=/var/vcap/packages/java
export GEODE_HOME=/var/vcap/packages/geode
export PATH=$PATH:$JAVA_HOME/bin:$GEODE_HOME/bin

RETRIES=10
SLEEP_TIME=5
LOCATOR_NAME=locator_<%= "#{spec.id}" %>
LOCATOR_BIND_ADDRESS=<%= "#{spec.address}" %>
LOCATOR_CONNECTION_STRING=$LOCATOR_BIND_ADDRESS[<%= p('locator.port') %>]
PULSE_WEB_APP_CONNECTION_STRING=http://$LOCATOR_BIND_ADDRESS:<%= p('http.port') %>/pulse/login.html

echo [`date '+%F %T'`]: "Verifying locator $LOCATOR_CONNECTION_STRING..."

for i in $(seq 1 "$RETRIES"); do

    if gfsh -e "connect --locator=$LOCATOR_CONNECTION_STRING"; then
      sleep "$SLEEP_TIME"
      PULSE_STATUS=$(curl -s --head -w %{http_code} $PULSE_WEB_APP_CONNECTION_STRING -o /dev/null)

      if [[ $PULSE_STATUS == 200 ]]; then
        echo [`date '+%F %T'`]: "Verifying locator $LOCATOR_CONNECTION_STRING... Done!."
        exit 0
      else
        echo [`date '+%F %T'`]: "Verifying locator $LOCATOR_CONNECTION_STRING... Failure (Pulse)."
        exit 1
      fi
    else
      sleep "$SLEEP_TIME"
    fi
done

echo [`date '+%F %T'`]: "Verifying locator $LOCATOR_CONNECTION_STRING... Failure (Locator)."
exit 1

Server

The Geode server is a process that runs as a long-lived, configurable member of a client/server system.

Just like we did with the locator, we need to tell BOSH how to monitor the process through the monit file and how to manage its state through the ctl script. The server also has tons of configuration options that can be used to modify its behavior but, again and for the sake of simplicity, we'll use just the most common ones and allow the operator to manually configure some of these properties using the deployment manifest. We'll also define a post-start script to execute some sanity checks before the job can be considered as up and running.

jobs/server/spec

The Server process requires Java and Apache Geode to be installed on the VM, so we use the packages section to let BOSH know about it.

As said before, we need to know the full list of configured locators within the cluster when starting the server, so we'll use BOSH links again to get this information from the deployment manifest. The list of properties to further configure the server will be set in the server.properties file, empty again to use the default values. The cache.xml file is used to configure several functional aspects of the server (like regions, disk stores, queues, listeners, etc.), the one included is a basic example with two persistent regions, along with the Automatic Reflection-Based PDX Serialization. We also define some properties with default values (heap memory to use, tcp port to listen for client connections, etc) that can be further overridden by the operator when implementing the deployment manifest.

---
name: server

templates:
  ctl.erb: bin/ctl
  post-start.erb: bin/post-start
  cache.xml.erb: config/cache.xml
  server.properties.erb: config/server.properties

packages:
- java
- geode

consumes:
- name: locator
  type: locator

provides:
- name: server
  type: server
  properties:
  - http.port
  - server.port

properties:
  http.port:
    description: Port at which the embedded HTTP service listens on.
    default: 7070
  server.port:
    description: Port the server will listen on.
    default: 40404
  log.level:
    description: Log level that will be used by the server.
    default: config
  newGen.size:
    description: Heap memory that will be assigned to the young generation space (megabyte).
    default: 128
  oldGen.size:
    description: Heap memory that will be assigned to the old generation space (megabyte).
    default: 1024

jobs/server/monit

check process server
  with pidfile /var/vcap/store/server/vf.gf.server.pid
  start program "/var/vcap/jobs/server/bin/ctl start"
  stop program "/var/vcap/jobs/server/bin/ctl stop"
  group vcap

jobs/server/templates/cache.xml.erb

The full details about how to configure this file can be found in the Official Geode User Guide.

<?xml version="1.0" encoding="UTF-8"?>
<cache
    xmlns="http://geode.apache.org/schema/cache"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://geode.apache.org/schema/cache http://geode.apache.org/schema/cache/cache-1.0.xsd"
    version="1.0">

  <cache-server/>

  <disk-store name="diskStore"/>

  <pdx persistent="true">
    <pdx-serializer>
      <class-name>org.apache.geode.pdx.ReflectionBasedAutoSerializer</class-name>
      <parameter name="classes">
        <string>com.company.model.*</string>
      </parameter>
    </pdx-serializer>
  </pdx>

  <region name="replicatedRegion">
    <region-attributes refid="REPLICATE_PERSISTENT" statistics-enabled="true" disk-store-name="diskStore"/>
  </region>

  <region name="partitionedRegion">
    <region-attributes refid="PARTITION_REDUNDANT_PERSISTENT" statistics-enabled="true" disk-store-name="diskStore">
      <partition-attributes redundant-copies="2"/>
    </region-attributes>
  </region>
</cache>

jobs/server/templates/ctl.erb

This template is used by Monit to start and stop the process on the VM when needed. We use Geode GFSH to start and stop the server. Pay special attention to the folders and techniques used to redirect the standard output and error streams, along with the properties extracted from the deployment manifest and the path to the configuration files used.

#!/bin/bash
set -e -x

WORK_DIR=/var/vcap/store/server
LOG_DIR=/var/vcap/sys/log/server
CONF_DIR=/var/vcap/jobs/server/config
SERVER_NAME=server_<%= "#{spec.id}" %>
SERVER_BIND_ADDRESS=<%= "#{spec.address}" %>
HEAP_OPTIONS="--J=-Xmn<%= p('newGen.size') %>m --J=-Xmx<%= p('oldGen.size') %>m --J=-Xms<%= p('oldGen.size') %>m --J=-XX:+AlwaysPreTouch"
JAVA_GC_OPTS="--J=-XX:+UseParNewGC --J=-XX:+UseConcMarkSweepGC --J=-XX:CMSInitiatingOccupancyFraction=70 --J=-XX:+UseCMSInitiatingOccupancyOnly --J=-XX:+DisableExplicitGC --J=-XX:+CMSClassUnloadingEnabled"
JAVA_GC_PRINT_OPTS="--J=-verbose:gc --J=-Xloggc:$SERVER_NAME-gc.log --J=-XX:+PrintGCDateStamps --J=-XX:+PrintGCDetails --J=-XX:+PrintTenuringDistribution --J=-XX:+PrintGCApplicationConcurrentTime --J=-XX:+PrintGCApplicationStoppedTime"

export JAVA_HOME=/var/vcap/packages/java
export GEODE_HOME=/var/vcap/packages/geode
export PATH=$PATH:$JAVA_HOME/bin:$GEODE_HOME/bin

mkdir -p $LOG_DIR $WORK_DIR
touch $LOG_DIR/ctl.std{out,err}.log
touch $LOG_DIR/post-start.std{out,err}.log
chown -R vcap:vcap $LOG_DIR $WORK_DIR

exec > >(tee --append "$LOG_DIR"/ctl.stdout.log )
exec 2> >(tee --append "$LOG_DIR"/ctl.stderr.log)

case $1 in

  start)
    echo [`date '+%F %T'`]: "Starting server $SERVER_NAME[<%= p('server.port') %>]..."

    gfsh start server \
      --force=true \
      --use-cluster-configuration=true \
      --log-level=<%= p('log.level') %> \
      --dir=$WORK_DIR --name=$SERVER_NAME \
      --properties-file=$CONF_DIR/server.properties --cache-xml-file=$CONF_DIR/cache.xml \
      --server-port=<%= p('server.port') %> --server-bind-address=$SERVER_BIND_ADDRESS --mcast-port=0 \
      --start-rest-api=true --http-service-port=<%= p('http.port') %> --http-service-bind-address=$SERVER_BIND_ADDRESS \
      --locators=<%= link('locator').instances.map { |l| "#{l.address}[#{link('locator').p('locator.port')}]"}.join(",") %> \
      --J=-Dgemfire.statistic-sample-rate=1000 --J=-Dgemfire.statistic-sampling-enabled=true --J=-Dgemfire.statistic-archive-file=$WORK_DIR/$SERVER_NAME.gfs $HEAP_OPTIONS $JAVA_GC_OPTS $JAVA_GC_PRINT_OPTS

    echo [`date '+%F %T'`]: "Starting server $SERVER_NAME[<%= p('server.port') %>]... Done!"
    ;;

  stop)
    echo [`date '+%F %T'`]: "Stopping server $SERVER_NAME[<%= p('server.port') %>]..."
    gfsh stop server --dir=$WORK_DIR
    echo [`date '+%F %T'`]: "Stopping server $SERVER_NAME[<%= p('server.port') %>]... Done!"
    ;;

  *)
    echo "Usage: ctl {start|stop}" ;;
esac

jobs/server/templates/post-start.erb

The content shouldn't take much of our attention since it's related mainly to Apache Geode and not to BOSH itself. It basically verifies that the Server is part of the distributed system by using the GFSH tool to check the list of members from the existing locators, and that the Geode REST API running embedded into the Server is also correctly deployed and handling requests.

#!/bin/bash
set -e

export JAVA_HOME=/var/vcap/packages/java
export GEODE_HOME=/var/vcap/packages/geode
export PATH=$PATH:$JAVA_HOME/bin:$GEODE_HOME/bin

SERVER_NAME=server_<%= "#{spec.id}" %>
SERVER_BIND_ADDRESS=<%= "#{spec.address}" %>
REST_API_CONNECTION_STRING=http://$SERVER_BIND_ADDRESS:<%= p('http.port') %>/geode/swagger-ui.html
LOCATORS=<%= link('locator').instances.map { |l| "#{l.address}[#{link('locator').p('locator.port')}]"}.join("\\ ") %>

echo [`date '+%F %T'`]: "Verifying server $SERVER_NAME..."

for locator in $LOCATORS; do
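  # With "set -e" a failing pipeline aborts the script before "$?" can be inspected,
  # so the check is done directly in the "if" condition.
  if ! gfsh -e "connect --locator=$locator" -e "list members" | grep -q "$SERVER_NAME"; then
      echo [`date '+%F %T'`]: "Verifying server $SERVER_NAME... Failure!. Locator $locator doesn't see the server as part of the cluster."
      exit 1
  fi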
done

REST_API_STATUS=$(curl -s --head -w %{http_code} $REST_API_CONNECTION_STRING -o /dev/null)
if [[ $REST_API_STATUS != 200 ]]; then
  echo [`date '+%F %T'`]: "Verifying server $SERVER_NAME... Failure! (REST API)."
  exit 1
fi

echo [`date '+%F %T'`]: "Verifying server $SERVER_NAME... Done!"
exit 0
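
The script above checks the REST API only once. If the embedded web application needs a few extra seconds to come up, a small retry wrapper can help. The following is just a sketch and not part of the release: the retry and rest_api_is_up function names are my own, hypothetical helpers, and the number of attempts and the delay are arbitrary values.

```shell
#!/bin/bash
# Hypothetical helper, not part of the release.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Only wait when another attempt is still pending.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Same curl invocation used by the post-start script, wrapped in a function.
# Succeeds only when the URL given as first argument answers with HTTP 200.
rest_api_is_up() {
  local status
  status=$(curl -s --head -w '%{http_code}' "$1" -o /dev/null) || return 1
  [[ $status == 200 ]]
}

# Possible usage inside post-start (5 attempts, 2 seconds apart, both assumed):
# retry 5 2 rest_api_is_up "$REST_API_CONNECTION_STRING" || exit 1
```

Because the command is passed as arguments, the same wrapper could also be used around the gfsh membership check.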

jobs/server/templates/server.properties.erb

This file is used to further configure the Apache Geode Server. It's actually empty in our release, but it's kept within the deployment anyway as an easy way to configure the internals of the Server without having to modify the ctl script or the spec file.

### Geode Properties using default values ###

Smoke Tests

This is a special type of job: a BOSH errand, a short-lived job that an operator can run multiple times after the deploy finishes. It should be executed manually, and we'll be using this functionality to make sure our locators and servers, along with the applications embedded within them, are up and running.

jobs/smoke-tests/spec

The errand will use the GFSH Tool, so it has the same dependencies as the server and locator jobs. We'll use BOSH links to get some connection information from the servers and locators as well.

---
name: smoke-tests

templates:
  errands.erb: bin/run

packages:
- java
- geode

consumes:
- name: locator
  type: locator
- name: server
  type: server

properties: {}

jobs/smoke-tests/monit

This job won't be monitored by Monit, so we can leave the file empty.

jobs/smoke-tests/templates/errands.erb

This template is used to run the errand on the VM. We use Geode GFSH to connect to the locators and verify the list of connected members, along with the curl tool to check that the embedded Web Applications are running.

#!/bin/bash
set -e

export JAVA_HOME=/var/vcap/packages/java
export GEODE_HOME=/var/vcap/packages/geode
export PATH=$PATH:$JAVA_HOME/bin:$GEODE_HOME/bin

verify_membership() {
  LOCATORS=<%= link('locator').instances.map { |l| "#{l.address}[#{link('locator').p('locator.port')}]"}.join("\\ ") %>
  SERVERS_AMOUNT="<%= link('server').instances.size %>"
  LOCATORS_AMOUNT="<%= link('locator').instances.size %>"

  echo [`date '+%F %T'`]: "Verifying cluster membership..."

  for locator in $LOCATORS; do
    cluster_servers=$(gfsh -e "connect --locator=$locator" -e "list members" | grep "^server_"  | wc -l)
    cluster_locators=$(gfsh -e "connect --locator=$locator" -e "list members" | grep "^locator_"  | wc -l)

    if [[ "$cluster_servers" -ne "$SERVERS_AMOUNT" ]]; then
        echo [`date '+%F %T'`]: "Verifying cluster membership... Failure!. Locator $locator doesn't report the expected amount of servers."
        exit 1
    fi

    if [[ "$cluster_locators" -ne "$LOCATORS_AMOUNT" ]]; then
        echo [`date '+%F %T'`]: "Verifying cluster membership... Failure!. Locator $locator doesn't report the expected amount of locators."
        exit 1
    fi
  done

  echo [`date '+%F %T'`]: "Verifying cluster membership... Done!."
}

verify_applications() {
  PULSE_ENDPOINTS=<%= link('locator').instances.map { |l| "#{l.address}:#{link('locator').p('http.port')}"}.join("\\ ") %>
  REST_API_ENDPOINTS=<%= link('server').instances.map { |s| "#{s.address}:#{link('server').p('http.port')}"}.join("\\ ") %>

  echo [`date '+%F %T'`]: "Verifying cluster applications..."

  for pulseEndpoint in $PULSE_ENDPOINTS; do
    PULSE_STATUS=$(curl -s --head -w %{http_code} http://$pulseEndpoint/pulse/login.html -o /dev/null)

    if [[ $PULSE_STATUS != 200 ]]; then
      echo [`date '+%F %T'`]: "Verifying cluster applications... Failure!. Pulse can't be located on $pulseEndpoint."
      exit 1
    fi
  done

  for restEndpoint in $REST_API_ENDPOINTS; do
    REST_API_STATUS=$(curl -s --head -w %{http_code} http://$restEndpoint/geode/swagger-ui.html -o /dev/null)

    if [[ $REST_API_STATUS != 200 ]]; then
      echo [`date '+%F %T'`]: "Verifying cluster applications... Failure!. The REST API can't be located on $restEndpoint."
      exit 1
    fi
  done

  echo [`date '+%F %T'`]: "Verifying cluster applications... Done!."
}

verify_configuration() {
  echo "Nothing to do here yet..."
}

verify_membership
verify_applications
verify_configuration
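
To make the counting logic in verify_membership easier to follow, here is a minimal sketch that runs it against a canned text instead of a live cluster. The count_members function is a hypothetical helper of mine, and the sample rows are made up for illustration, they are not real gfsh output. The only thing borrowed from the errand is the assumption that member names start at the beginning of a line.

```shell
#!/bin/bash
# Hypothetical helper: reads text on stdin and prints how many lines
# start with the prefix given as first argument.
count_members() {
  # "grep -c" prints 0 but exits with 1 when nothing matches;
  # "|| true" keeps that from aborting a script that uses "set -e".
  grep -c "^$1" || true
}

# Made-up stand-in for the "list members" output (2 locators, 4 servers).
sample='locator_1 | (made-up row)
locator_2 | (made-up row)
server_1 | (made-up row)
server_2 | (made-up row)
server_3 | (made-up row)
server_4 | (made-up row)'

servers=$(printf '%s\n' "$sample" | count_members "server_")
locators=$(printf '%s\n' "$sample" | count_members "locator_")

echo "servers=$servers locators=$locators"   # prints: servers=4 locators=2
```

In the real errand the same idea would allow gfsh to be executed once per locator, saving its output in a variable and counting both prefixes from it, instead of connecting twice.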

Creating & Uploading the Release

At this point all the artifacts needed to create the dev release are in place. The folder structure of the release directory should be as follows:

.
├── blobs
│   ├── geode
│   │   └── apache-geode-src-1.1.1.tar.gz
│   ├── gradle
│   │   └── gradle-3.5-bin.zip
│   └── java
│       └── jdk-8u131-linux-x64.tar.gz
├── config
│   ├── blobs.yml
│   └── final.yml
├── jobs
│   ├── locator
│   │   ├── monit
│   │   ├── spec
│   │   └── templates
│   │       ├── ctl.erb
│   │       ├── locator.properties.erb
│   │       └── post-start.erb
│   ├── server
│   │   ├── monit
│   │   ├── spec
│   │   └── templates
│   │       ├── cache.xml.erb
│   │       ├── ctl.erb
│   │       ├── post-start.erb
│   │       └── server.properties.erb
│   └── smoke-tests
│       ├── monit
│       ├── spec
│       └── templates
│           └── errands.erb
├── packages
│   ├── geode
│   │   ├── packaging
│   │   └── spec
│   ├── gradle
│   │   ├── packaging
│   │   └── spec
│   └── java
│       ├── packaging
│       └── spec
└── src

As we would have expected, BOSH comes with handy commands to create and upload the release: create-release and upload-release. There's no need to be connected to the BOSH Director when creating the release, but it's mandatory when uploading it:

# Make the Credentials available for the BOSH CLI
$ export WORKSPACE_DIRECTORY=/workspace
$ export BOSH_CLIENT=admin
$ export BOSH_CLIENT_SECRET=`bosh int $WORKSPACE_DIRECTORY/config/creds.yml --path /admin_password`

# Create the Release
$ bosh -n create-release --force

# Upload the Release
$ bosh -n -e bosh-lite upload-release
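
Since uploading requires the Director credentials to be exported first, a small guard can catch a missing variable before the CLI is even invoked. This is only a sketch: the require_env function is a hypothetical helper of mine, not something provided by BOSH.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when every variable named in the
# arguments is set to a non-empty value; reports the missing ones on stderr.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage before uploading the release:
# require_env BOSH_CLIENT BOSH_CLIENT_SECRET && bosh -n -e bosh-lite upload-release
```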

That's all: our release is uploaded to the BOSH Director and, from now on, we can reference it from a deployment manifest to start deploying stuff!

Creating the Deployment Manifest

The deployment manifest is a YAML file that defines the components and properties of the BOSH deployment. When an operator initiates a new deployment using the CLI, the Director receives a manifest and creates or updates a deployment with a matching name.

For the sake of simplicity we'll keep this file as small as possible (it lives under the "manifests" folder for convenience, although strictly speaking it shouldn't be part of this repository because it doesn't belong to the release). Keep in mind that the deployment manifest is a world of its own and that it has to be aligned with the cloud-config used by the Director, the IaaS being used, the available stemcells, networks and releases, etc.

The sample manifest uploaded to this repository defines a cluster composed of 2 locators and 4 servers, along with the smoke-tests job as an errand, but you can change almost anything just by modifying this file (number of instances, IPs, types of VM, size of disks, VM memory, job properties, etc.). The official reference can be found at Manifest v2 Schema.

---
name: geode-cluster

releases:
- name: geode-bosh
  version: latest

stemcells:
- alias: default
  os: ubuntu-trusty
  version: "latest"

update:
  canaries: 1
  max_in_flight: 2
  canary_watch_time: 60000
  update_watch_time: 60000

instance_groups:
- name: locator
  instances: 2
  azs: [z1, z2, z3]
  vm_type: default
  stemcell: default
  networks:
  - name: default
  jobs:
  - name: locator
    release: geode-bosh
    provides:
      locator: { as: locator }
- name: server
  instances: 4
  azs: [z1, z2, z3]
  vm_type: default
  stemcell: default
  networks:
  - name: default
  jobs:
  - name: server
    release: geode-bosh
    provides:
      server: { as: server }
    consumes:
      locator: { from: locator }
- name: smoke-tests
  instances: 1
  azs: [z1,z2,z3]
  lifecycle: errand
  vm_type: default
  stemcell: default
  networks:
  - name: default
  jobs:
  - name: smoke-tests
    release: geode-bosh
    consumes:
      server: { from: server }
      locator: { from: locator }

Deploying and Executing the Smoke Tests

We have the deployment manifest finished, the specified jobs have been uploaded as part of the release, and the rest of the referenced components (networks, stemcells, VM types, availability zones, etc.) have been previously uploaded within the cloud-config in the first step, so now we're officially ready to create the deployment through the BOSH CLI deploy command. We also want to run the smoke-tests after the deploy finishes, and that can be achieved by running the BOSH CLI run-errand command.

Putting it all together:

$ bosh -n -e bosh-lite -d geode-cluster deploy manifests/geode-deployment.yml
$ bosh -e bosh-lite -d geode-cluster run-errand smoke-tests
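
If both commands are going to be executed together, it's handy to make sure the errand only runs when the deploy has succeeded. Below is a minimal sketch of that idea: the deploy_and_test function and the BOSH variable (which only exists so the CLI can be swapped for a stub while testing) are assumptions of mine, while the two bosh invocations are the same ones shown above.

```shell
#!/bin/bash
# Name of the CLI binary; overridable so the function can be tried with a stub.
BOSH=${BOSH:-bosh}

# Hypothetical helper: deploys and, only if the deploy succeeds, runs the errand.
# Returns the exit status of the first command that fails (or 0 if both succeed).
deploy_and_test() {
  "$BOSH" -n -e bosh-lite -d geode-cluster deploy manifests/geode-deployment.yml &&
    "$BOSH" -e bosh-lite -d geode-cluster run-errand smoke-tests
}

# Possible usage from the release directory:
# deploy_and_test || echo "Deployment or smoke tests failed" >&2
```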

At this point our Geode cluster, deployed through BOSH, is finally up and running!

We can get details about the instances and VMs running in our deployment (along with their IPs, CPU usage, disk usage, etc.) through the following BOSH CLI commands:

$ bosh -e bosh-lite -d geode-cluster vms --vitals
$ bosh -e bosh-lite -d geode-cluster instances --details

We can use a web browser to access the Pulse Web Application deployed within the locators at http://locatorIp:7070/pulse. We can also use a web browser to access the Swagger UI deployed on the servers at http://serverIp:7070/geode/swagger-ui.html, and execute some REST operations through the Geode REST API. We could even use the Apache Geode GFSH Tool to connect to the cluster from the command line and start managing it. In summary, we can do anything we'd normally do with a regular, manually installed Geode Cluster, simply because this is a regular Geode Cluster. The only difference is that now we have an easy way to version, package and deploy the cluster in a reproducible manner!

Below is yet another bash script, one I've used a lot while creating and testing the release:

#!/bin/bash
WORKSPACE_DIRECTORY=/workspace
export BOSH_CLIENT=admin
export BOSH_CLIENT_SECRET=`bosh int $WORKSPACE_DIRECTORY/config/creds.yml --path /admin_password`

function create_release() {
  set -x
  bosh -n create-release --force
  set +x
}

function upload_release() {
  set -x
  bosh -n -e bosh-lite upload-release
  set +x
}

function create_deployment() {
  set -x
  bosh -n -e bosh-lite -d geode-cluster deploy manifests/geode-deployment.yml
  set +x
}

function run_errands() {
  set -x
  bosh -e bosh-lite -d geode-cluster run-errand smoke-tests
  set +x
}

function delete_deployment() {
  set -x
  rm -Rf .dev_builds
  rm -Rf dev_releases
  bosh -n -e bosh-lite -d geode-cluster delete-deployment --force
  bosh -n -e bosh-lite delete-release geode-bosh
  set +x
}

selection=
until [ "$selection" = "6" ]; do
  echo "##############################################################################################################################################################################"
  echo "Select an option:"
  echo "  1 - Create Release."
  echo "  2 - Upload Release."
  echo "  3 - Create Deployment."
  echo "  4 - Execute Deployment Errands."
  echo "  5 - Clean Deployment."
  echo "  6 - Exit."
  echo "##############################################################################################################################################################################"
  echo -n "Enter a choice: "
  read selection
  echo ""
  case $selection in
    1)
      create_release
      ;;
    2)
      upload_release
      ;;
    3)
      create_deployment
      ;;
    4)
      run_errands
      ;;
    5)
      delete_deployment
      ;;
    6)
      exit
      ;;
    *)
      echo 'Invalid option, please select an option between 1 and 6'
      ;;
  esac
done
