
AWS CoreOS cluster provisioning with Terraform


Overview

This is a practical implementation of CoreOS cluster architectures (https://coreos.com/os/docs/latest/cluster-architectures.html) built on AWS.

The cluster follows the CoreOS production cluster model: an autoscaling etcd cluster, plus an autoscaling worker cluster for hosting containers. You can optionally add an admiral cluster for shared services such as CI, a private Docker registry, logging, and monitoring.

The entire infrastructure is managed by Terraform.

For other types of Linux clusters, see the similar repo aws-linux-cluster.

Setup AWS credentials

Go to AWS Console.

  1. Sign up for an AWS account if you don't already have one. The default EC2 instances created by this tool are covered by the AWS Free Tier (https://aws.amazon.com/free/).
  2. Create a group coreos-cluster with the AdministratorAccess policy.
  3. Create a user coreos-cluster and download the user credentials.
  4. Add the user coreos-cluster to the group coreos-cluster.
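
If you prefer the command line, the same setup can be sketched with the AWS CLI (installed in the next section), run under credentials that are allowed to manage IAM; this is a hedged equivalent of the console steps above, not part of the project's scripts:

    $ aws iam create-group --group-name coreos-cluster
    $ aws iam attach-group-policy --group-name coreos-cluster \
        --policy-arn arn:aws:iam::aws:policy/AdministratorAccess
    $ aws iam create-user --user-name coreos-cluster
    $ aws iam add-user-to-group --group-name coreos-cluster --user-name coreos-cluster
    $ aws iam create-access-key --user-name coreos-cluster   # save the returned key pair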

Install tools

If you use Vagrant, you can skip this section and go to the Quick Start section.

Instructions for installing the tools on macOS:

  1. Install Terraform

    $ brew update
    $ brew install terraform
    

    or

    $ mkdir -p ~/bin/terraform
    $ cd ~/bin/terraform
    $ curl -L -O https://dl.bintray.com/mitchellh/terraform/terraform_0.6.0_darwin_amd64.zip
    $ unzip terraform_0.6.0_darwin_amd64.zip
    
  2. Install Jq

    $ brew install jq
    
  3. Install AWS CLI

    $ brew install awscli
    

    or

    $ sudo easy_install pip
    $ sudo pip install --upgrade awscli
    

For other platforms, follow the links above and the instructions on each tool's site.

Quick start

Clone the repo:

$ git clone https://github.com/xuwang/aws-terraform.git
$ cd aws-terraform

Run Vagrant ubuntu box with terraform installed (Optional)

If you use Vagrant, instead of installing the tools on your host machine, there is a Vagrantfile for an Ubuntu box with all the necessary tools installed:

$ vagrant up
$ vagrant ssh
$ cd aws-terraform

Configure AWS profile with coreos-cluster credentials

$ aws configure --profile coreos-cluster

Use the downloaded aws user credentials when prompted.

The above command creates a coreos-cluster profile section in the ~/.aws/config and ~/.aws/credentials files. The build process below will automatically configure the Terraform AWS provider credentials using this profile.
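
After running aws configure, the two files should contain sections roughly like the following (the keys below are placeholders; the examples in this README use us-west-2):

    $ cat ~/.aws/credentials
    [coreos-cluster]
    aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
    aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    $ cat ~/.aws/config
    [profile coreos-cluster]
    region = us-west-2
    output = json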

To build:

This default build will create a one-etcd-node, one-worker-node cluster in a VPC, with application buckets for data and the necessary IAM roles, policies, keypairs, and keys. The instance type for the nodes is t2.micro. You can review the configuration and make changes if needed. See Customization for details.

$ make
... build steps info ...
... at last, shows the worker's ip:
worker public ips: 52.27.156.202
...

To see the list of resources created:

$ make show
...
  module.etcd.aws_autoscaling_group.etcd:
  id = etcd
  availability_zones.# = 3
  availability_zones.2050015877 = us-west-2c
  availability_zones.221770259 = us-west-2b
  availability_zones.2487133097 = us-west-2a
  default_cooldown = 300
  desired_capacity = 1
  force_delete = true
  health_check_grace_period = 0
  health_check_type = EC2
  launch_configuration = terraform-4wjntqyn7rbfld5qa4qj6s3tie
  load_balancers.# = 0
  max_size = 9
  min_size = 1
  name = etcd
  tag.# = 1
....

Log in to the worker node:

$ ssh -A core@52.27.156.202

CoreOS beta (723.3.0)
core@ip-10-0-5-50 ~ $ fleetctl list-machines
MACHINE     IP      METADATA
289a6ba7... 10.0.1.141  env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=etcd2
320bd4ac... 10.0.5.50   env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=worker

Destroy all resources

$ make destroy_all

This will destroy ALL resources created by this project.

Customization

  • The default values for the VPC, EC2 instance profile, policies, keys, autoscaling groups, launch configurations, etc., can be overridden in the resources/terraform/module-<resource>.tf files.

  • AWS profile and cluster name are defined at the top of Makefile:

    AWS_PROFILE := coreos-cluster
    CLUSTER_NAME := coreos-cluster
    

    These can also be customized to match your AWS profile and cluster name.

Build multi-node cluster

The number of etcd nodes and worker nodes is defined in resources/terraform/module-etcd.tf and resources/terraform/module-worker.tf.

Change cluster_desired_capacity in these files to build a multi-node etcd/worker cluster; for example, set it to 3:

    cluster_desired_capacity = 3

Note: the etcd min_size, max_size, and cluster_desired_capacity should all be the same odd number, e.g. 3, 5, or 9.

You should also change the instance type from t2.micro to t2.medium or larger if heavy Docker containers are to be hosted on the nodes:

    image_type = "t2.medium"
    root_volume_size =  12
    docker_volume_size =  120

To build:

$ make all
... build steps info ...
... at last, shows the worker's ip:
worker public ips:  52.26.32.57 52.10.147.7 52.27.156.202
...

Log in to a worker node:

$ ssh -A core@<worker-public-ip>
CoreOS beta (723.3.0)

core@ip-10-0-1-92 ~ $ etcdctl cluster-health
cluster is healthy
member 34d5239c565aa4f6 is healthy
member 5d6f4a5f10a44465 is healthy
member ab930e93b1d5946c is healthy

core@ip-10-0-1-92 ~ $ etcdctl member list
34d5239c565aa4f6: name=i-65e333ac peerURLs=http://10.0.1.92:2380 clientURLs=http://10.0.1.92:2379
5d6f4a5f10a44465: name=i-cd40d405 peerURLs=http://10.0.1.185:2380 clientURLs=http://10.0.1.185:2379
ab930e93b1d5946c: name=i-ecfa0d1a peerURLs=http://10.0.1.45:2380 clientURLs=http://10.0.1.45:2379

core@ip-10-0-1-92 ~ $ fleetctl list-machines
MACHINE     IP      METADATA
0d16eb52... 10.0.1.92   env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=etcd2
d320718e... 10.0.1.185  env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=etcd2
f0bea88e... 10.0.1.45   env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=etcd2
0cb636ac... 10.0.5.4    env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=worker
4acc8d6e... 10.0.5.112  env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=worker
fa9f4ea7... 10.0.5.140  env=coreos-cluster,platform=ec2,provider=aws,region=us-west-2,role=worker

Manage individual platform resources

You can create individual resources; the automation scripts will create dependent resources automatically.

$ make help

Usage: make (<resource> | destroy_<resource> | plan_<resource> | refresh_<resource> | show | graph )
Available resources: vpc s3 route53 iam etcd worker
For example: make worker # to show what resources are planned for worker

Currently defined resources:

Resource          Description
vpc               VPC, gateway, and subnets
s3                S3 buckets
iam               Setup a deployment user and deployment keys
route53           Setup public and private hosted zones on Route53 DNS service
elb               Setup application ELBs
efs               EFS cluster
etcd              Setup ETCD2 cluster
worker            Setup application docker hosting cluster
admiral           Central service cluster (fleet-ui, monitoring, logging, etc)
docker_registry   Private docker registry cluster
rds               RDS servers
cloudtrail        Setup AWS CloudTrail

To build the cluster step by step:

$ make init
$ make vpc
$ make etcd
$ make worker

Make commands can be re-run. If a resource already exists, the command just refreshes the Terraform state.

The build creates a build/ directory, copies all Terraform files into it, and executes the corresponding terraform commands there to build the resource on AWS.

To destroy a resource:

$ make destroy_<resource>

Technical notes

  • The etcd cluster runs in an autoscaling group. It should be set to a fixed, odd size (1, 3, 5, ...), with cluster_desired_capacity = min_size = max_size.
  • Cluster discovery is managed with the stakater/etcd-aws-cluster image. The etcd cluster forms itself by self-discovery through its autoscaling group, and the etcd initial cluster is then written automatically to the s3://AWS-ACCOUNT-CLUSTER-NAME-cloudinit/CLUSTER-NAME_etcd/initial-cluster S3 object. Worker nodes join the cluster by downloading that initial-cluster file from S3 during their bootstrap (see the sketch after these notes).
  • AWS resources are defined in the resources and modules directories. The build process copies all resource files from resources to a build directory. Terraform actions are performed under build, which is listed in .gitignore, so the original Terraform files in the repo are kept intact.
  • Makefiles and shell scripts handle the tasks Terraform doesn't cover, providing streamlined build automation.
  • All nodes use a common bootstrap shell script as user-data, which downloads the initial-cluster file and the node-specific cloud-config.yaml to configure the node. If the cloud-config changes, there is no need to rebuild the instance; just reboot it to pick up the change.
  • The CoreOS AMI is generated on the fly to keep it up to date. The default channel can be changed in the Makefile.
  • Terraform's auto-generated launch configuration names and the create_before_destroy (CBD) lifecycle are used to allow launch configuration updates on a live autoscaling group; however, running EC2 instances in the autoscaling group have to be recycled outside of Terraform to pick up the new launch configuration.
  • For a production system, the security groups defined in the etcd, worker, and admiral modules should be carefully reviewed and tightened.
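
As a rough illustration of the discovery flow above, a worker can read the published initial-cluster file straight from S3. The exact file contents depend on the stakater/etcd-aws-cluster image, so treat this as a hedged sketch; the member names and IPs reuse the etcdctl member list output shown earlier:

    # On a worker node, print the initial-cluster file published by the etcd nodes
    $ aws s3 cp s3://<AWS-ACCOUNT>-<CLUSTER-NAME>-cloudinit/<CLUSTER-NAME>_etcd/initial-cluster -
    ETCD_INITIAL_CLUSTER=i-65e333ac=http://10.0.1.92:2380,i-cd40d405=http://10.0.1.185:2380,i-ecfa0d1a=http://10.0.1.45:2380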

First steps

To control your cluster with fleet, you use the fleetctl command. Fleet has no built-in security mechanism, so if you want to use fleetctl from your workstation, you need to configure fleet to use an SSH tunnel. An easy way to do this is to configure the SSH user and private key in ~/.ssh/config and then export the FLEETCTL_TUNNEL variable on the command line, like so:

Host coreos
  User core
  HostName <ec2-public-ip>
  IdentityFile ~/.ssh/your_aws_private_key.pem

And:

export FLEETCTL_TUNNEL=<ec2-public-ip>

It doesn’t matter which instance you use as the other end of your SSH tunnel, as long as you use the EC2 instance’s public IP address. Of course the IP address in your SSH config must be the same as what you export in the environment variable.

Also, make sure to add your private key to ssh-agent, to make sure the ssh commands work:

ssh-add ~/.ssh/your_aws_private_key.pem

Once you've done this, the following command:

fleetctl list-machines

should show you the servers in your cluster:

MACHINE         IP              METADATA
015a6f3a...     10.104.242.206  -
3588db25...     10.73.200.139   -

A concrete example, using the keypair generated by this project:

Host coreos
  User core
  HostName 52.36.252.184
  IdentityFile /Users/rasheed/Documents/projects/stakater/aws-terraform-xuwang/aws-terraform/build/keypairs/gocd.pem

export FLEETCTL_TUNNEL=52.36.252.184

ssh-add /Users/rasheed/Documents/projects/stakater/aws-terraform-xuwang/aws-terraform/build/keypairs/gocd.pem

fleetctl commands

fleetctl submit hello.service
fleetctl start hello.service
fleetctl status hello.service
fleetctl destroy hello.service

To see the output of the service, call:

fleetctl journal hello.service

Fleet is effectively a clustered layer on top of systemd. Fleet uses systemd unit files with an optional added [X-Fleet] section to tell fleet which machines a unit should run on. There is very little magic.
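
For example, a minimal hello.service (a hypothetical unit matching the fleetctl examples above, not a file shipped by this repo) is just a systemd unit plus an optional [X-Fleet] section:

    [Unit]
    Description=Hello World

    [Service]
    # Print a message forever so there is something to see in the journal
    ExecStart=/usr/bin/bash -c "while true; do echo Hello World; sleep 1; done"

    [X-Fleet]
    # Ask fleet to schedule this unit only on machines tagged role=worker
    MachineMetadata=role=worker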

list systemd units

systemctl list-units | grep fleet

systemctl restart fleet.service

systemd

introduction to systemd: https://coreos.com/docs/launching-containers/launching/getting-started-with-systemd/

fleet

introduction to fleet: https://coreos.com/fleet/docs/latest/launching-containers-fleet.html

Don'ts

  1. Don't modify the cluster name. If you do, then please update "s3-cloudconfig-bootstrap.sh" as well, specifically this path:

# Bucket path for the cloud-config.yaml
bucket=${accountId}-stakater-cloudinit

Troubleshooting

Two types of units can be run in your cluster — standard and global units. Standard units are long-running processes that are scheduled onto a single machine. If that machine goes offline, the unit will be migrated onto a new machine and started.

Global units will be run on all machines in the cluster.

  1. The fleet logs (sudo journalctl -u fleet) will provide more clarity on what’s going on under the hood.

  2. There are two fleetctl commands to view units in the cluster: list-unit-files, which shows the units that fleet knows about and whether or not they are global, and list-units, which shows the current state of units actively loaded into machines in the cluster.

$ fleetctl list-unit-files

You can view all of the machines in the cluster by running list-machines:

$ fleetctl list-machines

$ fleetctl list-units

Check the fleet service to see what errors it gives us:

$ systemctl status -l fleet

For each of our essential services, we should check the status and logs. The general way of doing this is:

systemctl status -l <unit>
journalctl -b -u <unit>

If we check the etcd logs, we will see something like this:

journalctl -b -u etcd

When your CoreOS machine processes the cloud-config file, it generates stub systemd unit files that it uses to start up fleet and etcd. To see the systemd configuration files that were created and are being used to start your services, change to the directory where they were dropped:

cd /run/systemd/system
ls -F

To list all units:

systemctl

Services usually fail because of a missing dependency (e.g. a file or mount point), missing configuration, or incorrect permissions. In this example we see that the dev-mqueue unit of type mount fails. As the type is a mount, the reason is most likely that mounting a particular partition failed.

By using the systemctl status command we can see the details of the dev-mqueue.mount unit:

[root@localhost ~]# systemctl status dev-mqueue.mount

online tool to validate cloud-config

https://coreos.com/validate/

Check whether the service is enabled (systemctl is-enabled etcd2). If it's not enabled, it may be a dependency of something that is enabled; you can check with systemctl list-dependencies etcd2 --reverse.

check status of a service

systemctl status -l gocd

There are a few things worth pointing out (a sketch of such a unit follows this list):

  1. The container is clearly dependent on having Docker running, hence the Requires line. The After line is also needed to avoid race conditions.
  2. Before we start the container, we first stop and remove any existing container with the same name and then pull the latest version of the image. The "-" at the start means systemd won't abort if the command fails.
  3. This means that our container will be started from scratch each time. If you want to persist data, then you'll need to do something with volumes or volume containers, or change the code to restart the old container if it exists.
  4. We've used TimeoutStartSec=0 to turn off timeouts, as the docker pull may take a while.
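
A sketch of such a unit under those assumptions (the unit and image names are illustrative, not this project's actual GoCD units):

    [Unit]
    Description=Example containerized service
    # Point 1: require Docker and order after it to avoid races
    Requires=docker.service
    After=docker.service

    [Service]
    # Point 4: docker pull can be slow, so disable the start timeout
    TimeoutStartSec=0
    # Point 2: "-" means a failure here (e.g. no old container) is not fatal
    ExecStartPre=-/usr/bin/docker stop example
    ExecStartPre=-/usr/bin/docker rm example
    ExecStartPre=/usr/bin/docker pull busybox:latest
    # Point 3: the container is recreated from scratch on every start
    ExecStart=/usr/bin/docker run --name example busybox:latest /bin/sh -c "while true; do sleep 60; done"
    ExecStop=/usr/bin/docker stop example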

systemd unit status

You can check units status by:

$ sudo systemctl status gocd-agent-1

Or the unit logs by:

$ sudo journalctl -exu gocd-agent-1

Usually, the log output will tell you what's going on.

to see the docker logs

docker logs <CONTAINER_NAME_OR_ID>


If you don't know what is SIGKILL'ing a process, there may be something in the full system journal around that time (journalctl --since "2015-03-20 08:49"). Try running dmesg too; the kernel may be killing it.


Step 1: get into the CoreOS machine:

ssh -i /home/vagrant/aws-terraform/build/keypairs/gocd.pem core@<instance-public-ip>

Step 2: get list of running docker containers

docker ps

Step 3: check the logs of a particular container/service

journalctl -exu gocd-agent-1

or

journalctl -exu gocd-agent-cd-prod.service

Step 4:


Issues

get rid of these warnings

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

fetch upstream changes

We must fetch upstream changes, as some of them are very relevant and will help increase code quality.
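
A minimal sketch, assuming this fork tracks xuwang/aws-terraform as upstream and uses the master branch:

    $ git remote add upstream https://github.com/xuwang/aws-terraform.git
    $ git fetch upstream
    $ git merge upstream/master    # or rebase; resolve conflicts as needed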

define separate path for DEV & PROD environments

We need to define the path and approach, and the possibility in the framework that the DEV environment runs on clustered CoreOS while PROD can run using an AMI.

  • how will it be done?
  • define systemd units?

Provide the flexibility in the tool to choose between the different possible solutions.

fix and remove hardcoded wait after base instance creation

Currently, a "wait" of 60 seconds has been added after base instance creation, so that the instance can download the required data before AMI creation takes place.

We need to replace this hard-coded wait with a mechanism that makes sure all required files have been successfully downloaded and it is now safe to start AMI creation.

Possible solutions:
1- SSH into the instance after it's created and wait until the files appear. How many files will we check?
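
One possible shape for option 1, as a hedged sketch (the keypair, host, and file paths are placeholders, not values used by the project):

    # Poll over SSH until the expected files exist, instead of a blind 60-second sleep
    for attempt in $(seq 1 30); do
      ssh -i <keypair.pem> core@<base-instance-ip> \
        'test -f <path-to-required-file-1> && test -f <path-to-required-file-2>' && break
      sleep 10
    done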

configure registrator

Registrator automatically registers and deregisters services for any Docker container by inspecting containers as they come online. Registrator supports pluggable service registries, which currently include Consul, etcd, and SkyDNS 2.

https://github.com/gliderlabs/registrator

Set up a registrator systemd unit in the base image.
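
Roughly, per the gliderlabs/registrator documentation, the unit would wrap a container invocation like the one below; pointing it at the local etcd endpoint is an assumption based on this cluster's setup:

    $ docker run -d --name registrator --net=host \
        --volume=/var/run/docker.sock:/tmp/docker.sock \
        gliderlabs/registrator:latest etcd://127.0.0.1:2379/services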

add a unit (app) to take backups, upload, download & restore backups from s3

Create a unit which offers the following features:

  1. take a backup of a volume
  2. tag the backup
  3. upload the backup to s3
  4. download and restore the backup from s3
  5. create a configurable cron job which runs and takes care of the first 3 steps

We will use this unit to back up & restore databases, e.g. MySQL or any other database.

This will be efficient and fast.

Ideally this unit should find all volumes attached to the host on which it is running and do the above for each of them.
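
A rough sketch of the backup-and-upload half (bucket, volume path, and naming are placeholders):

    # Snapshot a volume directory, tag it with a timestamp, and push it to S3
    TAG=$(date +%Y%m%d-%H%M%S)
    tar czf /tmp/backup-$TAG.tar.gz -C /data/<volume> .
    aws s3 cp /tmp/backup-$TAG.tar.gz s3://<backup-bucket>/$(hostname)/backup-$TAG.tar.gz
    # Restore: stream the archive back from S3 and unpack it in place
    # aws s3 cp s3://<backup-bucket>/<host>/backup-<TAG>.tar.gz - | tar xzf - -C /data/<volume>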

implement GetAZs replacement; AZ to region mapping

Right now AZs are hard-coded, so we are stuck using the us-west-2 region and can't switch to a different region.

It would be best to dynamically map regions to AZs, and populate the AZ list based on the selected region.

It is currently hard-coded in a lot of places, e.g. in all *-subnet.tf files in the vpc module, in variable files, etc.

Here is an example of how to do it:
https://groups.google.com/forum/#!topic/terraform-tool/fwjGCohZPk0

variable aws_vpc_cidr_prefix { default = "172.20" }
variable aws_region { default = "us-east-1" }

variable aws_azs {
  default = { 
    us-east-1 = "us-east-1a,us-east-1c,us-east-1d"
    us-west-2 = "us-west-2a,us-west-2b,us-west-2c"
  }
}

#
# Each app subnet group is a /21 (172.20.0.0/21)
# each app subnet group is split across up to 4 seperate
# AZ's, i.e. 172.20.[0,4,8,12].0/22
#
# Example:
# gfcp-app              -> 172.20.0.0 - 172.20.15.255
# gfcp-app (us-east-1a) -> 172.20.0.0 - 172.20.3.255
# gfcp-app (us-east-1c) -> 172.20.4.0 - 172.20.7.255
# gfcp-app (us-east-1d) -> 172.20.8.0 - 172.20.11.255
# gfcp-app (unused)     -> 172.20.12.0 - 172.20.15.255
#
# This configuration gives:
# - 4 AZ's per region (currently only use 3)
# - 16 App networks
# - 4096 IP addresses per App network
# - 1024 IPaddresses per App, per AZ
#
variable aws_appnet_map {
  default = { 
    gfcp-app =   "0" # jboss, fuse
    gfcp-dmz =  "16" # boxes with public and private ips
    gfcp-rds =  "32" # main databases
    gfcp-web =  "48" # web-facing, apache ended up in dmz
    gfcp-tmp =  "64" # temporary instances, like data-pipeline
    undef80  =  "80" # unused block 80
    undef96  =  "96" # unused block 96
    undef112 = "112" # unused block 112
    undef128 = "128" # unused block 128
    undef144 = "144" # unused block 144
    undef160 = "160" # unused block 160
    undef176 = "176" # unused block 176
    ops-vpn  = "192" # vpn client ips
    ops-dmz  = "208" # nat boxes, bastion
    ops-rds  = "224" # ops databases
    ops-app  = "240" # puppet, rundeck, etc
  }
}

resource "aws_subnet" "gfcp-app" {
...
  count = "${length(split(",", lookup(var.aws_azs, var.aws_region)))}"
  availability_zone = "${element(split(",", lookup(var.aws_azs, var.aws_region)), count.index)}"
  cidr_block = "${var.aws_vpc_cidr_prefix}.${lookup(var.aws_appnet_map, "gfcp-app")+(4*count.index)}.0/22"
}
resource "aws_elb" "gfcp-single" {
...
subnets = [ "${aws_subnet.gfcp-app.*.id}" ]
...
}

resource "aws_autoscaling_group" "gfcp-app" {
...
  availability_zones = [ "${split(",", lookup(var.aws_azs, var.aws_region))}" ]
  vpc_zone_identifier = [ "${aws_subnet.gfcp-app.*.id}" ]
...
}

https://github.com/hashicorp/terraform/blob/master/examples/aws-asg/main.tf

# Specify the provider and access details
provider "aws" {
  region = "${var.aws_region}"
}

resource "aws_elb" "web-elb" {
  name = "terraform-example-elb"

  # The same availability zone as our instances
  availability_zones = ["${split(",", var.availability_zones)}"]
  listener {
    instance_port = 80
    instance_protocol = "http"
    lb_port = 80
    lb_protocol = "http"
  }

  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 2
    timeout = 3
    target = "HTTP:80/"
    interval = 30
  }

}

resource "aws_autoscaling_group" "web-asg" {
  availability_zones = ["${split(",", var.availability_zones)}"]
  name = "terraform-example-asg"
  max_size = "${var.asg_max}"
  min_size = "${var.asg_min}"
  desired_capacity = "${var.asg_desired}"
  force_delete = true
  launch_configuration = "${aws_launch_configuration.web-lc.name}"
  load_balancers = ["${aws_elb.web-elb.name}"]
  #vpc_zone_identifier = ["${split(",", var.availability_zones)}"]
  tag {
    key = "Name"
    value = "web-asg"
    propagate_at_launch = "true"
  }
}

resource "aws_launch_configuration" "web-lc" {
  name = "terraform-example-lc"
  image_id = "${lookup(var.aws_amis, var.aws_region)}"
  instance_type = "${var.instance_type}"
  # Security group
  security_groups = ["${aws_security_group.default.id}"]
  user_data = "${file("userdata.sh")}"
  key_name = "${var.key_name}"
}

# Our default security group to access
# the instances over SSH and HTTP
resource "aws_security_group" "default" {
  name = "terraform_example_sg"
  description = "Used in the terraform"

  # SSH access from anywhere
  ingress {
    from_port = 22
    to_port = 22
    protocol = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # HTTP access from anywhere
  ingress {
    from_port = 80
    to_port = 80
    protocol = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # outbound internet access
  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

dynamically upload & download config files to/from the s3 bucket required by a systemd unit

Upload all (if any) config files in the config folder under the module to the config s3 bucket, and then download all of them (if any) from the s3 bucket in the bootstrap script.

Let's assume you are running a Docker container of Logstash as a systemd unit which needs a configuration file when it starts; we need to support such scenarios.

The idea is that the user can put all configuration files under the config folder under that module, and we will dynamically upload them to the s3 bucket.
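
A hedged sketch of the idea (the bucket name and layout are hypothetical, not what the build currently creates):

    # At build time: push any config files the module ships
    aws s3 sync resources/<module>/config s3://<AWS-ACCOUNT>-<CLUSTER-NAME>-config/<module>/
    # In the bootstrap script on the node: pull them back down before starting the unit
    aws s3 sync s3://<AWS-ACCOUNT>-<CLUSTER-NAME>-config/<module>/ /etc/<module>/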

add retry on registry certificates upload

Registry certificates are uploaded by the registry after it is created.
The upload fails quite often due to reasons like "time out" or "unable to upload, please try again".

Add retry logic to the certificate upload so that upload failures are minimized.
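
A simple retry wrapper, as a sketch (the certificate path and bucket are placeholders):

    # Retry the certificate upload a few times before giving up
    for attempt in 1 2 3 4 5; do
      aws s3 cp <path-to-certs>/ s3://<registry-bucket>/certs/ --recursive && break
      echo "certificate upload failed (attempt $attempt), retrying in 10s..." >&2
      sleep 10
    done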

move gocd config to separate repo

Since the GoCD config, i.e. the templates for pipelines and stages, has its own life cycle, it will be good to put it in a separate repo; anyone should be able to use it standalone without the rest of Stakater.

The whole solution should be built on the concept of small, self-runnable pieces!

add build AMI workflow

  • add a workflow that allows the user to create an AMI given systemd units

register AMI
deregister (delete) AMI
