
ecs-ami-deploy's Introduction

Automate EC2 instance replacement with the latest ECS-optimized AMI for ECS clusters

What is this?

ecs-ami-deploy is a library for intelligently replacing instances in an autoscaling group (ASG) with the latest ECS-optimized image. Given a couple of assumptions, it can replace all instances in an ECS cluster without causing downtime for any of the running services.

This process is available as:

  • A Go Module: import "github.com/silinternational/ecs-ami-deploy"
  • A command line application. See cli/ directory
  • A Lambda function that can be scheduled or triggered automatically whenever a new ECS optimized AMI is released

Why did we build this?

AWS releases new ECS-optimized AMIs frequently. It's generally a good idea to keep up to date with releases for security, performance, and other enhancements. However, replacing instances can be a complicated dance if you also want to maintain a high level of availability for the services running on the instances being replaced. AWS has a feature to "refresh" instances in an autoscaling group, but unfortunately it isn't "ECS aware". Rather, it removes EC2 instances from the autoscaling group, terminates them, and replaces them. When the new instances are "healthy", it moves on to replace more instances. It does not check whether the services you have running in ECS are stable before removing the next instance. As a result, refreshing instances in an ASG using the built-in feature is likely to cause downtime for your ECS services.

That isn't an acceptable solution for us, so we created ecs-ami-deploy to refresh instances with an awareness of the services running in ECS and to replace EC2 instances only when ECS services are stable. This process assumes that more than one task per service is running, so that when an EC2 instance is removed, ECS launches a new task on a different instance while the other tasks for that service remain where they are. EC2 instances are then removed from service and terminated one by one. The next instance is removed only when all services are stable again, that is, when they have zero pending tasks.

Idempotency

Gracefully replacing instances can take some time, especially for clusters with many instances supporting them. Since it is expected that this process will be run by a Lambda, which has a runtime limit of 15 minutes, the process was designed to be fault-tolerant and to pick up where it left off should it be killed due to timeout. One of the ways this is accomplished is that all EC2 instances in the ASG are tagged before being detached and terminated. On successive runs the process looks for tagged instances for the given cluster that are no longer in service and continues the graceful termination process while monitoring the ECS cluster services for stability.
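The resume step described above amounts to filtering instances by tag and service state. Here is a rough sketch under stated assumptions: the Instance type, the TerminationTagKey tag name, and PendingTerminations are all hypothetical and chosen for illustration. The tag that ecs-ami-deploy actually applies may differ.

```go
package main

// Instance is a simplified, hypothetical view of an EC2 instance.
type Instance struct {
	ID        string
	InService bool              // true while the instance is still in service in the ASG
	Tags      map[string]string // EC2 tags on the instance
}

// TerminationTagKey is an assumed tag name used only in this sketch. Its value
// is taken to be the name of the ECS cluster the instance belonged to.
const TerminationTagKey = "ecs-ami-deploy/terminate-for-cluster"

// PendingTerminations returns the instances that were tagged for the given
// cluster on an earlier run and are no longer in service, so a later run can
// pick up the graceful termination where the previous one stopped.
func PendingTerminations(instances []Instance, cluster string) []Instance {
	var out []Instance
	for _, inst := range instances {
		if inst.Tags[TerminationTagKey] == cluster && !inst.InService {
			out = append(out, inst)
		}
	}
	return out
}
```

Because the decision depends only on tags and instance state, running the filter again after a timeout yields the remaining work without any separate bookkeeping.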

If --force-replacement is enabled, the process will always replace all instances whether there is a newer AMI available or not. When --force-replacement is enabled the process is not idempotent.

Instance Replacement Process

  1. Look up latest AMI based on either the given AMI filter, or the default: amzn2-ami-ecs-hvm-*-x86_64-ebs
  2. Identify the ASG for the given ECS cluster to get current launch configuration and instances list
  3. Compare latest AMI with AMI in use by launch configuration
    1. If cluster is not using latest AMI, or force replacement is enabled, proceed to #4
    2. Else if using latest AMI already, jump to #10
  4. Create new launch configuration with new AMI
  5. Update ASG to use new launch config
  6. Detach existing instances from ASG and replace with new ones
  7. Wait for new instances to reach InService state with ASG
  8. Watch ECS cluster instances until all new ones are registered and available
  9. For each old instance that needs to be removed:
    1. Deregister one instance from ECS cluster
    2. Wait for zero pending tasks in cluster
    3. Terminate old ASG EC2 instance
  10. Scan all EC2 instances for any instances tagged for termination as part of this operation, in case any were missed on a previous run due to a timeout or another failure. For each:
    1. Terminate instance
    2. Wait for zero pending tasks in cluster
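Steps 3 and 9 of the process above can be sketched in a few lines of Go. This is an illustration only, not the library's API: Ops, NeedsReplacement, and DrainOldInstances are hypothetical names, and the function fields stand in for the real ECS and EC2 calls.

```go
package main

import "fmt"

// Ops bundles the cloud operations the loop needs. The fields are hypothetical
// stand-ins for the real ECS and EC2 calls.
type Ops struct {
	Deregister         func(instanceID string) error // remove the instance from the ECS cluster
	WaitForZeroPending func() error                  // block until the cluster has no pending tasks
	Terminate          func(instanceID string) error // terminate the EC2 instance
}

// NeedsReplacement mirrors step 3: replace when the AMI in use is not the
// latest one, or when force replacement is enabled.
func NeedsReplacement(currentAMI, latestAMI string, force bool) bool {
	return force || currentAMI != latestAMI
}

// DrainOldInstances mirrors step 9: handle one old instance at a time and
// stop at the first error so that a later run can resume.
func DrainOldInstances(ops Ops, oldInstanceIDs []string) error {
	for _, id := range oldInstanceIDs {
		if err := ops.Deregister(id); err != nil {
			return fmt.Errorf("deregister %s: %w", id, err)
		}
		if err := ops.WaitForZeroPending(); err != nil {
			return fmt.Errorf("wait after deregistering %s: %w", id, err)
		}
		if err := ops.Terminate(id); err != nil {
			return fmt.Errorf("terminate %s: %w", id, err)
		}
	}
	return nil
}
```

Passing the operations in as function values keeps the ordering logic testable with fakes, without touching AWS.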

Todo

  • Make logging vs. returning errors consistent
  • Add a logger that can send output to an email as well as stdout
  • Create Lambda wrapper and provide trigger examples for schedule and SNS when newer AMI is released

CLI Usage

  1. Grab the latest binary for your platform at https://github.com/silinternational/ecs-ami-deploy/releases
  2. The CLI makes use of AWS's SDK for Go, which can load authentication credentials from various places, similar to the AWS CLI itself.
  3. Run ecs-ami-deploy list-cluster to check that it's working and to see which clusters you have available.
  4. If you have multiple profiles configured in your ~/.aws/credentials file, you can use the -p or --profile flag to specify a different profile.
  5. The CLI defaults to region us-east-1. Use the -r or --region flag to specify a different region.
  6. The CLI has built-in help for the various subcommands and their supported flags. Use the -h or --help flag with each subcommand for more information.

ecs-ami-deploy's People

Contributors

fillup
