spoptimize's People

Contributors

vrivellino

Forkers

harmy

spoptimize's Issues

ECS Support

Hello!

Can spoptimize be used for EC2 instances with ECS?

Thanks!

Update deployment documentation

Split out getting-started/quick-deploy from advanced deployment topics.

Advanced deployment docs would include details on how to override defaults via environment variables.

Cancel spot requests for terminated instances

Perhaps wire up a lambda to ASG termination notices to cancel spot requests. EC2 will eventually close them on its own, but open spot requests associated with terminated instances count against the account holder's limit.
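
A minimal sketch of what such a lambda might do, assuming it is handed the terminated instance's ID. The function name is hypothetical and not part of spoptimize; `ec2` is expected to be a boto3 EC2 client (or anything with the same two methods). Filtering on the `open` and `active` states is an assumption about which requests are worth cancelling.

```python
def cancel_spot_requests_for_instance(ec2, instance_id):
    """Cancel spot requests still tied to instance_id; return the cancelled IDs.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    resp = ec2.describe_spot_instance_requests(
        Filters=[
            {"Name": "instance-id", "Values": [instance_id]},
            # Assumption: only requests EC2 has not yet closed are of interest.
            {"Name": "state", "Values": ["open", "active"]},
        ]
    )
    request_ids = [
        req["SpotInstanceRequestId"]
        for req in resp.get("SpotInstanceRequests", [])
    ]
    if request_ids:
        # The cancel call needs at least one ID, so skip it when nothing matched.
        ec2.cancel_spot_instance_requests(SpotInstanceRequestIds=request_ids)
    return request_ids
```

Passing the client in as an argument keeps the function easy to test with a stub and leaves the choice of trigger (lifecycle hook, event rule, or something else) open.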

Simplify spot instance attachment; Add locking

When testing the initial implementation, I found that with an autoscaling group with desired-capacity of 1, spoptimize and autoscaling get into a loop:

  1. Instance is launched by ASG
  2. Spoptimize attaches a spot instance in same AZ as launched on-demand instance
  3. ASG launches a new instance in another AZ, attempting to rebalance
  4. The original instance and the spot instance get nuked
  5. Process repeats in the other AZ

Rather than worry about seamless attachments and terminations, spoptimize should instead:

  • terminate on-demand and attach spot in same step
  • provide a lock-out mechanism to prevent parallel executions from attaching & terminating in the same ASG

With locking implemented, there won't be any service downtime as long as the autoscaling group has more than one instance running. And an autoscaling group of 1 implies that some service downtime is acceptable.
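
The lock-out idea can be sketched as a per-group lock with an expiry. This is only an illustration: the `GroupLocks` class and its in-memory dictionary are hypothetical stand-ins for whatever shared store a real deployment would use, and the `GroupLocked` exception name is borrowed from the retry configuration quoted in a later issue.

```python
import time


class GroupLocked(Exception):
    """Raised when another execution already holds the group's lock."""


class GroupLocks:
    """In-memory stand-in for a shared lock store (hypothetical)."""

    def __init__(self, ttl_seconds=300, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._held = {}  # group name -> (owner, expiry timestamp)

    def acquire(self, group_name, owner):
        """Take the lock for group_name, or raise GroupLocked if it is held."""
        now = self._clock()
        current = self._held.get(group_name)
        if current and current[0] != owner and current[1] > now:
            raise GroupLocked(group_name)
        # Free, expired, or already ours: (re)take the lock and reset the expiry.
        self._held[group_name] = (owner, now + self._ttl)

    def release(self, group_name, owner):
        """Drop the lock, but only if this owner holds it."""
        current = self._held.get(group_name)
        if current and current[0] == owner:
            del self._held[group_name]
```

The expiry is there so that a crashed execution cannot block a group forever. An execution would acquire the lock before terminating the on-demand instance and attaching the spot instance, then release it afterwards.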

Refactor handler.py & stepfns.py

handler.py has zero test coverage, and it contains some logic that probably belongs in stepfns.py.

It'd be great to get test coverage for handler.py and keep as much logic as possible in stepfns.py.

All lambda return values should be defined somewhere (perhaps in a standalone module) so that a test can compare them against the strings in sam.yml.
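
One way this could look, as a rough sketch: keep the return values as constants in one place and have a test report any that never appear as a quoted string in the template text. The constant names and values below are invented for illustration and are not the strings spoptimize actually uses; the helper is likewise hypothetical.

```python
import re

# Hypothetical return values; the real strings would come from the lambdas.
RETURN_VALUES = {
    "SPOT_PENDING": "Pending",
    "SPOT_FAILED": "Failure",
    "INSTANCE_PROTECTED": "Protected",
}


def values_missing_from_template(template_text, values):
    """Return the values that never appear as a double-quoted string."""
    quoted = set(re.findall(r'"([^"\\]*)"', template_text))
    return sorted(value for value in values if value not in quoted)
```

A test could read sam.yml as text and assert that `values_missing_from_template(text, RETURN_VALUES.values())` is empty, so a renamed return value fails the build instead of silently breaking a state-machine choice.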

Update readme to note MaxSpotInstanceCountExceeded

During testing, I came across this error from the request-spot lambda:

An error occurred (MaxSpotInstanceCountExceeded) when calling the RequestSpotInstances operation: Max spot instance count exceeded: ClientError
Traceback (most recent call last):
  File "/var/task/handler.py", line 64, in handler
    event['launch_subnet_id'], client_token)
  File "/var/task/spoptimize/stepfns.py", line 106, in request_spot_instance
    return spot_helper.request_spot_instance(launch_config, az, subnet_id, client_token)
  File "/var/task/spoptimize/spot_helper.py", line 65, in request_spot_instance
    Type='one-time', ClientToken=client_token)
  File "/var/runtime/botocore/client.py", line 317, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 615, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (MaxSpotInstanceCountExceeded) when calling the RequestSpotInstances operation: Max spot instance count exceeded

The solution was to request a service limit increase via AWS Support.
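
For anyone who wants to detect this case in code rather than in logs, here is a small sketch. The helper name is hypothetical; it relies only on the `response` attribute that botocore's `ClientError` carries, with the error code under `response["Error"]["Code"]`.

```python
SPOT_LIMIT_CODE = "MaxSpotInstanceCountExceeded"


def is_spot_limit_error(exc):
    """True if exc looks like a ClientError for the spot instance limit."""
    # Exceptions without a `response` attribute simply do not match.
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == SPOT_LIMIT_CODE
```

Inside an `except botocore.exceptions.ClientError as e:` block around the spot request, a check like `is_spot_limit_error(e)` would let the caller log a pointer to the service limit instead of a bare traceback.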

Update readme to note protected & standby instances

Make a note in the documentation that spoptimize does not replace protected or standby instances: execution stops if spoptimize detects that the launched instance is protected from scale-in or is marked as standby.
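
To illustrate the rule for the docs, a sketch of the check is shown here. It assumes the instance record has the shape returned by the Auto Scaling `DescribeAutoScalingInstances` call, which includes `ProtectedFromScaleIn` and `LifecycleState`. The function name is hypothetical, and treating `EnteringStandby` as standby is an assumption rather than confirmed spoptimize behaviour.

```python
# Lifecycle states treated as "standby" here (assumption for this sketch).
STANDBY_STATES = {"Standby", "EnteringStandby"}


def is_replaceable(asg_instance):
    """False if the instance is protected from scale-in or in standby."""
    if asg_instance.get("ProtectedFromScaleIn", False):
        return False
    return asg_instance.get("LifecycleState") not in STANDBY_STATES
```

An instance for which `is_replaceable` returns False would be left alone and the execution would end.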

Revisit locking retry/back-off

Exclusive locking is implemented via step functions' retry semantics:

spoptimize/sam.yml

Lines 322 to 326 in 4aa555c

"Retry": [{
  "ErrorEquals": [ "GroupLocked" ],
  "IntervalSeconds": 5,
  "MaxAttempts": 20,
  "BackoffRate": 1.5

I'm not sure this is the right long-term solution. For larger autoscaling groups, it may take hours for all instances to be replaced after a deploy or mass update.

Perhaps allow for more than one instance to be replaced by Spoptimize at a time (configurable via tag)? Or just have a static interval between retries?
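
To put numbers on the retry settings quoted above, the sketch below computes the wait before each retry, assuming the usual Step Functions semantics: the first retry waits `IntervalSeconds`, and each later wait is the previous one multiplied by `BackoffRate`. The helper name is made up for this example, and time spent inside the task itself is ignored.

```python
def retry_waits(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry, in order."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


# Values taken from the Retry block in sam.yml.
waits = retry_waits(interval_seconds=5, max_attempts=20, backoff_rate=1.5)
total_hours = sum(waits) / 3600
last_wait_hours = waits[-1] / 3600

# A static 5-second interval, for comparison (BackoffRate of 1.0).
static_total_seconds = sum(retry_waits(5, 20, 1.0))
```

With these settings the waits add up to roughly nine hours, and the final wait alone is about three hours, so an execution that keeps finding the group locked can sit idle for a long time. A static 5-second interval with the same attempt count would give up after 100 seconds, which is the trade-off behind the question above.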

Wait for cloudformation

If an auto-scaling group is managed by CloudFormation and the associated stack's status is IN_PROGRESS, wait for the stack status to settle before proceeding with execution.

This will prevent spoptimize from doing anything during stack updates.
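
A sketch of the two checks this would need, under stated assumptions: CloudFormation tags the resources it creates with `aws:cloudformation:stack-name`, and every in-flight stack status ends in `_IN_PROGRESS` (for example `UPDATE_IN_PROGRESS`). The function names are hypothetical, and `tags` is assumed to be the list of `Key`/`Value` dictionaries found on an auto-scaling group description.

```python
STACK_NAME_TAG = "aws:cloudformation:stack-name"


def stack_name_from_tags(tags):
    """Return the owning stack's name, or None if the group is unmanaged."""
    for tag in tags:
        if tag.get("Key") == STACK_NAME_TAG:
            return tag.get("Value")
    return None


def stack_is_settled(stack_status):
    """True once the stack is no longer in an *_IN_PROGRESS state."""
    return not stack_status.endswith("_IN_PROGRESS")
```

An execution could look up the stack status for the name returned by `stack_name_from_tags` and retry later while `stack_is_settled` is False, in the same way the lock is retried.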
