vrivellino / spoptimize
Spoptimize: Replace AWS AutoScaling instances with spot instances
License: Mozilla Public License 2.0
Launch Templates with auto-scaling is a thing.
Hello!
Can spoptimize be used for EC2 instances with ECS?
Thanks!
Split out getting-started/quick-deploy from advanced deployment topics.
Advanced deployment docs would include details on how to override defaults via environment variables.
Wire up a lambda to subscribe to termination notices from CloudWatch Events and terminate Spoptimize-launched spot instances via the autoscaling API.
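A minimal sketch of what such a lambda could look like, assuming the CloudWatch Events rule delivers the standard "EC2 Spot Instance Interruption Warning" event. The function name `handle_interruption` and the `is_managed` hook (a stand-in for however Spoptimize would recognise its own instances) are hypothetical and not part of the current code base. The client is passed in so the logic can be tested without AWS access.

```python
SPOT_WARNING = 'EC2 Spot Instance Interruption Warning'


def handle_interruption(event, autoscaling, is_managed=lambda instance_id: True):
    """Terminate an interrupted spot instance through the autoscaling API.

    `autoscaling` is a boto3 'autoscaling' client (or a test double).
    `is_managed` is a hypothetical hook deciding whether Spoptimize
    launched the instance; by default every instance is accepted.
    Returns the instance id that was terminated, or None if nothing was done.
    """
    if event.get('detail-type') != SPOT_WARNING:
        return None
    instance_id = event['detail']['instance-id']
    if not is_managed(instance_id):
        return None
    # Keep desired capacity unchanged so autoscaling launches a replacement.
    autoscaling.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=False)
    return instance_id
```

In a real handler the call would look something like `handle_interruption(event, boto3.client('autoscaling'))`.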
Perhaps wire up a lambda to ASG termination notices to cancel spot requests. EC2 will eventually close them on its own, but open spot requests associated with terminated instances count against the account-holder's limit.
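The cancellation step might look roughly like the sketch below. It assumes the instance id has already been pulled out of the termination notice (that part is omitted), and the function name `cancel_spot_requests_for` is made up for illustration. The EC2 client is injected so the logic can be exercised with a test double.

```python
def cancel_spot_requests_for(instance_id, ec2):
    """Cancel spot requests still tied to a terminated instance.

    `ec2` is a boto3 'ec2' client (or a test double).
    Returns the list of spot request ids that were cancelled.
    """
    resp = ec2.describe_spot_instance_requests(
        Filters=[{'Name': 'instance-id', 'Values': [instance_id]}])
    # Only requests that are still open or active count against the limit.
    request_ids = [
        req['SpotInstanceRequestId']
        for req in resp['SpotInstanceRequests']
        if req['State'] in ('open', 'active')
    ]
    if request_ids:
        ec2.cancel_spot_instance_requests(SpotInstanceRequestIds=request_ids)
    return request_ids
```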
When testing the initial implementation, I found that with an autoscaling group with desired-capacity of 1, spoptimize and autoscaling get into a loop:
Rather than worry about seamless attachments and terminations, spoptimize should instead:
With locking implemented, there won't be any service downtime as long as the autoscaling group has more than one instance running. And an autoscaling group of 1 implies that some service downtime is acceptable.
Allow a configuration override to prevent spoptimize from replacing all on-demand instances.
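One way this override could work is a per-group tag that sets a floor on the number of on-demand instances. The sketch below assumes a tag key of `spoptimize:min_on_demand`, which is purely hypothetical, and assumes the group's tags have already been flattened into a dict.

```python
def may_replace_on_demand(on_demand_count, asg_tags,
                          tag_key='spoptimize:min_on_demand'):
    """Return True if one more on-demand instance may be replaced.

    `asg_tags` is a dict of the autoscaling group's tags.
    `tag_key` is a hypothetical tag name; a missing or malformed
    value is treated as 0 (no floor).
    """
    try:
        minimum = int(asg_tags.get(tag_key, 0))
    except (TypeError, ValueError):
        minimum = 0
    # Replacing one instance must not drop the group below the floor.
    return on_demand_count > max(minimum, 0)
```

With a floor of 2 and three on-demand instances running, one replacement is allowed; with only two running, the replacement is skipped.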
handler.py has zero test coverage, and it contains some logic that probably belongs in stepfns.py.
It'd be great to get test coverage for handler.py and keep as much logic as possible in stepfns.py.
All lambda return values should be defined somewhere (perhaps in a standalone module) so that a test can compare those strings in sam.yml.
IAM policy currently allows the lambdas to pass any IAM role. This should be restricted.
Perhaps a parameter that allows the user to list ARNs?
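As a rough illustration of the restricted policy, the helper below builds an `iam:PassRole` statement from a user-supplied list of role ARNs and refuses wildcards. The function name `pass_role_statement` is hypothetical, and how the list would reach the template (for example as a stack parameter) is left open.

```python
def pass_role_statement(role_arns):
    """Build an IAM policy statement that allows passing only the given roles.

    Raises ValueError for an empty list or for anything that does not
    look like a role ARN (including the '*' wildcard).
    """
    arns = sorted(set(role_arns))
    if not arns:
        raise ValueError('at least one role ARN is required')
    for arn in arns:
        if not (arn.startswith('arn:') and ':role/' in arn) or '*' in arn:
            raise ValueError('not a specific role ARN: %r' % arn)
    return {'Effect': 'Allow', 'Action': 'iam:PassRole', 'Resource': arns}
```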
During testing, I came across this error from the request-spot lambda:
An error occurred (MaxSpotInstanceCountExceeded) when calling the RequestSpotInstances operation: Max spot instance count exceeded: ClientError
Traceback (most recent call last):
  File "/var/task/handler.py", line 64, in handler
    event['launch_subnet_id'], client_token)
  File "/var/task/spoptimize/stepfns.py", line 106, in request_spot_instance
    return spot_helper.request_spot_instance(launch_config, az, subnet_id, client_token)
  File "/var/task/spoptimize/spot_helper.py", line 65, in request_spot_instance
    Type='one-time', ClientToken=client_token)
  File "/var/runtime/botocore/client.py", line 317, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 615, in _make_api_call
    raise error_class(parsed_response, operation_name)
ClientError: An error occurred (MaxSpotInstanceCountExceeded) when calling the RequestSpotInstances operation: Max spot instance count exceeded
The solution was to request service limit increase via AWS Support.
Make a note in the documentation that spoptimize does not replace protected or standby instances: execution stops if spoptimize detects that the launched instance is protected from scale-in or is marked as standby.
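For the documentation, the check could be illustrated with something like the sketch below. It works on one entry of the `AutoScalingInstances` list returned by the autoscaling `describe_auto_scaling_instances` call. The function name `is_replaceable` is made up here, and treating `EnteringStandby` the same as `Standby` is an assumption rather than confirmed Spoptimize behaviour.

```python
# Lifecycle states treated as "standby"; including EnteringStandby
# is an assumption made for this sketch.
STANDBY_STATES = ('Standby', 'EnteringStandby')


def is_replaceable(instance):
    """Return False if the instance is protected from scale-in or on standby.

    `instance` is one item from the 'AutoScalingInstances' list of a
    describe_auto_scaling_instances response.
    """
    if instance.get('ProtectedFromScaleIn'):
        return False
    return instance.get('LifecycleState') not in STANDBY_STATES
```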
Exclusive locking is implemented via step functions' retry semantics:
Lines 322 to 326 in 4aa555c
I'm not sure this is the right long-term solution. For larger autoscaling groups, it may take hours for all instances to be replaced after a deploy or mass update.
Perhaps allow for more than one instance to be replaced by Spoptimize at a time (configurable via tag)? Or just have a static interval between retries?
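To make the static-interval option concrete: a Step Functions `Retry` entry with `BackoffRate` set to 1.0 waits the same number of seconds before every attempt, whereas a rate of 2.0 doubles the wait each time. The sketch below builds such an entry and lists the resulting delays. The error name `GroupLocked` and both function names are placeholders, not the values used in sam.yml.

```python
def lock_retry_policy(interval_seconds=60, max_attempts=120,
                      backoff_rate=1.0, error_name='GroupLocked'):
    """Build a Step Functions Retry entry for the exclusive-lock state.

    `error_name` is a placeholder; a backoff_rate of 1.0 gives a
    static interval between retries.
    """
    return {
        'ErrorEquals': [error_name],
        'IntervalSeconds': interval_seconds,
        'MaxAttempts': max_attempts,
        'BackoffRate': backoff_rate,
    }


def retry_delays(policy):
    """List the wait (in seconds) before each retry attempt."""
    return [
        policy['IntervalSeconds'] * policy['BackoffRate'] ** attempt
        for attempt in range(policy['MaxAttempts'])
    ]
```

Comparing `retry_delays(lock_retry_policy(10, 3, 2.0))` with `retry_delays(lock_retry_policy(10, 3, 1.0))` shows how quickly exponential backoff stretches the wait for groups with many instances.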
If an autoscaling group is managed by CloudFormation and the associated stack's status is IN_PROGRESS (for example UPDATE_IN_PROGRESS), wait for the stack status to settle before proceeding with execution.
This will prevent spoptimize from doing anything during stack updates.
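A possible shape for this check, assuming the group carries the `aws:cloudformation:stack-name` tag that CloudFormation adds to the resources it creates. The function name `stack_update_in_progress` is hypothetical, the group's tags are assumed to be flattened into a dict, and the CloudFormation client is passed in so the logic can be tested offline.

```python
STACK_NAME_TAG = 'aws:cloudformation:stack-name'


def stack_update_in_progress(asg_tags, cloudformation):
    """Return True if the group's CloudFormation stack has not settled.

    `asg_tags` is a dict of the autoscaling group's tags.
    `cloudformation` is a boto3 'cloudformation' client (or a test double).
    Groups without the stack-name tag are treated as unmanaged.
    """
    stack_name = asg_tags.get(STACK_NAME_TAG)
    if not stack_name:
        return False
    stacks = cloudformation.describe_stacks(StackName=stack_name)['Stacks']
    # Every transitional status ends in _IN_PROGRESS, e.g. UPDATE_IN_PROGRESS.
    return any(s['StackStatus'].endswith('_IN_PROGRESS') for s in stacks)
```

The step function could call this before requesting a spot instance and retry later while it returns True.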