-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added option to enable spot instance draining #1100
Added option to enable spot instance draining #1100
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for opening this pull request! We’re super excited to see the interest in the ECS variant of Bottlerocket, and adding new settings is a great way to help out! There are a few more things that will need to be done to take this PR across the finish line. I’m happy to describe them here; if you want to handle implementing them that would be awesome, but if you’d prefer to have us do it that’s also fine (though I can’t guarantee when we’ll be able to get to it). Let us know which you’d prefer!
You can take a look at this commit for inspiration, as it has most of the things you’ll want to do. Specifically:
- Add the new setting to
sources/models/src/lib.rs
in theECSSettings
struct. This is how we define the settings that show up in the Bottlerocket API and hook it up with the component that can read settings from user-data. - Read the new setting and inject it into the generated
ECSConfig
struct insources/api/ecs-settings-applier/src/ecs.rs
; you can see that on lines 79-100 of the file - (Not shown in the commit I linked you to) Bottlerocket settings are effectively stored in a database with a strict schema; adding new settings changes that schema. We’ll need a migration program (like in here, this is a good example) to handle adding the new setting to the schema.
- You don’t need to change
sources/models/src/aws-ecs-1/override-defaults.toml
even though it's show in the other commit; the default can remain unset.
f6229eb
to
b57a735
Compare
@samuelkarp Thanks a ton for the kind words, a very comprehensive review and concrete pointers! I suppose I missed a layer of indirection, things makes more sense now. I think I covered the points you listed, however i'm not sure about the versioning scheme of the project: I just put the migration in a |
b57a735
to
a9a4f9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much, this looks super good! The last step I think is just for us to test and make sure the following scenarios are all working properly:
- upgrade from 1.0.0 and make sure the new setting is now available in the API
- downgrade from 1.0.1 to 1.0.0 and make sure the new setting is removed
- when enabling spot instance draining through the API, make sure the right codepath in the agent is executing
This is work that we can take care of; staging updates in a repo to execute upgrade-downgrade testing is not super trivial yet. But if you're able to build an AMI (cargo make && cargo make ami
) you can test whether the spot instance draining feature functions on a spot instance.
Spot termination notices are hard to test, i'm not sure I can trigger such a notification myself. However, i built an AMI, started an EC2 instance w/ user data:
...and it registered succesfully as a cluster instance, so the agent works in principle. The agent's introspection api does not seem to provide configuration details (possibly because there might be secrets in there): [ec2-user@ip-redacted ~]$ curl -s http://localhost:51678/v1/metadata
{"Cluster":"redacted","ContainerInstanceArn":"arn:aws:ecs:eu-west-1:redacted","Version":"Amazon ECS Agent - v1.43.0 (1ebf0604)"} However I checked whether the param is properly written to the config and that seems to be the case: [ec2-user@ip-redacted ~]$ sudo sheltie
bash-5.0# cat /etc/ecs/ecs.config.json
{"Cluster":"redacted","InstanceAttributes":{"bottlerocket.version":"1.0.0","bottlerocket.variant":"aws-ecs-1"},"PrivilegedDisabled":true,"AvailableLoggingDrivers":["json-file","awslogs","none"],"TaskIAMRoleEnabled":true,"TaskIAMRoleEnabledForNetworkHost":true,"SELinuxCapable":true,"OverrideAWSLogsExecutionRole":true,"SpotInstanceDrainingEnabled":true} |
I didn't know this until today, but Spot blocks allow specifying a duration for your instance to run, and termination notices are delivered at the end of the block. I was able to test an AMI based on your code, and correctly saw the ECS agent flip the container instance's state to I haven't had a chance to run the other test I want, which is upgrade-downgrade to check the migration. (I'm pretty confident that it'll work, but I still want to run the test.) The other thing that needs to be taken care of (and we can take care of this if you don't want to) is that the migration needs to be moved to 1.0.2 since we've released 1.0.1. |
Ah, wasn't aware, either. This will be helpful for testing spot tools.
👍
will do |
6163303
to
3f5e68c
Compare
I was able to test upgrade-downgrade and exercise the migration successfully, but there's one change I had to make: the
Can you add that to the PR? I'll be happy to approve it after. 😄 |
c5902be
to
41c04bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🦄
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Hi @mkulke! Would you mind squashing your commits into a single commit? Then we can merge from there. (If we don't hear back from you in a couple days, we'll take care of the squash and merge.) |
Co-authored-by: Samuel Karp <samuelkarp@users.noreply.github.com>
260b7b3
to
58493ad
Compare
Issue number:
fixes #1099
Description of changes:
Added the option to have instance draining on spot instances, I used https://github.com/aws/amazon-ecs-agent/blob/a250409cf5eb4ad84a7b889023f1e4d2e274b7ab/agent/config/types.go as reference
Testing done:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.