Nodegroup scaling #254
Conversation
2b459d9 to cc9c25b (compare)
Broadly, LGTM. I'll test it and have another look locally, as I'd like to better understand how it works. I may add some cosmetic changes, and probably a few lines in the docs. Thanks a lot, I think we should be able to merge and release this soon enough! 👍 🥇
cmd/eksctl/scale.go
Outdated
fs := cmd.Flags()

fs.StringVarP(&cfg.ClusterName, "name", "n", "", "EKS cluster name")
fs.IntVarP(&cfg.Nodes, "nodes", "N", 0, "total number of nodes (scale to this number)")
pkg/cfn/manager/api.go
Outdated
}
logger.Debug("changes = %#v", changeset.Changes)
if err := c.doExecuteChangeset(stackName, changesetName); err != nil {
	logger.Warning("error executing Cloudformation changeset %s in stack %s. Check the Cloudformation console for further details", changesetName, stackName)
pkg/cfn/manager/waiters.go
Outdated
	logger.Debug("describeChangesetErr=%v", err)
} else {
	logger.Critical("unexpected status %q while %s", *s.Status, msg)
	c.troubleshootStackFailureCause(i, desiredStatus)
Scaling to zero is actually a legitimate use case (basically a way to save money), so we shouldn't prevent it as such. I just think there should be no default value, that's all.
…On Mon, 15 Oct 2018, 12:34 pm, Richard Case commented on this pull request, in cmd/eksctl/scale.go (#254 (comment)):
> +
+ cmd := &cobra.Command{
+ Use: "nodegroup",
+ Short: "Scale a nodegroup",
+ Run: func(_ *cobra.Command, args []string) {
+ if err := doScaleNodeGroup(cfg); err != nil {
+ logger.Critical("%s\n", err.Error())
+ os.Exit(1)
+ }
+ },
+ }
+
+ fs := cmd.Flags()
+
+ fs.StringVarP(&cfg.ClusterName, "name", "n", "", "EKS cluster name")
+ fs.IntVarP(&cfg.Nodes, "nodes", "N", 0, "total number of nodes (scale to this number)")
We do have the following test later to cover this:
if cfg.Nodes < 1 {
return fmt.Errorf("number of nodes must be greater than 0. Use the --nodes/-N flag")
}
Good point. I'll change that.
@richardcase it's so annoying that GitHub doesn't attach email replies to the thread... sorry about that, I was hoping they had fixed it.
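Following up on the scale-to-zero point above, here is a minimal, hypothetical sketch of how the validation could drop the meaningful default while still allowing 0, assuming cobra/pflag as used in the diff (the Changed field reports whether the user actually passed the flag). It reuses cfg and doScaleNodeGroup from the PR's scale.go and is illustrative only, not the change that landed.

// Hypothetical variant of the validation in cmd/eksctl/scale.go:
// require --nodes/-N to be set explicitly, but accept 0 as a valid value.
cmd := &cobra.Command{
	Use:   "nodegroup",
	Short: "Scale a nodegroup",
	RunE: func(cmd *cobra.Command, _ []string) error {
		if !cmd.Flag("nodes").Changed {
			return fmt.Errorf("the --nodes/-N flag must be set")
		}
		if cfg.Nodes < 0 {
			return fmt.Errorf("number of nodes cannot be negative")
		}
		return doScaleNodeGroup(cfg)
	},
}

fs := cmd.Flags()
fs.StringVarP(&cfg.ClusterName, "name", "n", "", "EKS cluster name")
fs.IntVarP(&cfg.Nodes, "nodes", "N", 0, "total number of nodes (scale to this number)")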
pkg/cfn/manager/nodegroup.go
Outdated
)

const (
	desiredCapacityPath = "Resources.NodeGroup.Properties.DesiredCapacity"
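For context, a minimal sketch of how a JSON path like the constant above can be used to rewrite the nodegroup template before a changeset is created. It assumes a gjson/sjson-style path library (github.com/tidwall/sjson), whose path syntax matches the constant shown; the package and helper names are placeholders and the PR's actual implementation may use different helpers.

package example // hypothetical package, for illustration only

import (
	"fmt"

	"github.com/tidwall/sjson"
)

const desiredCapacityPath = "Resources.NodeGroup.Properties.DesiredCapacity"

// setDesiredCapacity is a hypothetical helper: it returns a copy of the
// CloudFormation template body with DesiredCapacity set to the new node count.
// The value is written as a string because CloudFormation declares the
// property as a string type.
func setDesiredCapacity(templateBody []byte, nodes int) ([]byte, error) {
	updated, err := sjson.SetBytes(templateBody, desiredCapacityPath, fmt.Sprintf("%d", nodes))
	if err != nil {
		return nil, fmt.Errorf("setting desired capacity: %v", err)
	}
	return updated, nil
}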
d146b30 to ade0f54 (compare)
	logger.Warning("error executing Cloudformation changeset %s in stack %s. Check the Cloudformation console for further details", changesetName, stackName)
	return err
}
return c.doWaitUntilStackIsUpdated(i)
We may want to go ahead and add integration tests for this: scale to 2 nodes, check for Kubernetes nodes via the Kubernetes API, scale back down to 1 (no wait?)
That would be very nice. I'll also add this.
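A rough sketch of what such an integration test could look like, assuming the test shells out to the eksctl binary with the flags shown in the diff above and uses client-go to count nodes. clusterName and kubeconfigPath are placeholders, a real test would poll until the new node registers, and the eventual test tracked in #267 may be structured quite differently.

package integration

import (
	"context"
	"os/exec"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func TestScaleNodeGroup(t *testing.T) {
	clusterName := "integration-test-cluster" // placeholder
	kubeconfigPath := "/tmp/kubeconfig"        // placeholder

	// Scale the nodegroup up to 2 nodes via the CLI under test.
	out, err := exec.Command("eksctl", "scale", "nodegroup", "--name", clusterName, "--nodes", "2").CombinedOutput()
	if err != nil {
		t.Fatalf("scale up failed: %v\n%s", err, out)
	}

	// Count nodes through the Kubernetes API. A real test would retry here
	// until the second node has registered; a single check is shown for brevity.
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		t.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		t.Fatal(err)
	}
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		t.Fatal(err)
	}
	if len(nodes.Items) != 2 {
		t.Fatalf("expected 2 nodes, got %d", len(nodes.Items))
	}

	// Scale back down to 1 (no wait for node termination here).
	if out, err := exec.Command("eksctl", "scale", "nodegroup", "--name", clusterName, "--nodes", "1").CombinedOutput(); err != nil {
		t.Fatalf("scale down failed: %v\n%s", err, out)
	}
}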
pkg/cfn/manager/api.go
Outdated
func (c *StackCollection) doCreateChangesetRequest(i *Stack, action string, description string, templateBody []byte,
	parameters map[string]string, withIAM bool) (string, error) {

	changesetName := fmt.Sprintf("eksctl-%s-%d", action, time.Now().Unix())
pkg/cfn/manager/waiters.go
Outdated
}
logger.Debug("start %s", msg)
if waitErr := w.WaitWithContext(ctx); waitErr != nil {
	s, err := c.describeStackChangeset(i, changesetName)
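As background for the snippet above: when the waiter fails, the changeset can be described to surface the failure reason for logging. A minimal sketch with the AWS SDK for Go (v1); the package and function names are assumptions for illustration, not the PR's exact code, though DescribeChangeSet and its fields are real SDK API.

package example // hypothetical package, for illustration only

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

// describeChangesetStatus is a hypothetical helper that fetches the status and
// failure reason of a changeset so they can be logged after a failed wait.
func describeChangesetStatus(svc *cloudformation.CloudFormation, stackName, changesetName string) (status, reason string, err error) {
	out, err := svc.DescribeChangeSet(&cloudformation.DescribeChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changesetName),
	})
	if err != nil {
		return "", "", err
	}
	return aws.StringValue(out.Status), aws.StringValue(out.StatusReason), nil
}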
f1be016 to 7de2bba (compare)
@richardcase I'm done with nitpicking here! We can merge and add tests in another PR, then cut a release, but I'm also equally happy to see tests here - up to you :)
We can definitely have the integration tests in another PR.
7de2bba to 2a3eafd (compare)
Initial version of nodegroup scaling has been added. This scales by modifying the CloudFormation template for the nodegroup. The modified template is used to create a changeset that is then executed. When scaling down/in (i.e. reducing the number of nodes) we rely solely on the resulting change to the ASG. This means that the node(s) that are to be terminated aren't drained, so pods running on the terminating nodes may cause errors. In the future we may consider picking the EC2 instances to be terminated, draining those nodes, and then creating a termination policy to ensure those nodes are killed. Issue #116 Signed-off-by: Richard Case <richard.case@outlook.com>
If the desired capacity is greater/less than the current max/min of the ASG, then the max/min will be updated to match the desired node count. Signed-off-by: Richard Case <richard.case@outlook.com>
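A minimal sketch of the clamping rule described in the commit message above; the helper name and signature are placeholders, and where the resulting values are written (nodegroup template properties vs. the ASG API) is left to the implementation.

// clampSizes is a hypothetical helper: it widens the ASG bounds so that the
// requested desired capacity always fits between MinSize and MaxSize.
func clampSizes(desired, minSize, maxSize int) (newMin, newMax int) {
	newMin, newMax = minSize, maxSize
	if desired < newMin {
		newMin = desired
	}
	if desired > newMax {
		newMax = desired
	}
	return newMin, newMax
}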
2a3eafd to 3c8777f (compare)
Rebased, but the build is failing.
@richardcase I've re-triggered, looks like it could be a flake...
Thanks @errordeveloper. Could you approve again when you get time and I'll merge it in.
I've created #267 to make sure we don't forget to add the integration test.
Description
Initial version of nodegroup scaling has been added. This scales
by modifying the CloudFormation template for the nodegroup. The
modified template is used to create a changeset that is then
executed.
When scaling down/in (i.e. reducing the number of nodes) we rely solely on the resulting change to the ASG. This means that the node(s) that are to be terminated aren't drained, so pods running on the terminating nodes may cause errors. In the future we may consider picking the EC2 instances to be terminated, draining those nodes, and then creating a termination policy to ensure those nodes are killed.
This supersedes #191 and relates to issue #116.
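To make the flow described above concrete, here is a minimal sketch of the create-changeset / wait / execute sequence using the AWS SDK for Go (v1). It is not the PR's code: the package, function and variable names are placeholders, and parameters, capabilities, tagging and richer error handling are omitted, but the SDK calls shown (CreateChangeSet, WaitUntilChangeSetCreateComplete, ExecuteChangeSet, WaitUntilStackUpdateComplete) are real.

package example // hypothetical package, for illustration only

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

// scaleNodeGroupStack is a hypothetical outline: it submits the modified
// nodegroup template as a changeset, waits for the changeset to be created,
// executes it, and then waits for the stack update to finish.
func scaleNodeGroupStack(stackName string, templateBody []byte) error {
	svc := cloudformation.New(session.Must(session.NewSession()))

	changesetName := fmt.Sprintf("eksctl-scale-nodegroup-%d", time.Now().Unix())

	if _, err := svc.CreateChangeSet(&cloudformation.CreateChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changesetName),
		ChangeSetType: aws.String(cloudformation.ChangeSetTypeUpdate),
		TemplateBody:  aws.String(string(templateBody)),
	}); err != nil {
		return fmt.Errorf("creating changeset: %v", err)
	}

	describeInput := &cloudformation.DescribeChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changesetName),
	}
	if err := svc.WaitUntilChangeSetCreateComplete(describeInput); err != nil {
		return fmt.Errorf("waiting for changeset creation: %v", err)
	}

	if _, err := svc.ExecuteChangeSet(&cloudformation.ExecuteChangeSetInput{
		StackName:     aws.String(stackName),
		ChangeSetName: aws.String(changesetName),
	}); err != nil {
		return fmt.Errorf("executing changeset: %v", err)
	}

	return svc.WaitUntilStackUpdateComplete(&cloudformation.DescribeStacksInput{
		StackName: aws.String(stackName),
	})
}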
Todo:
Checklist
- make build
- make test
- humans.txt file