-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(aws-asg) Prevent scaling in case of ongoing InstanceRefresh #597
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tirimia 👋
Thank you so much for the PR, and apologies for the delay in getting this review.
I had some suggestions that I tried to apply myself, but I don't have permission to update your branch (I think it may have been because it's main
).
Here are the changes I propose:
diff --git a/CHANGELOG.md b/CHANGELOG.md
index f4b6e0c..e9f8467 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,8 @@
## UNRELEASED
+IMPROVEMENTS:
+ * plugin/target/aws: Prevent scaling if an instance refresh is in progress [[GH-597](https://github.com/hashicorp/nomad-autoscaler/pull/597)]
+
## 0.3.7 (June 10, 2022)
IMPROVEMENTS:
diff --git a/plugins/builtin/target/aws-asg/plugin/plugin.go b/plugins/builtin/target/aws-asg/plugin/plugin.go
index dc8f92e..7beacdc 100644
--- a/plugins/builtin/target/aws-asg/plugin/plugin.go
+++ b/plugins/builtin/target/aws-asg/plugin/plugin.go
@@ -13,6 +13,7 @@ import (
"github.com/hashicorp/nomad-autoscaler/plugins/target"
"github.com/hashicorp/nomad-autoscaler/sdk"
"github.com/hashicorp/nomad-autoscaler/sdk/helper/nomad"
+ "github.com/hashicorp/nomad-autoscaler/sdk/helper/ptr"
"github.com/hashicorp/nomad-autoscaler/sdk/helper/scaleutils"
)
@@ -122,13 +123,10 @@ func (t *TargetPlugin) Scale(action sdk.ScalingAction, config map[string]string)
}
// Autoscaling can interfere with a running instance refresh so we
- // prevent any scaling action while a refresh is Pending or InProgress
- var maxRecords int32 = 1
+ // prevent any scaling action while a refresh is Pending or InProgress.
input := autoscaling.DescribeInstanceRefreshesInput{
AutoScalingGroupName: &asgName,
- InstanceRefreshIds: []string{},
- MaxRecords: &maxRecords,
- NextToken: nil,
+ MaxRecords: ptr.Int32ToPtr(10),
}
refreshes, err := t.asg.DescribeInstanceRefreshes(ctx, &input)
@@ -136,15 +134,15 @@ func (t *TargetPlugin) Scale(action sdk.ScalingAction, config map[string]string)
return fmt.Errorf("failed to describe AWS InstanceRefresh: %v", err)
}
- // We might get 0 results if no InstanceRefreshes were ever run on the ASG
- if len(refreshes.InstanceRefreshes) == 1 {
- refreshID := refreshes.InstanceRefreshes[0].InstanceRefreshId
- status := refreshes.InstanceRefreshes[0].Status
- if status == types.InstanceRefreshStatusInProgress ||
- status == types.InstanceRefreshStatusPending {
+ for _, refresh := range refreshes.InstanceRefreshes {
+ active := refresh.Status == types.InstanceRefreshStatusInProgress ||
+ refresh.Status == types.InstanceRefreshStatusPending
+
+ if active {
t.logger.Warn("scaling will not take place due to InstanceRefresh",
"asg_name", asgName,
- "refresh_id", refreshID)
+ "refresh_id", refresh.InstanceRefreshId,
+ "refresh_status", refresh.Status)
return nil
}
}
diff --git a/sdk/helper/ptr/ptr.go b/sdk/helper/ptr/ptr.go
index d6aee38..520b8d2 100644
--- a/sdk/helper/ptr/ptr.go
+++ b/sdk/helper/ptr/ptr.go
@@ -8,6 +8,10 @@ func IntToPtr(i int) *int {
return &i
}
+func Int32ToPtr(i int32) *int32 {
+ return &i
+}
+
func Int64ToPtr(i int64) *int64 {
return &i
}
The reason for this is that it wasn't immediately clear to me from the documentation if it would be possible to have more than one instance refresh action active at the same time.
I would imagine that this would not be possible (otherwise we would have the same problem as we're trying to fix here), but just in case it may be better to loop over a few of them. I picked 10 just because it felt like a good enough number to detect this, hopefully people are not refreshing instances this aggressively 😅
I also added a CHANGELOG
entry and a new Int32ToPtr
which is what we use to convert things to pointers.
Let me know what you think 🙂
Hi, @lgfa29 ! Should have done my changes on a different branch, sorry about that I've integrated your proposed changes, they look great. Only one instance refresh can happen at a time, but the AWS SDK returns a paginated history of Instance Refresh events, starting with the latest one. That is the reason I will only query for one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one instance refresh can happen at a time
Great, thank you for the confirmation. I read the docs but this wasn't clear to me, so I was wondering if there could be a scenario where you start a refresh, then start a new one that finishes before the previous one.
If that's not possible then querying just the latest make sense 👍
Thanks for the PR!
As of version 0.3.7, the nomad-autoscaler does not take into account pending or currently running AWS ASG Instance Refresh events. See #596
This has lead to us having the autoscaler constantly try to adjust as the instance refresh was also spinning nodes up and down.
To avoid such problems, this PR extends the Scale action to also check the last Instance Refresh event on the ASG and not go ahead with scaling if it has a status of Pending or InProgress.
Happy to get any feedback you can provide :)
Closes #596