feat: support webhook fallback #3718
Build Failed 😱 Build Id: 3c506d32-fcb3-4b46-9596-548133f4dbab To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 0f6ba3cf-0391-4048-a520-508fcef4a091 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
Build Succeeded 👏 Build Id: f863e17c-00e7-4376-a6d7-c53b9e960692 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Build Succeeded 👏 Build Id: e0b5d167-b6b9-4a48-8bf0-07f9436b0ecb The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Haven't had a chance to go deep, but figured I'd send you my first thing at least!
properties:
  policy:
    type: object
    x-kubernetes-preserve-unknown-fields: true
I'd rather we have an actual spec here.
My thought here would be to take the webhook element and turn it into a Helm include or template, much like we do for _gameserverstatus.yaml, and use that in both spots.
The include will need a conditional, though, to make sure it only recurses one level, but since you can pass in a context structure, that should be doable.
f82f492
to
18afd0c
Compare
Build Failed 😱 Build Id: 7cb2b5df-afd2-4749-90e6-2b5b0586b43b To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
e945fc3
to
dae3be5
Compare
Build Succeeded 👏 Build Id: c7a1fb0e-4084-4702-b706-55fc6ac60a03 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
Looking good!
One thing we'll definitely need is some additional docs here:
https://agones.dev/site/docs/getting-started/create-fleetautoscaler/
Our docs publish on merge, so have a look here https://agones.dev/site/docs/contribute/ to see how to use a feature code to hide stuff until next release.
@zmerlynn @nrwiersma I'd love a second opinion -- should this be behind a feature flag? It's simple enough that it's probably not warranted, but I wanted some consensus before accepting it as is. WDYT?
# limitations under the License.

{{/* schema for a fleet autoscaler policy */}}
{{- define "fleetautoscaler.policy" }}
Love it!
// Fallback defines how the autoscaler should behave in the event the webhook fails.
// +optional
Fallback *WebhookFallback `json:"fallback,omitempty"`
I'll ask the question just to be sure -- would we ever want a fallback for other policies? Or just webhook? (i.e. should this be further up).
I'm fairly sure the answer is "no", but wanted to triple check just in case.
IMO no. The other policies are purely internal and cannot really fail. This is unique to Webhooks.
@@ -162,6 +175,20 @@ func applyWebhookPolicy(w *autoscalingv1.WebhookPolicy, f *agonesv1.Fleet) (repl
		return 0, false, err
	}

	defer func() {
Rather than this fancy defer stuff 😄 should we create a new function called applyWebhookPolicyWithFallback()?
That would replace the call above with:
case autoscalingv1.WebhookPolicyType:
return applyWebhookPolicyWithFallback(pol.Webhook, f, gameServerLister, nodeCounts)
It can call applyWebhookPolicy and then do the appropriate fallback handling with error management, without the complexity of working out what is happening in a defer statement? 😄
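A minimal sketch of the wrapper shape suggested above, not the code from this PR. It assumes policies can be modelled as plain functions; the applyFunc type, the failingWebhook and fixedReplicas helpers, and the simplified signature are all hypothetical stand-ins for the real Agones types and the (pol.Webhook, f, gameServerLister, nodeCounts) arguments. Whether a successful fallback should swallow the webhook error, as it does here, is a design choice the thread does not settle.

```go
package main

import (
	"errors"
	"fmt"
)

// applyFunc is a hypothetical, simplified stand-in for applying one
// autoscaler policy: it takes the current replica count and returns the
// desired count, whether the result was limited, and an error.
type applyFunc func(currentReplicas int32) (replicas int32, limited bool, err error)

// errWebhookUnreachable simulates a webhook failure in this sketch.
var errWebhookUnreachable = errors.New("webhook unreachable")

// failingWebhook is a hypothetical primary policy that always fails.
func failingWebhook(int32) (int32, bool, error) {
	return 0, false, errWebhookUnreachable
}

// fixedReplicas is a hypothetical policy that always asks for n replicas.
func fixedReplicas(n int32) applyFunc {
	return func(int32) (int32, bool, error) { return n, false, nil }
}

// applyWebhookPolicyWithFallback calls the webhook policy first and only
// consults the fallback when the webhook returns an error. The control flow
// is linear, so there is no deferred closure rewriting named results.
func applyWebhookPolicyWithFallback(webhook, fallback applyFunc, current int32) (int32, bool, error) {
	replicas, limited, err := webhook(current)
	if err == nil {
		return replicas, limited, nil
	}
	if fallback == nil {
		// No fallback configured: surface the webhook error unchanged.
		return 0, false, err
	}
	fbReplicas, fbLimited, fbErr := fallback(current)
	if fbErr != nil {
		// Both failed: report both, wrapping the fallback error.
		return 0, false, fmt.Errorf("webhook failed: %v; fallback failed: %w", err, fbErr)
	}
	return fbReplicas, fbLimited, nil
}

func main() {
	replicas, limited, err := applyWebhookPolicyWithFallback(failingWebhook, fixedReplicas(10), 5)
	fmt.Println(replicas, limited, err)
}
```

In real code the webhook error would probably be logged or recorded as an event before the fallback result is returned, so that failures stay visible to operators.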
I’ll review this morning - I’d like to take a look though
Let's go ahead and feature flag it. We've got resourcing enough that we were going to work on scheduled fleet autoscalers soon, and while I've done some thinking on an API for that, I haven't done a ton. I have some thoughts on using something like the API added here, but instead creating a new ETA: Outside of that, the code looks reasonable and love the helm templating. I'll defer to Mark for detailed review. |
Chatting with @zmerlynn - I agree, let's feature flag it, so that if we need / want to adjust the API surface once we also tackle autoscaler scheduling, we can break things if need be. For steps on feature flagging:
And to be fair - better safe than sorry 👍🏻 and this way we can get this in for the next release, you can use it - without having to wait for the scheduling implementation (which is really the point in feature flags!) |
@nrwiersma and I discussed this internally and if the decision for a So instead of investing time into a solution path that is highly likely to get deprecated soon, it makes more sense to us to contribute to the next proper iteration of this feature. Which brings me to the question if there is already some basic minimum set of definitions for a
If that's the case, we'd rather close this PR and do a first pass on the |
@aRestless If y'all are willing to start on that, it would be much appreciated - but I had a counterpoint view: I think even if we wanted to add in the concept of a Now, I mostly wanted to express that view, but I do like your framing here: a policy falls through to the next policy either if it fails (Webhook or whatever else we might add that has error deps), or if some conditional fails. That seems easy enough to express and means that the structure is "flat" (whereas my odd differentiation between "failure" and "conditional" feels oddly branchy.) So I feel like I'm more on your side. Which actually brings me to.. Should we allow (I'm also not tied to
Oooh, this is interesting stuff 👍🏻 Since we're heading into design discussion, I took my design thoughts over to #3008 (comment) and tagged you all to discuss (I find audit trails for decisions easier to track in Issues, than PRs). |
It seems there is agreement on something like |
What type of PR is this?
What this PR does / Why we need it:
This adds support for fallback policies: if the webhook fails, the autoscaler applies the configured fallback policy.
Which issue(s) this PR fixes:
Closes #3686
Special notes for your reviewer:
I found no way to get the CRDs to be self-referential. The only way I got it to work was to use x-kubernetes-preserve-unknown-fields: true on the fallback policy.