Autoscaler WebHook fallback Policy #3686
This sounds like a good thing 👍🏻 Any suggestions on what the configuration should look like?
I would keep the config the same, but with the addition of something like Should allowing a number of failures before fallback be desired, that too could be added to the WebHook.
Actually, that would not work, you would need a
I was thinking some kind of nested config, under:

```yaml
apiVersion: autoscaling.agones.dev/v1
kind: FleetAutoscaler
metadata:
  name: webhook-fleet-autoscaler
spec:
  fleetName: simple-game-server
  policy:
    # type of the policy - this example is Webhook
    type: Webhook
    # parameters for the webhook policy - this is a WebhookClientConfig, as per other K8s webhooks
    webhook:
      # use a service, or URL
      service:
        name: autoscaler-webhook-service
        namespace: default
        path: scale
  fallback: # basically a copy of the above - so you could fall back to another webhook - but only allow one level
    policy:
      type: Buffer
      buffer:
        bufferSize: 5
        minReplicas: 10
        maxReplicas: 20
```
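To make the intended semantics concrete, here is a minimal sketch in Go of how a controller could evaluate that config: try the primary policy, fall back once if it fails, and leave replicas unchanged when no fallback is configured. All type and function names here are illustrative, not the actual Agones API.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy and AutoscalerSpec are hypothetical stand-ins for the proposed
// config shape; the real FleetAutoscaler types differ.
type Policy struct {
	Type string
}

type AutoscalerSpec struct {
	Policy   Policy
	Fallback *Policy // nil means today's behaviour: no replica change on failure
}

// applyPolicy pretends to evaluate a policy; a Webhook policy can fail
// when the backing service is unreachable.
func applyPolicy(p Policy, webhookUp bool) (int, error) {
	switch p.Type {
	case "Webhook":
		if !webhookUp {
			return 0, errors.New("webhook unreachable")
		}
		return 7, nil // replica count suggested by the webhook
	case "Buffer":
		return 10, nil // replica count from a buffer calculation
	}
	return 0, fmt.Errorf("unknown policy %q", p.Type)
}

// desiredReplicas tries the primary policy, then the single-level fallback,
// and otherwise keeps the current replica count.
func desiredReplicas(spec AutoscalerSpec, webhookUp bool, current int) int {
	if n, err := applyPolicy(spec.Policy, webhookUp); err == nil {
		return n
	}
	if spec.Fallback != nil {
		if n, err := applyPolicy(*spec.Fallback, webhookUp); err == nil {
			return n
		}
	}
	return current // no fallback configured: leave the fleet unchanged
}

func main() {
	spec := AutoscalerSpec{
		Policy:   Policy{Type: "Webhook"},
		Fallback: &Policy{Type: "Buffer"},
	}
	fmt.Println(desiredReplicas(spec, true, 5))  // webhook healthy: webhook wins
	fmt.Println(desiredReplicas(spec, false, 5)) // webhook down: Buffer fallback
}
```

Note that because the fallback holds a whole `Policy`, this sketch would also permit a Webhook falling back to a Webhook, which is the point debated below.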
Works as well. I would personally not have allowed Webhook falling back to Webhook, because purely from a configuration PoV it looks like more than one level could work, and there will always be a reason to extend it by one more level. But happy to defer to your experience here.
Yeah, but also, if you really want to, should we stop you? It also makes our configuration parsing and management easier: the fallback is the same config as the parent, so everything stays simple, rather than having to keep the fallback up to date with whatever newer config options we come up with down the line. This also assumes that Helm lets us do some kind of include here that allows us to be at least a little self-referential 😄
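For what it's worth, Helm's named templates (`define`/`include`) could cover that kind of self-referential include. The following is a hypothetical sketch only; the template name, values keys, and file layout are all made up and are not the actual Agones chart:

```yaml
# templates/_policy.tpl (hypothetical) - render one policy block,
# reused for both the primary policy and the fallback
{{- define "fleetautoscaler.policy" -}}
type: {{ .type }}
{{- if eq .type "Buffer" }}
buffer:
  bufferSize: {{ .buffer.bufferSize }}
  minReplicas: {{ .buffer.minReplicas }}
  maxReplicas: {{ .buffer.maxReplicas }}
{{- end }}
{{- end }}
```

It could then be included twice in the FleetAutoscaler template, so the fallback automatically tracks any new policy options:

```yaml
# in the FleetAutoscaler template (also hypothetical)
spec:
  policy:
    {{- include "fleetautoscaler.policy" .Values.policy | nindent 4 }}
  {{- if .Values.fallbackPolicy }}
  fallback:
    policy:
      {{- include "fleetautoscaler.policy" .Values.fallbackPolicy | nindent 6 }}
  {{- end }}
```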
Fair point 😄
Helm playing ball is not an experience I have had too often, so it is not something I would bet on. We would be happy to take this on, if it works for you.
Hah. Quite possibly. I'm thinking an include where you can turn off the
No issue for me, have at it!
Is your feature request related to a problem? Please describe.
While using the Autoscaler WebHook, it is possible, if not guaranteed, that at some point the service the WebHook points to will become unavailable for one reason or another. If the service does not recover within the autoscaling interval, the fleets will not scale at all.
Describe the solution you'd like
It would be useful to be able to configure both the WebHook and another Policy (e.g. BufferPolicy). If the WebHook fails to respond, the other policy would be used. If no other policy is specified, the current behaviour persists, in that no replica change is made. Perhaps there could also be a WebHook failure allowance (a number of allowed failures) before the other Policy is used.
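The failure-allowance idea could be as small as a counter of consecutive failures. Here is a minimal sketch in Go under that assumption; the `FallbackGate` type, its fields, and the threshold semantics are all illustrative, not an existing Agones construct.

```go
package main

import "fmt"

// FallbackGate counts consecutive webhook failures and reports when the
// fallback policy should take over. The threshold value is illustrative.
type FallbackGate struct {
	Threshold int // consecutive failures tolerated before falling back
	failures  int
}

// Record notes the outcome of one webhook call; ok=true resets the streak.
// It returns true once the consecutive-failure count reaches the threshold.
func (g *FallbackGate) Record(ok bool) bool {
	if ok {
		g.failures = 0
		return false
	}
	g.failures++
	return g.failures >= g.Threshold
}

func main() {
	g := &FallbackGate{Threshold: 3}
	// Two failures, a success (which resets), then three failures in a row:
	// only the last attempt should trigger the fallback.
	for i, ok := range []bool{false, false, true, false, false, false} {
		fmt.Printf("attempt %d: fallback=%v\n", i, g.Record(ok))
	}
}
```

Counting only consecutive failures (rather than a total) means one transient blip never trips the fallback, while a sustained outage does.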
Describe alternatives you've considered
There are none. As it stands it is all or nothing with the WebHook.
Additional context
While it is obviously not desirable for the WebHook service to be unavailable for extended periods, anyone who has run services in production knows that this unfortunately happens. This feature would bring a measure of resilience when using a WebHook.