Exposing scale subresource to enable HPA's to modify the replicas of Infinispan CR #2133

ZeidH · 2024-07-29T12:07:47Z

Enhanced the spec.subresources field of the Infinispan CRD to map scaling:

      scale:
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas

This enables APIs like the HorizontalPodAutoscaler and KEDA to modify the replica count of the Cache.
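To illustrate what the two paths do, here is a minimal Go sketch of how a generic scaler could read and write a replica count through dotted paths such as `.spec.replicas`. It is not the Kubernetes API server's implementation; the helper names `getPath` and `setPath` and the map-based object are hypothetical and only model the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// getPath walks a dotted path such as ".spec.replicas" through nested maps.
// Hypothetical helper: it only models how a scale path selects a field.
func getPath(obj map[string]any, path string) (any, bool) {
	keys := strings.Split(strings.TrimPrefix(path, "."), ".")
	var cur any = obj
	for _, k := range keys {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[k]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// setPath writes a value at a dotted path, creating intermediate maps if needed.
func setPath(obj map[string]any, path string, value any) {
	keys := strings.Split(strings.TrimPrefix(path, "."), ".")
	cur := obj
	for _, k := range keys[:len(keys)-1] {
		next, ok := cur[k].(map[string]any)
		if !ok {
			next = map[string]any{}
			cur[k] = next
		}
		cur = next
	}
	cur[keys[len(keys)-1]] = value
}

func main() {
	// A stripped-down stand-in for an Infinispan CR.
	cr := map[string]any{
		"kind":   "Infinispan",
		"spec":   map[string]any{"replicas": 1},
		"status": map[string]any{"replicas": 1},
	}

	// An autoscaler writes the desired count via specReplicasPath...
	setPath(cr, ".spec.replicas", 3)

	// ...and reads the observed count via statusReplicasPath.
	desired, _ := getPath(cr, ".spec.replicas")
	observed, _ := getPath(cr, ".status.replicas")
	fmt.Println("desired:", desired, "observed:", observed)
}
```

The point of the subresource is that the autoscaler never needs to know the Infinispan schema. It only sees a replica count to write and a replica count to observe.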

openshift-ci · 2024-07-29T12:07:57Z

Hi @ZeidH. Thanks for your PR.

I'm waiting for an infinispan member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ryanemerson

Thanks for the contribution @ZeidH, however we generate our CRD content from code, so it's not sufficient to update the static resource files.

Looking at the Artemis PR you linked on the issue, it seems like we can generate this by adding the following annotation above the Infinispan struct:

//+kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.replicas

You should then be able to generate the static files by doing make generate.
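As a rough sketch of where the marker goes, the snippet below places it directly above a trimmed-down type. The struct definitions here are hypothetical stand-ins, not the real contents of api/v1/infinispan_types.go, and the `decode` helper exists only to make the example runnable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InfinispanSpec is a simplified, hypothetical stand-in for the real spec type.
type InfinispanSpec struct {
	Replicas int32 `json:"replicas"`
}

// InfinispanStatus is a simplified, hypothetical stand-in for the real status type.
type InfinispanStatus struct {
	Replicas int32 `json:"replicas,omitempty"`
}

// Infinispan is a trimmed-down version of the CR type. The marker below is
// the one quoted in the review; controller-gen reads it from the comment
// block attached to the type.
//+kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.replicas
type Infinispan struct {
	Spec   InfinispanSpec   `json:"spec,omitempty"`
	Status InfinispanStatus `json:"status,omitempty"`
}

// decode parses a JSON document into the simplified type (illustration only).
func decode(data []byte) (Infinispan, error) {
	var cr Infinispan
	err := json.Unmarshal(data, &cr)
	return cr, err
}

func main() {
	cr, err := decode([]byte(`{"spec":{"replicas":2},"status":{"replicas":1}}`))
	if err != nil {
		panic(err)
	}
	// The JSON field names must line up with the paths named in the marker.
	fmt.Println("spec.replicas:", cr.Spec.Replicas, "status.replicas:", cr.Status.Replicas)
}
```

The scale marker also accepts an optional `selectorpath` argument. The HorizontalPodAutoscaler relies on a label selector to find the pods it measures, so a CRD meant for HPA use would likely need that path as well.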

ZeidH · 2024-07-31T10:15:32Z

Thanks for the clear instructions, @ryanemerson! I'm rather new to this, so I appreciate the help.
I encountered this issue with controller-gen on my version of Go (1.22.5), so I also updated controller-gen to the latest version (0.15.0).

ryanemerson · 2024-07-31T10:40:42Z

Thanks for updating @ZeidH. It seems like we need to increase our baseline Go version in order to use that version of controller-gen. I have created PR #2134 to baseline on 1.21 (1.20 is now EOL) and update the controller-gen version. I'll let you know once that PR has been merged so that you can rebase this PR.

ZeidH · 2024-08-09T09:21:52Z

I've tested the autoscaling with this setup:

./scripts/ci/kind-with-olm.sh
make deploy-cert-manager
helm install keda kedacore/keda --namespace keda --create-namespace
skaffold dev

and a ScaledObject that targets the Infinispan CR:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: infinispan-scaler
  namespace: infinispan-operator-system
spec:
  minReplicaCount: 0
  scaleTargetRef:
    apiVersion: infinispan.org/v1
    kind: Infinispan
    name: test-cache
  triggers:
    - metadata:
        desiredReplicas: '2'
        start: 52 8 * * *
        end: 59 8 * * *
        timezone: UTC
      type: cron
---
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  annotations:
    infinispan.org/monitoring: 'true'
    infinispan.org/operatorPodTargetLabels: 'rht.comp,rht.comp_ver,rht.prod_name,rht.prod_ver,rht.subcomp_t'
  name: test-cache
  namespace: infinispan-operator-system
spec:
  service:
    container:
      livenessProbe:
        failureThreshold: 5
        initialDelaySeconds: 0
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      readinessProbe:
        failureThreshold: 5
        initialDelaySeconds: 0
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      startupProbe:
        failureThreshold: 600
        initialDelaySeconds: 3
        periodSeconds: 1
        successThreshold: 1
        timeoutSeconds: 1
    type: DataGrid
  jmx: {}
  configListener:
    enabled: true
    logging:
      level: info
  upgrades:
    type: Shutdown
  replicas: 1

The StatefulSet was able to scale from zero to n and back, but I can imagine that scaling to zero is not possible for some configurations.
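The cron trigger in the ScaledObject above asks for 2 replicas between 08:52 and 08:59 UTC and falls back to minReplicaCount (0) otherwise. The Go sketch below models that decision in a simplified way. It is not KEDA's implementation: the type `cronWindow`, the function `replicasAt`, and the choice to treat the end minute as exclusive are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// cronWindow models one daily scaling window, with bounds given as
// minutes since midnight UTC. Hypothetical type, not part of KEDA.
type cronWindow struct {
	startMinute     int // inclusive
	endMinute       int // exclusive (assumption)
	desiredReplicas int32
}

// minuteOfDay converts an hour and minute into minutes since midnight.
func minuteOfDay(hour, minute int) int {
	return hour*60 + minute
}

// replicasAt returns the window's desired replicas when the given time of
// day falls inside the window, and minReplicas otherwise.
func replicasAt(w cronWindow, minReplicas int32, hour, minute int) int32 {
	m := minuteOfDay(hour, minute)
	if m >= w.startMinute && m < w.endMinute {
		return w.desiredReplicas
	}
	return minReplicas
}

func main() {
	// Mirrors "start: 52 8 * * *", "end: 59 8 * * *", "desiredReplicas: '2'".
	w := cronWindow{
		startMinute:     minuteOfDay(8, 52),
		endMinute:       minuteOfDay(8, 59),
		desiredReplicas: 2,
	}
	now := time.Now().UTC()
	fmt.Printf("%s UTC -> %d replicas\n",
		now.Format("15:04"), replicasAt(w, 0, now.Hour(), now.Minute()))
}
```

With minReplicaCount set to 0, every time outside the window resolves to zero replicas, which is the scale-to-zero behaviour discussed in this thread.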

ryanemerson · 2024-09-03T13:54:53Z

Apologies for the delay @ZeidH, but main is now baselined on go 1.21. Can you please rebase on the latest code?

ryanemerson · 2024-09-03T14:08:06Z

but I can imagine that for some configurations scaling to zero is not possible

Can you explain which configurations you think would cause issues?

ryanemerson

Thanks @ZeidH, just a couple of minor comments and then I think we're good.

ZeidH · 2024-09-09T13:48:00Z

but I can imagine that for some configurations scaling to zero is not possible

Can you explain which configurations you think would cause issues?

I can't think of any that are specific to Infinispan. The number of available configurations is so large that it would take a long time to work out which configs can be autoscaled and which cannot. From what I have learned about Infinispan over the past few months, the configs where autoscaling doesn't work well are most likely also the ones where autoscaling or scaling to zero wouldn't make much sense, such as clustered caches where you (almost) always want to have a hot instance.

But from experience, there can be issues with autoscaling when using specific types of storage configurations, or, as @rigazilla mentioned in the GH issue, when the operator sometimes needs to set .spec.replicas explicitly, for example during an upgrade.

What we attempted to do is scale a replicated cache dynamically with the number of nodes (1 cache per node), essentially making Infinispan behave like a DaemonSet. We're not done with it yet, but so far it has been working out well.

openshift-ci bot added the needs-ok-to-test label Jul 29, 2024

ZeidH mentioned this pull request Jul 29, 2024

Autoscaling #749

Open

ryanemerson requested changes Jul 31, 2024

ZeidH force-pushed the 749-autoscaling branch from b7cf784 to a84d975 Compare August 9, 2024 09:20

ryanemerson reviewed Sep 3, 2024

api/v1/infinispan_types.go

ZeidH added 4 commits September 4, 2024 13:19

Exposing scale subresource to enable HPA's to modify the replicas of …

07d741e

…the Infinispan CR

Adding kubebuilder field

b6d496e

Updated controller-gen to 0.15.0

da838d6

Fix missing label selector on HPA

c35359e

ZeidH force-pushed the 749-autoscaling branch from a84d975 to c35359e Compare September 4, 2024 11:21

Revert "Updated controller-gen to 0.15.0"

40ef01d

This reverts commit da838d6.

ryanemerson requested changes Sep 9, 2024

config/manifests/bases/infinispan-operator.clusterserviceversion.yaml

api/v1/infinispan_types.go

Fix review comments

6afdd5d

ZeidH requested a review from ryanemerson September 11, 2024 09:11
