Support extended resources and ephemeral-storage for scale-from-zero specified in MachineClass NodeTemplate #334

@elankath elankath commented Nov 7, 2024

What this PR does / why we need it:

Currently, in scale-from-zero cases, the autoscaler does not respect the ephemeral-storage and extended resources specified in MachineClass.NodeTemplate. Only the standard cpu, gpu and memory resources are picked up; custom extended resources are ignored entirely.
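To illustrate the intended behaviour, here is a minimal Go sketch of building the template node capacity from the NodeTemplate capacity. It is not the code from this PR: `ResourceList`, `standardResources`, `extendedResources` and `templateCapacity` are hypothetical names, and quantities are plain `int64` values rather than Kubernetes `resource.Quantity`. The assumption is that any capacity entry that is not one of the standard names counts as an extended resource.

```go
package main

// ResourceList is a simplified, hypothetical stand-in for the Kubernetes
// corev1.ResourceList type; quantities are plain int64 values here.
type ResourceList map[string]int64

// standardResources are the names this sketch treats as built in (assumption).
var standardResources = map[string]bool{
	"cpu":               true,
	"gpu":               true,
	"memory":            true,
	"ephemeral-storage": true,
}

// extendedResources returns every entry of capacity whose name is not one
// of the standard resources above.
func extendedResources(capacity ResourceList) ResourceList {
	extended := ResourceList{}
	for name, quantity := range capacity {
		if !standardResources[name] {
			extended[name] = quantity
		}
	}
	return extended
}

// templateCapacity builds the capacity of a template node from the
// NodeTemplate capacity: cpu, gpu and memory first, then ephemeral-storage
// if it is specified, then all extended resources.
func templateCapacity(nodeTemplate ResourceList) ResourceList {
	capacity := ResourceList{}
	for _, name := range []string{"cpu", "gpu", "memory"} {
		capacity[name] = nodeTemplate[name]
	}
	if storage, ok := nodeTemplate["ephemeral-storage"]; ok {
		capacity["ephemeral-storage"] = storage
	}
	for name, quantity := range extendedResources(nodeTemplate) {
		capacity[name] = quantity
	}
	return capacity
}
```

Calling `templateCapacity` with a map that contains a custom entry such as `resource.com/dongle` carries that entry over to the result, which is the part that was previously dropped.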

Which issue(s) this PR fixes:
Fixes #132

Special notes for your reviewer:

Release note:

Support extended resources and ephemeral-storage for scale-from-zero specified in MachineClass NodeTemplate

@gardener-robot gardener-robot added needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Nov 7, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 7, 2024
@elankath elankath self-assigned this Nov 7, 2024
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 7, 2024

@rishabh-11 rishabh-11 left a comment


Thanks for the PR. Some changes requested.

cluster-autoscaler/cloudprovider/mcm/mcm_manager.go (outdated, resolved)
cluster-autoscaler/cloudprovider/mcm/mcm_manager.go (outdated, resolved)
@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Nov 11, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 12, 2024
@elankath
Author

Corrected the issues. I will test scale-from-zero for ephemeral storage and custom resources and add a manual test log tomorrow. I am unsure how to write an integration test for this, though.

@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 12, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 12, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 12, 2024
@elankath
Author

elankath commented Nov 19, 2024

Test log for scale-from-zero with a custom resource named resource.com/dongle:

apiVersion: v1
kind: Pod
metadata:
  name: testexres1
spec:
  containers:
    - name: example-container
      image: busybox
      command: ["sh", "-c", "echo Using extended resources && sleep 3600"]
      resources:
        limits:
          resource.com/dongle: 2
        requests:
          resource.com/dongle: 2

Node Group b has this specified:

        providerConfig:
          apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
          kind: WorkerConfig
          nodeTemplate:
            capacity:
              cpu: 8
              ephemeral-storage: 50Gi
              gpu: 0
              hana.hc.sap.com/hcu/cpu: 20
              hana.hc.sap.com/hcu/memory: 10
              memory: 7Gi
              resource.com/dongle: 6

Scale from zero triggered.


I1119 13:37:51.654540   38814 mcm_manager.go:981] Copying extended resources map[hana.hc.sap.com/hcu/cpu:{{20 0} {<nil>} 20 DecimalSI} hana.hc.sap.com/hcu/memory:{{10 0} {<nil>} 10 DecimalSI} resource.com/dongle:{{6 0} {<nil>} 6 DecimalSI}] to template node.Status.Capacity

 I1119 13:37:41.141731   38814 klogx.go:87] Pod default/testexres1 can be moved to template-node-for-shoot--i062009--abc-b-z1-5762866181449073921-upcoming-0

 Normal   TriggeredScaleUp   37s    cluster-autoscaler  pod triggered scale-up: [{shoot--i062009--abc-b-z1 0->1 (max: 1)}
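The reason the pod can be placed on the template node can be sketched as a simple resource fit check: the pod requests 2 of `resource.com/dongle` and the template node now advertises 6. This is only an illustration under simplifying assumptions; `ResourceList` and `fits` are hypothetical names, quantities are plain `int64` values, and the autoscaler's real scheduling simulation evaluates more than resource amounts.

```go
package main

// ResourceList is a simplified, hypothetical stand-in for the Kubernetes
// corev1.ResourceList type; quantities are plain int64 values here.
type ResourceList map[string]int64

// fits reports whether every requested resource is available in allocatable
// in at least the requested amount. A resource that is missing from
// allocatable counts as zero, so a request for it never fits.
func fits(requests, allocatable ResourceList) bool {
	for name, want := range requests {
		if allocatable[name] < want {
			return false
		}
	}
	return true
}
```

With the extended resource copied to the template node, `fits(ResourceList{"resource.com/dongle": 2}, ResourceList{"cpu": 8, "resource.com/dongle": 6})` is true. Without the `resource.com/dongle` entry on the node it is false, which matches the earlier behaviour where such a pod could not trigger a scale-up from zero.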


@rishabh-11 rishabh-11 left a comment


/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/changes Needs (more) changes needs/review Needs review labels Nov 20, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 20, 2024
@rishabh-11 rishabh-11 merged commit db938d5 into gardener:machine-controller-manager-provider Nov 20, 2024
10 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Nov 20, 2024
@dhague

dhague commented Nov 20, 2024

Looking forward to using this feature - thanks!

elankath added a commit to elankath/autoscaler that referenced this pull request Nov 20, 2024
…specified in MachineClass NodeTemplate (gardener#334)

* support extended resouces and ephemeral-storage in scale-from-zero

* corrected resource issues

* adjusted unit test TestBuildNodeFromTemplate for changes
rishabh-11 pushed a commit that referenced this pull request Nov 21, 2024
…specified in MachineClass NodeTemplate (#334) (#336)

* support extended resouces and ephemeral-storage in scale-from-zero

* corrected resource issues

* adjusted unit test TestBuildNodeFromTemplate for changes
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Development

Successfully merging this pull request may close these issues.

Allow a way to specify extended resources for scale-from-zero scenario
7 participants