
resourceModels supports extended resources #4050

Closed · wengyao04 opened this issue Sep 11, 2023 · 21 comments · Fixed by #4307
Labels: kind/feature (Categorizes issue or PR as related to a new feature), kind/question (Indicates an issue that is a support question)
Milestone: v1.8

@wengyao04 (Contributor)

Please provide an in-depth description of the question you have:
I am able to register clusters in push mode and use the default resourceModel, which only supports cpu, memory, ephemeral-storage, and storage.
When I add extended resources like GPUs to the cluster's resourceModels, like

  resourceModels:
  - grade: 0
    ranges:
    - max: "72"
      min: "0"
      name: cpu
    - max: 560Gi
      min: "0"
      name: memory
    - max: "0"
      min: "0"
      name: nvidia.com/gpu
    - max: "0"
      min: "0"
      name: myexample.com/gpu-v100
  - grade: 1
    ranges:
    - max: "96"
      min: "72"
      name: cpu
    - max: 1.6Ti
      min: 560Gi
      name: memory
    - max: "4"
      min: "0"
      name: nvidia.com/gpu
    - max: "4"
      min: "0"
      name: myexample.com/gpu-v100

I get

Unsupported value: "nvidia.com/gpu": supported values: "cpu", "ephemeral-storage", "memory", "storage"

My understanding is that General Cluster Modeling uses the resourceSummary to check allocatable/allocated resources when scheduling pods.
But we also want to have GPUs in Customized Cluster Modeling. I don't think GPUs will have the fragmentation issues that cpu/memory have under general cluster modeling, since people cannot claim a partial GPU, but it would still be preferable to have extended resources in the customized cluster modeling, just to keep it consistent with the cluster's resources.
I also run the Karmada dashboard; it shows cpu, memory, and storage. It would be nice to show extended resources there as well.
What do you think about this question?:

Environment:

  • Karmada version: latest
  • Kubernetes version: 1.24
  • Others:
@wengyao04 added the kind/question label (Indicates an issue that is a support question) on Sep 11, 2023
@RainbowMango added this to the v1.8 milestone on Sep 12, 2023
@RainbowMango (Member)

> Unsupported value: "nvidia.com/gpu": supported values: "cpu", "ephemeral-storage", "memory", "storage"

That's because the validation rules restrict the supported resources here. I might need to ask several questions while thinking about whether we can extend the validation to introduce another resource, like nvidia.com/gpu.
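
To make the constraint concrete, here is a minimal sketch (hypothetical Go, not Karmada's actual validation code) of the kind of allow-list check behind that error:

package main

import "fmt"

// supportedResourceNames mirrors the error message above: before this feature,
// resource model names outside this fixed set were rejected at validation time.
var supportedResourceNames = map[string]bool{
    "cpu":               true,
    "memory":            true,
    "ephemeral-storage": true,
    "storage":           true,
}

// validateModelName is a hypothetical stand-in for the real validation rule.
func validateModelName(name string) error {
    if !supportedResourceNames[name] {
        return fmt.Errorf("Unsupported value: %q: supported values: \"cpu\", \"ephemeral-storage\", \"memory\", \"storage\"", name)
    }
    return nil
}

func main() {
    fmt.Println(validateModelName("nvidia.com/gpu")) // rejected before this feature
}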

> I don't think GPUs will have the fragmentation issues that cpu/memory have under general cluster modeling, since people cannot claim a partial GPU, but it would still be preferable to have extended resources in the customized cluster modeling, just to keep it consistent with the cluster's resources.

I think GPUs might also have the fragmentation issue if you mean General Cluster Modeling. For example, if we have 3 nodes and each node has 1 GPU left, Karmada would think the cluster has 3 GPUs and thus could assign a job that requires 2 or 3 GPUs. Am I right?

@wengyao04 (Contributor, Author)

Hi @RainbowMango:
You are right, we also have GPU fragmentation. In the example, if we have 4 nodes and each node has 1 GPU left, and we require one master and one worker, each of which needs 2 GPUs, it won't work. It would be helpful to support extended resources in the resourceModel.

@RainbowMango (Member)

Hi @wengyao04 I believe it's a reasonable feature, and I've asked @chaosi-zju to help with this. He will sync the progress here.

@chaosi-zju (Member)

/assign

@wengyao04 (Contributor, Author)

Hi @chaosi-zju thank you for your demo in the community meetup. Is there a nightly build of the Karmada scheduler image so we can try it out on our platform?

@RainbowMango (Member)

I just talked to @chaosi-zju this morning; he will send the PR this week. Hopefully it can be included in the coming v1.8 release by the end of this month.

@chaosi-zju (Member)

> Hi @chaosi-zju thank you for your demo in the community meetup. Is there a nightly build of the Karmada scheduler image so we can try it out on our platform?

Hi @wengyao04, sorry for the delay, I will submit the PR as soon as possible~

@RainbowMango (Member)

/kind feature

@karmada-bot added the kind/feature label (Categorizes issue or PR as related to a new feature) on Nov 24, 2023
@RainbowMango (Member)

Hi @wengyao04 This feature (#4307) has been merged; you can test it with the latest image now.
Thanks for spotting this, your feedback means a lot to the community.

This feature will be released in release-1.8 by the end of this month. If you want a preview build before the release, please feel free to let me know.

@wengyao04 (Contributor, Author)

Hi @RainbowMango and @chaosi-zju Thank you very much for providing this feature! We will sync the latest images and test it out.

@wengyao04 (Contributor, Author)

Hi @RainbowMango and @chaosi-zju we tested the latest image and it categorizes our GPU nodes correctly. But we find that the resourceModel causes potential resource waste (underutilization), because a node's grade is determined by its lowest-graded resource. This underutilization is even worse when our cluster mixes GPU and CPU boxes.

Let me give a simple example. Suppose there are 3 nodes in my cluster:

  • node 1: 72 CPUs, 512Gi Memory, 4 GPUs
  • node 2: 72 CPUs, 512Gi Memory, 4 GPUs
  • node 3: 72 CPUs, 512Gi Memory, 0 GPUs

If we define the resourceModels like the following:

resourceModels:
- grade: 0
  ranges:
  - min: "0"
    max: "4"
    name: cpu
  - min: "0"
    max: 16Gi
    name: memory
  - min: "0"
    max: "1"
    name: nvidia.com/gpu
- grade: 1
  ranges:
  - min: "4"
    max: "16"
    name: cpu
  - min: 16Gi
    max: 128Gi
    name: memory
  - min: "1"
    max: "2"
    name: nvidia.com/gpu
- grade: 2
  ranges:
  - min: "16"
    max: "32"
    name: cpu
  - min: 128Gi
    max: 256Gi
    name: memory
  - min: "2"
    max: "3"
    name: nvidia.com/gpu
- grade: 3
  ranges:
  - min: "32"
    max: "48"
    name: cpu
  - min: 256Gi
    max: 384Gi
    name: memory
  - min: "3"
    max: "4"
    name: nvidia.com/gpu
- grade: 4
  ranges:
  - min: "48"
    max: "9223372036854775807"
    name: cpu
  - min: 384Gi
    max: "9223372036854775807"
    name: memory
  - min: "4"
    max: "9223372036854775807"
    name: nvidia.com/gpu

Then the two GPU nodes are in grade 4 and the CPU node is in grade 0. If our two GPU nodes are fully occupied and a user submits a CPU workload that requires 10 CPUs and 100Gi of memory, the workload cannot be scheduled, because the Karmada scheduler thinks the CPU node is in grade 0 and doesn't have enough resources, even though the cluster summary shows enough allocatable resources.

I can put more cpu/memory in grade 0, but some resource underutilization always exists. Could the community suggest how to properly set the resourceModels?

Thank you !

@RainbowMango (Member)

> Then the two GPU nodes are in grade 4 and the CPU node is in grade 0.

I'm surprised by that: given 72 CPUs on each node, I'd expect the CPU node to be in grade 4 ([48, ∞)).
Can you share the status of the testing cluster, including the resource model configuration in .spec.resourceModels and the resourceSummary in .status.resourceSummary?

@wengyao04 (Contributor, Author)

wengyao04 commented Nov 28, 2023

Hi @RainbowMango In our real cluster we have 6 nodes in total: 2 GPU and 4 CPU boxes. This is the summary from the cluster status:

  kubernetesVersion: v1.24.13
  nodeSummary:
    readyNum: 6
    totalNum: 6
  resourceSummary:
    allocatable:
      cpu: "432"
      ephemeral-storage: "16767979331679"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 3236824812Ki
      nvidia.com/gpu: "8"
      pods: "660"
    allocatableModelings:
    - count: 4
      grade: 0
    - count: 0
      grade: 1
    - count: 0
      grade: 2
    - count: 0
      grade: 3
    - count: 2
      grade: 4
    allocated:
      cpu: 51835m
      memory: 36420050Ki
      pods: "145"

The 4 CPU nodes are categorized at grade 0. I see you have this logic https://github.com/karmada-io/karmada/blob/master/pkg/modeling/modeling.go#L123 but a CPU node always has GPU quantity 0 and so can never be categorized into a higher grade.
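
To make what I'm seeing concrete, here is a simplified sketch of the grading behavior as I understand it (illustrative only, not the actual modeling.go code): each resource gets the highest grade whose minimum it still meets, and the node takes the minimum grade across all modeled resources, so a CPU-only node is pinned to grade 0 by its 0 GPUs.

package main

import "fmt"

// Grade minimums per resource, taken from the example model earlier in this
// thread (slice index = grade). Illustrative data layout, not Karmada's.
var gradeMins = map[string][]float64{
    "cpu":            {0, 4, 16, 32, 48},
    "memory-gi":      {0, 16, 128, 256, 384},
    "nvidia.com/gpu": {0, 1, 2, 3, 4},
}

// resourceGrade returns the highest grade whose minimum qty still meets.
func resourceGrade(qty float64, mins []float64) int {
    grade := 0
    for i, min := range mins {
        if qty >= min {
            grade = i
        }
    }
    return grade
}

// nodeGrade takes the minimum grade across all modeled resources: one scarce
// resource (here, GPUs) drags the whole node down to grade 0.
func nodeGrade(node map[string]float64) int {
    grade := len(gradeMins["cpu"]) - 1
    for name, qty := range node {
        if g := resourceGrade(qty, gradeMins[name]); g < grade {
            grade = g
        }
    }
    return grade
}

func main() {
    cpuNode := map[string]float64{"cpu": 72, "memory-gi": 512, "nvidia.com/gpu": 0}
    fmt.Println(nodeGrade(cpuNode)) // prints 0: pinned by the GPU dimension
}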

I can increase the grade 0 cpu/memory bounds, but that causes GPU resource waste.
For a simple example, if my resourceModels are like the following:

resourceModels:
- grade: 0
  ranges:
  - min: "0"
    max: "32"
    name: cpu
  - min: "0"
    max: 256Gi
    name: memory
  - min: "0"
    max: "1"
    name: nvidia.com/gpu
- grade: 1
  ranges:
  - min: "32"
    max: "40"
    name: cpu
  - min: 256Gi
    max: 320Gi
    name: memory
  - min: "1"
    max: "2"
    name: nvidia.com/gpu
- grade: 2
  ranges:
  - min: "40"
    max: "48"
    name: cpu
  - min: 320Gi
    max: 384Gi
    name: memory
  - min: "2"
    max: "3"
    name: nvidia.com/gpu
- grade: 3
  ranges:
  - min: "48"
    max: "56"
    name: cpu
  - min: 384Gi
    max: 464Gi
    name: memory
  - min: "3"
    max: "4"
    name: nvidia.com/gpu
- grade: 4
  ranges:
  - min: "56"
    max: "9223372036854775807"
    name: cpu
  - min: 464Gi
    max: "9223372036854775807"
    name: memory
  - min: "4"
    max: "9223372036854775807"
    name: nvidia.com/gpu

Then I will have 4 CPU nodes in grade 0 and 2 GPU nodes in grade 4.
If I have two GPU workloads, each using 1 GPU but 40 CPUs and 260Gi of memory, then although each GPU node still has 3 GPUs, 32 CPUs, and 252Gi of memory left, the GPU nodes are categorized as grade 0 (the remaining memory falls below the 256Gi minimum of grade 1), causing GPU underutilization.

@wengyao04 (Contributor, Author)

I think in our current situation we will probably disable the resourceModel feature gate and just use the resource summary during scheduling. For the resource fragmentation issue, we will probably enable the Volcano gang scheduler on our member clusters to avoid partially running jobs and to surface a clear message to the clients.
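
For anyone else hitting this: the gate in question is CustomizedClusterResourceModeling, which (as far as I can tell, please verify against your version) can be turned off on karmada-controller-manager and karmada-agent via the standard feature-gate flag:

    --feature-gates=CustomizedClusterResourceModeling=false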

I think there is always a tradeoff if we cannot cache all member clusters' nodes in the scheduler cache.

@chaosi-zju (Member)

@wengyao04 please give me some time to think about it; I will get back to you as soon as possible~

@chaosi-zju (Member)

chaosi-zju commented Nov 28, 2023

Hi @wengyao04

For your scenario, using ResourceModel may really not be the most suitable choice, because ResourceModel is meant to be a rough estimation rather than a precise one. And in your scenario the shortcomings of ResourceModel are exposed especially clearly, since:

  1. You have a large range of CPU and memory (for example, CPU 0~72C, memory 0~560Gi) but a small range of GPU (only 0~4), which makes it really difficult to divide the ResourceModel into grades.
  2. ResourceModel assumes the amounts of required resources are positively correlated: for example, the more CPU a workload requires, the more memory it tends to require, so CPU and memory are positively correlated. However, GPUs are not so strongly correlated with them.

However, it doesn't mean Karmada cannot support your scenario; there are other ways in Karmada if you need a more accurate scheduler!

Another option is to use the karmada-scheduler-estimator. The downside is that you need to deploy an additional component (which costs more resources); you can refer to Cluster Accurate Scheduler Estimator For Rescheduling for more information.

This component lists/watches the node objects of every member cluster and maintains an accurate overview of the remaining resources on each node.
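
(Note: the karmada-scheduler only consults the estimators when it is started with the --enable-scheduler-estimator=true flag, and you deploy one estimator per member cluster; please double-check the flag against your version.)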

I'll write another demo using the karmada-scheduler-estimator specifically for your scenario in a following comment~

Besides, I want to know which installation method you used. If you have any problems installing the karmada-scheduler-estimator component, feel free to ask me~

@chaosi-zju (Member)

@wengyao04 the demo of karmada-scheduler-estimator: https://h3ld32xlpo.feishu.cn/wiki/V7shw9Q3kiGkCak4ELocSzDen4g

@wengyao04 (Contributor, Author)

Hi @chaosi-zju Thank you very much. I disabled CustomizedClusterResourceModeling and deployed the cluster estimator. It meets our requirements.

One small issue is that the helm chart only supports one member cluster https://github.com/karmada-io/karmada/blob/master/charts/karmada/values.yaml#L742 so I manually installed another estimator for the other member cluster. Could we make them a list?
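
For illustration, something like this hypothetical shape is what I have in mind (field names here are invented; the actual format is of course up to the maintainers):

memberClusters:
- clusterName: member1
  kubeconfig: member1-kubeconfig-secret
- clusterName: member2
  kubeconfig: member2-kubeconfig-secret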

@chaosi-zju (Member)

> Could we make them a list?

Got it, it makes sense. I will support it~

@chaosi-zju (Member)

Hi @wengyao04

I have evaluated it, and it's feasible to change the estimator configuration in the helm chart to a list format. Can you create an issue in the Karmada repo for me? I will try to submit a related PR this week~

Meanwhile, since you are using the helm installation method, I would like to ask whether there is anything you found troublesome in the helm installation process, or anything that could be improved in the installation experience. Can you provide us with suggestions to improve it? You could create another issue describing what your ideal installation would look like.

@wengyao04 (Contributor, Author)

Hi @chaosi-zju Thank you very much. I submitted a separate issue: #4368
