Validate resources that share the same flavors in a ClusterQueue #308

Closed
alculquicondor opened this issue Jul 29, 2022 · 7 comments · Fixed by #326
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@alculquicondor
Contributor

alculquicondor commented Jul 29, 2022

This is a spinoff from #167 (comment), and a prerequisite for #296

We need to validate that, in a ClusterQueue, two resources either have completely different flavors or share all flavors, in the same order.
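For illustration, the rule could be checked pairwise with something like the following sketch (function and variable names are hypothetical, not Kueue's actual API):

```go
package main

import "fmt"

// flavorsCompatible reports whether two resources' flavor lists satisfy the
// proposed rule: either they share no flavors at all, or they are identical
// lists (same flavors, same order).
func flavorsCompatible(a, b []string) bool {
	inA := make(map[string]bool, len(a))
	for _, f := range a {
		inA[f] = true
	}
	shared := false
	for _, f := range b {
		if inA[f] {
			shared = true
			break
		}
	}
	if !shared {
		// Completely disjoint flavor lists are allowed.
		return true
	}
	// Any overlap requires the lists to match exactly, in order.
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(flavorsCompatible([]string{"spot", "ondemand"}, []string{"spot", "ondemand"})) // identical: valid
	fmt.Println(flavorsCompatible([]string{"spot", "ondemand"}, []string{"a100", "t4"}))       // disjoint: valid
	fmt.Println(flavorsCompatible([]string{"spot", "ondemand"}, []string{"spot", "default"}))  // partial overlap: invalid
}
```

A real webhook would run this check over every pair of resources in the ClusterQueue spec and reject the object on the first incompatible pair.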

@alculquicondor
Contributor Author

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 29, 2022
@kerthcet
Contributor

kerthcet commented Aug 1, 2022

in a ClusterQueue two resources either have completely different flavors

Why? This is too strict. I think this is similar to the cluster autoscaler: when scaling up, we may have some kinds of nodes as the first choice, but we may still have fallback choices in case the preferred ones are insufficient. These may be general-purpose nodes (compared to high-performance nodes), pay-as-you-go instances, or preemptible instances. Nodes of this kind can be shared by different node pools.

In terms of flavors it's the same: we may have some kinds of nodes for general usage, referenced by a single flavor, that meet the requirements of different resources.

An immature idea: is it possible to attach quota to the ResourceFlavor instead of the ClusterQueue? The ResourceFlavor refers to a specific type of resources, and quota could also be part of the resource properties. Currently, ResourceFlavor only plays the role of a nodeAffinity, which could also be achieved in other ways, like the pod's selector. It seems we haven't fully leveraged it.

@alculquicondor
Contributor Author

I think the idea wasn't clear. With the proposal, this should be possible (removing the quotas just to show the point):

  resources:
  - name: "cpu"
    flavors: ["spot", "ondemand"]
  - name: "memory"
    flavors: ["spot", "ondemand"]
  - name: "example.com/gpu"
    flavors: ["a100", "t4", "k80"]

In this case, CPU and memory are coupled together. When a workload is assigned to this ClusterQueue, it would be given the same flavor for CPU and memory. The next workload could get a different flavor, but it needs to be the same for CPU and memory as well.

Achieving this coupling would be difficult if we allowed something like this:

  - name: "cpu"
    flavors: ["spot", "ondemand"]
  - name: "memory"
    flavors: ["spot", "default"]

Note that the limitation is within one ClusterQueue. If there is another ClusterQueue in which CPU and memory don't need to be coupled, they can use different flavors.
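For illustration, such a decoupled ClusterQueue could use completely disjoint flavor lists per resource, which the rule still allows (flavor names here are only examples):

```yaml
  resources:
  - name: "cpu"
    flavors: ["spot", "ondemand"]
  - name: "memory"
    flavors: ["default"]
```

Since "cpu" and "memory" share no flavors at all, they are never coupled, and each can be assigned independently.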

An immature idea: is it possible to attach quota to the ResourceFlavor instead of the ClusterQueue? The ResourceFlavor refers to a specific type of resources, and quota could also be part of the resource properties. Currently, ResourceFlavor only plays the role of a nodeAffinity, which could also be achieved in other ways, like the pod's selector. It seems we haven't fully leveraged it.

See the motivation for ResourceFlavor here: #59

Basically, putting the node labels in a separate object prevents multiple ClusterQueues from defining the same flavor but with different labels.

@kerthcet
Contributor

kerthcet commented Aug 9, 2022

I still think we should allow this configuration:

  - name: "cpu"
    flavors: ["spot", "on-demand"]
  - name: "memory"
    flavors: ["spot", "default", "on-demand"]

This is actually a fallback strategy, which is common in practice.

Pseudo-code below:

loop:
for podSet := range totalRequests {
    qualifiedFlavors, ok := findFlavorsMeetingPodSetResourceRequirements(podSet)
    if !ok {
        return error
    }

    for _, flavor := range qualifiedFlavors {
        if meetsResourceQuotaRequirements(flavor) {
            assignFlavors(flavor)
            break loop
        }
    }

    if haveCohort {
        for _, flavor := range qualifiedFlavors {
            if ok := borrowFlavorSpecifiedResources(flavor); ok {
                assignFlavors(flavor)
                break loop
            }
        }
        return error
    }
}

@alculquicondor
Contributor Author

That particular example doesn't make sense to me: "spot" and "on-demand" are opposites, and together they cover all the possibilities. Also, why wouldn't "cpu": ["spot", "default", "on-demand"] make sense?
In which case would you assign "cpu": "spot", "memory": "default"?

@alculquicondor
Contributor Author

/assign

@alculquicondor
Contributor Author

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 11, 2022
3 participants