GenerateClustersInfo should take AvailableReplicas into account when sorting #5129

mszacillo · 2024-07-03T20:11:40Z

What happened:
When scheduling, Karmada selects clusters from the list of candidates (ClusterDetailInfo) sorted by cluster.Score in descending order. This sorting is done by spreadconstraint#sortClusters, which is called from generateClustersInfo. Currently the sort only accounts for cluster.Score, but in edge cases where clusters have the same score, we should prefer scheduling on the clusters with more availableReplicas. Instead, we just pick the first cluster out of the sorted list:

I0703 18:28:55.882315       1 util.go:78] MaxAvailableReplica scores calculated by estimator general-estimator for workload(kind=FlinkDeployment, s-spaaseng/mszacillo-karmada): [{cluster-A 60} {cluster-B 64}]
I0703 18:28:55.886486       1 util.go:78] MaxAvailableReplica scores calculated by estimator scheduler-estimator for workload(kind=FlinkDeployment, s-spaaseng/mszacillo-karmada): [{cluster-A 9} {cluster-B 20}]
I0703 18:28:55.886575       1 util.go:102] Target cluster calculated by estimators (available cluster && maxAvailableReplicas): [{cluster-A 9} {cluster-B 20}]
I0703 18:28:55.886594       1 select_clusters_by_cluster.go:34] Selecting best clusters using scores: [{cluster-A 0 9 cluster-A} {cluster-B 0 20 cluster-B}].
I0703 18:28:55.886668       1 generic_scheduler.go:101] Selected clusters: [cluster-A]
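To illustrate why cluster-A is selected in the log above, here is a minimal Go sketch. It assumes, as noted later in this thread, that the sort falls back to the cluster name when scores tie. ClusterDetailInfo is reduced to the three fields that matter, and sortByScoreThenName is a hypothetical helper, not Karmada's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterDetailInfo is a simplified, hypothetical stand-in for Karmada's
// spreadconstraint.ClusterDetailInfo; only the fields used here are kept.
type ClusterDetailInfo struct {
	Name              string
	Score             int64
	AvailableReplicas int64
}

// sortByScoreThenName models the behavior described in this issue:
// clusters are ordered by Score (descending) and ties fall back to the
// cluster name, so AvailableReplicas is never consulted.
func sortByScoreThenName(clusters []ClusterDetailInfo) {
	sort.Slice(clusters, func(i, j int) bool {
		if clusters[i].Score != clusters[j].Score {
			return clusters[i].Score > clusters[j].Score
		}
		return clusters[i].Name < clusters[j].Name
	})
}

func main() {
	// Values taken from the scheduler log above: both scores are 0.
	clusters := []ClusterDetailInfo{
		{Name: "cluster-B", Score: 0, AvailableReplicas: 20},
		{Name: "cluster-A", Score: 0, AvailableReplicas: 9},
	}
	sortByScoreThenName(clusters)
	fmt.Println("selected:", clusters[0].Name) // selected: cluster-A
}
```

Even though cluster-B has more than twice the capacity, the name comparison decides the tie.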

What you expected to happen:
Ideally, when creating the default list of ClusterInfo, we should additionally sort by availableReplicas. In the example above, Karmada should select cluster-B: the scores are equal, but cluster-B has more availableReplicas (20 vs. 9).

How to reproduce it (as minimally and precisely as possible):
This is easily reproducible when clusters have the same ClusterAffinity / ClusterLocality scores. In that case, the clusters are sorted only by score, not by availableReplicas.

Potential fixes:
I was able to address this by passing an additional compare function to sortClusters in generateClustersInfo:

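```go
// Compare by available replicas: the cluster with more replicas sorts first.
// Returning nil means this function cannot decide, so the remaining rules apply.
sortClusters(info.Clusters, func(i *ClusterDetailInfo, j *ClusterDetailInfo) *bool {
	if i.AvailableReplicas != j.AvailableReplicas {
		return pointer.Bool(i.AvailableReplicas > j.AvailableReplicas) // pointer is k8s.io/utils/pointer
	}
	return nil
})
```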
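For context, a compare-function hook like this could work roughly as in the sketch below. This is an assumption, not Karmada's exact implementation: the hypothetical sortClusters here orders by Score first, then consults the caller's compare functions in order (a nil result means "undecided"), and finally falls back to the cluster name. boolPtr stands in for pointer.Bool so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterDetailInfo is a simplified stand-in with only the fields used here.
type ClusterDetailInfo struct {
	Name              string
	Score             int64
	AvailableReplicas int64
}

// compareFunc returns a non-nil result when it can decide whether i sorts
// before j, or nil to defer to the next rule.
type compareFunc func(i, j *ClusterDetailInfo) *bool

// boolPtr stands in for pointer.Bool from k8s.io/utils/pointer.
func boolPtr(b bool) *bool { return &b }

// sortClusters is a hypothetical sketch: Score (descending) first, then the
// caller's compare functions in order, then cluster name as a last resort.
func sortClusters(infos []ClusterDetailInfo, compareFunctions ...compareFunc) {
	sort.Slice(infos, func(a, b int) bool {
		if infos[a].Score != infos[b].Score {
			return infos[a].Score > infos[b].Score
		}
		for _, compare := range compareFunctions {
			if result := compare(&infos[a], &infos[b]); result != nil {
				return *result
			}
		}
		return infos[a].Name < infos[b].Name
	})
}

// byAvailableReplicas is the tie-breaker proposed in this issue.
func byAvailableReplicas(i, j *ClusterDetailInfo) *bool {
	if i.AvailableReplicas != j.AvailableReplicas {
		return boolPtr(i.AvailableReplicas > j.AvailableReplicas)
	}
	return nil
}

func main() {
	clusters := []ClusterDetailInfo{
		{Name: "cluster-A", Score: 0, AvailableReplicas: 9},
		{Name: "cluster-B", Score: 0, AvailableReplicas: 20},
	}
	sortClusters(clusters, byAvailableReplicas)
	fmt.Println("selected:", clusters[0].Name) // selected: cluster-B
}
```

Checking Score before the custom functions matters: if the replica comparison ran first, a lower-scored cluster with more capacity would jump ahead of a higher-scored one.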

Out of curiosity, do we expect future use cases for custom compareFunctions in the sortClusters definition? If not, this could also be fixed by having sortClusters apply these rules by default:

1. Sort by cluster Score.
2. If Score is equal, sort by availableReplicas.
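If the rules were instead baked into the default ordering, the comparison could look like the following sketch. defaultLess and sortClustersDefault are hypothetical names, cluster-C is an invented third cluster, and the final name comparison is an added assumption to keep ties deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterDetailInfo is a simplified stand-in with only the fields used here.
type ClusterDetailInfo struct {
	Name              string
	Score             int64
	AvailableReplicas int64
}

// defaultLess encodes the two default rules, plus a name tie-break.
func defaultLess(a, b ClusterDetailInfo) bool {
	if a.Score != b.Score {
		return a.Score > b.Score // rule 1: higher Score first
	}
	if a.AvailableReplicas != b.AvailableReplicas {
		return a.AvailableReplicas > b.AvailableReplicas // rule 2: more replicas first
	}
	return a.Name < b.Name // assumed final tie-break for determinism
}

// sortClustersDefault sorts in place using defaultLess.
func sortClustersDefault(clusters []ClusterDetailInfo) {
	sort.Slice(clusters, func(i, j int) bool { return defaultLess(clusters[i], clusters[j]) })
}

func main() {
	clusters := []ClusterDetailInfo{
		{Name: "cluster-A", Score: 0, AvailableReplicas: 9},
		{Name: "cluster-B", Score: 0, AvailableReplicas: 20},
		{Name: "cluster-C", Score: 50, AvailableReplicas: 3},
	}
	sortClustersDefault(clusters)
	for _, c := range clusters {
		fmt.Println(c.Name, c.Score, c.AvailableReplicas)
	}
	// cluster-C 50 3
	// cluster-B 0 20
	// cluster-A 0 9
}
```

cluster-C still comes first despite having the fewest replicas, because availableReplicas only breaks ties in Score.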

Environment:

Karmada version: v1.9.0
Kubernetes version: v1.29.0


mszacillo · 2024-07-03T20:13:31Z

/assign

RainbowMango · 2024-07-05T07:13:14Z

cc @whitewindmills here to take a look.

RainbowMango · 2024-07-05T07:19:07Z

> but in edge cases where the score between clusters is the same, we should prefer scheduling on clusters with more availableReplicas. Instead, we just pick the first cluster out of the sorted list:

I agree with it.
The user hasn't declared any specific rule about how to select a cluster, so we can't say whether the current behavior is wrong.
But selecting the one with more availableReplicas would be a safe choice.
Can you explain your thoughts about it?

RainbowMango · 2024-07-05T07:20:32Z

> Out of curiosity, do we see future cases for custom compareFunctions in sortClusters definition? This could also be fixed by sorting clusters by default using these rules

@whitewindmills Is this reserved for extensions?

whitewindmills · 2024-07-05T08:26:59Z

> @whitewindmills Is this reserved for extensions?

sure

whitewindmills · 2024-07-05T08:33:30Z

It's a small optimization; feel free to address this.

whitewindmills · 2024-07-05T08:36:10Z

I prefer this to be an optimization. 😀
/remove-kind bug
/kind feature

mszacillo · 2024-07-05T15:29:32Z

Sounds good, I'll open a PR shortly.

> Can you explain your thoughts about it?

In our case, we'd just like to spread our applications across the clusters. At the moment, when scores are equal the sort falls back to cluster name, which leads us to pack a single cluster until it no longer has enough resources.
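The difference between the two tie-breakers can be seen in a small simulation. This is a hypothetical model, not the real scheduler: two clusters with equal scores and 10 available replicas each (invented numbers), six single-replica workloads scheduled one at a time, each placement consuming one available replica, and clusters with no remaining capacity skipped.

```go
package main

import (
	"fmt"
	"sort"
)

// cluster is a hypothetical, simplified cluster record.
type cluster struct {
	Name              string
	Score             int64
	AvailableReplicas int64
}

// pick returns the index of the best cluster with spare capacity, or -1.
// With useReplicas false, equal scores fall back to the cluster name;
// with useReplicas true, more available replicas wins before the name.
func pick(clusters []cluster, useReplicas bool) int {
	idx := make([]int, 0, len(clusters))
	for i, c := range clusters {
		if c.AvailableReplicas > 0 {
			idx = append(idx, i)
		}
	}
	if len(idx) == 0 {
		return -1
	}
	sort.Slice(idx, func(a, b int) bool {
		x, y := clusters[idx[a]], clusters[idx[b]]
		if x.Score != y.Score {
			return x.Score > y.Score
		}
		if useReplicas && x.AvailableReplicas != y.AvailableReplicas {
			return x.AvailableReplicas > y.AvailableReplicas
		}
		return x.Name < y.Name
	})
	return idx[0]
}

// simulate schedules n single-replica workloads one at a time and
// returns how many landed on each cluster.
func simulate(n int, useReplicas bool) map[string]int {
	clusters := []cluster{
		{Name: "cluster-A", AvailableReplicas: 10},
		{Name: "cluster-B", AvailableReplicas: 10},
	}
	placed := map[string]int{}
	for k := 0; k < n; k++ {
		i := pick(clusters, useReplicas)
		if i < 0 {
			break
		}
		clusters[i].AvailableReplicas--
		placed[clusters[i].Name]++
	}
	return placed
}

func main() {
	fmt.Println("name tie-break:", simulate(6, false))   // name tie-break: map[cluster-A:6]
	fmt.Println("replica tie-break:", simulate(6, true)) // replica tie-break: map[cluster-A:3 cluster-B:3]
}
```

With the name tie-break every workload packs onto cluster-A; with the replica tie-break each placement lowers the chosen cluster's availableReplicas, so the next workload tends to go to the other cluster.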

mszacillo added the kind/bug Categorizes issue or PR as related to a bug. label Jul 3, 2024

github-project-automation bot added this to Karmada Overall Backlog Jul 3, 2024

karmada-bot assigned mszacillo Jul 3, 2024

karmada-bot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 5, 2024

mszacillo mentioned this issue Jul 5, 2024

GroupClusters should sort by score and availableReplica count #5144

Merged

karmada-bot closed this as completed in #5144 Jul 10, 2024
