Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenerateClustersInfo should take AvailableReplicas into account when sorting #5129

Closed
mszacillo opened this issue Jul 3, 2024 · 8 comments · Fixed by #5144
Closed

GenerateClustersInfo should take AvailableReplicas into account when sorting #5129

mszacillo opened this issue Jul 3, 2024 · 8 comments · Fixed by #5144
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@mszacillo
Copy link
Contributor

mszacillo commented Jul 3, 2024

What happened:
When scheduling, Karmada will select the clusters out of the list of candidates (ClusterDetailInfo) sorted by cluster.Score descending. This sorting is done by spreadconstraint#sortClusters, and called when running generateClustersInfo. Currently the sort only accounts for cluster.Score, but in edge cases where the score between clusters is the same, we should prefer scheduling on clusters with more availableReplicas. Instead, we just pick the first cluster our of the sorted list:

I0703 18:28:55.882315       1 util.go:78] MaxAvailableReplica scores calculated by estimator general-estimator for workload(kind=FlinkDeployment, s-spaaseng/mszacillo-karmada): [{cluster-A 60} {cluster-B 64}]
I0703 18:28:55.886486       1 util.go:78] MaxAvailableReplica scores calculated by estimator scheduler-estimator for workload(kind=FlinkDeployment, s-spaaseng/mszacillo-karmada): [{cluster-A 9} {cluster-B 20}]
I0703 18:28:55.886575       1 util.go:102] Target cluster calculated by estimators (available cluster && maxAvailableReplicas): [{cluster-A 9} {cluster-B 20}]
I0703 18:28:55.886594       1 select_clusters_by_cluster.go:34] Selecting best clusters using scores: [{cluster-A 0 9 cluster-A} {cluster-B 0 20 cluster-B}].
I0703 18:28:55.886668       1 generic_scheduler.go:101] Selected clusters: [cluster-A]

What you expected to happen:
Ideally, when creating the default list of ClusterInfo, we should additionally sort by availableReplicas. So in the above example, Karmada should select Cluster-B since the score is the same, but Cluster-B has more availableReplicas (20).

How to reproduce it (as minimally and precisely as possible):
This is easily reproducible when clusters have the same ClusterAffinity / ClusterLocality scores. In that case, the clusters are only sorted by score rather than availableReplica.

Potential fixes:
I was able to address this by adding an additional compareFunction to the generateClusterInfo method:

sortClusters(info.Clusters, func(i *ClusterDetailInfo, j *ClusterDetailInfo) *bool {
	if i.AvailableReplicas != j.AvailableReplicas {
		return pointer.Bool(i.AvailableReplicas > j.AvailableReplicas)
	}
	return nil
})

Out of curiosity, do we see future cases for custom compareFunctions in sortClusters definition? This could also be fixed by sorting clusters by default using these rules:

  • Sort by cluster Score
  • If Score is equal, sort by availableReplicas

Environment:

  • Karmada version: v1.9.0
  • Kubernetes version: v1.29.0
@mszacillo mszacillo added the kind/bug Categorizes issue or PR as related to a bug. label Jul 3, 2024
@mszacillo
Copy link
Contributor Author

/assign

@RainbowMango
Copy link
Member

cc @whitewindmills here to take a look.

@RainbowMango
Copy link
Member

but in edge cases where the score between clusters is the same, we should prefer scheduling on clusters with more availableReplicas. Instead, we just pick the first cluster our of the sorted list:

I agree with it.
There is no specific rule declared by the user about how to select a cluster. So it can not say we are doing wrong or not.
But, if we select the one with more availableReplicas would be a safe choice.
Can you explain your thoughts about it?

@RainbowMango
Copy link
Member

Out of curiosity, do we see future cases for custom compareFunctions in sortClusters definition? This could also be fixed by sorting clusters by default using these rules

@whitewindmills Is this reserved for extensions?

@whitewindmills
Copy link
Member

@whitewindmills Is this reserved for extensions?

sure

@whitewindmills
Copy link
Member

it is a small optimization, fell free to address this.

@whitewindmills
Copy link
Member

I prefer this to be an optimization. 😀
/remove-kind bug
/kind feature

@karmada-bot karmada-bot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 5, 2024
@mszacillo
Copy link
Contributor Author

Sounds good, I'll open a PR shortly.

Can you explain your thoughts about it?

In our case, we'd just like to spread out our applications across the clusters. At the moment the default behavior for the sort is to sort by cluster name if the score is the same - this leads us to pack a single cluster until there are not enough resources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

4 participants