
Fresh rescheduling not happening via workloadrebalancer #5070

Open
bharathguvvala opened this issue Jun 20, 2024 · 22 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@bharathguvvala

bharathguvvala commented Jun 20, 2024

What happened:

According to the documentation, a fresh rescheduling should happen upon the creation of a WorkloadRebalancer resource. With a PropagationPolicy that has multiple cluster affinities, the scheduling algorithm still honors the previous schedulerObservedAffinity while rescheduling, which means it does not attempt to schedule the workload to cluster affinity groups whose affinityIndex is lower than that of the schedulerObservedAffinity.

What you expected to happen:

A fresh rescheduling should attempt to schedule across all cluster affinity groups, irrespective of the schedulerObservedAffinity.

How to reproduce it (as minimally and precisely as possible):

Have two cluster affinity groups, A and B (an example policy is sketched after the steps).

  1. Deploy a workload which gets scheduled to A.
  2. Make the pods unschedulable, causing the deployment to get descheduled from A and scheduled to B.
  3. Free up the capacity in A.
  4. Create a WorkloadRebalancer to trigger a rescheduling.
  5. The workload is still scheduled on B.
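
For illustration, a minimal PropagationPolicy along these lines (cluster names, affinity group names, and the Deployment name are placeholders, not from my actual setup):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinities:
      # first affinity group (A)
      - affinityName: group-a
        clusterNames:
          - member-a
      # second affinity group (B), used when A is not schedulable
      - affinityName: group-b
        clusterNames:
          - member-b

Once the scheduler falls through to group-b, the ResourceBinding records that group (as far as I understand, in status.schedulerObservedAffinityName), and later scheduling attempts resume from there instead of reconsidering group-a.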

Anything else we need to know?:

Environment:

  • Karmada version: 1.10.0
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:
@bharathguvvala bharathguvvala added the kind/bug Categorizes issue or PR as related to a bug. label Jun 20, 2024
@RainbowMango
Member

cc @chaosi-zju for help

@chaosi-zju
Member

chaosi-zju commented Jun 21, 2024

Hi @bharathguvvala, thank you for your feedback~

Indeed, you are right about the current implementation: for multiple clusterAffinities, the scheduling algorithm still honors the previous schedulerObservedAffinity.

There were two main considerations behind this, according to the proposal:

1) In the Motivation chapter:

Assuming the user has propagated the workloads to member clusters, in some scenarios the current replicas distribution is not the most expected, such as...

However, when multiple clusterAffinities were designed, no ranking between them was intended: the first clusterAffinity is not explicitly defined as the most expected one. They are all considered equally good choices, any one of them can be chosen, and they are simply different cluster combinations.

Besides, due to the current limitations of multiple clusterAffinities, the scheduler can only move on to the next clusterAffinity; it has no ability to return to a previous one.

2) In the Constraints chapter:

it is only guaranteed that the new schedule result meets current Placement

The stories mentioned in the proposal have one thing in common: the actual distribution of replicas deviates from the expectation expressed in the policy. In your example, however, since the clusterAffinities are not ranked against each other, the current scheduling result still meets the expectations of the policy.


Of course, this was only the earlier consideration, and it may not have been thought through carefully because we had not encountered a real production scenario. Given the description above, do you still think rescheduling needs to switch back to the first clusterAffinity? Why? We can continue to discuss whether your request is reasonable~

@bharathguvvala
Author

bharathguvvala commented Jun 21, 2024

@chaosi-zju Thanks for the response. Please see my responses inline.

This is what is cited as a use case in the motivation:

replicas migrated due to cluster failover, while now cluster recovered.
replicas migrated due to application-level failover, while now each cluster has sufficient resources to run the replicas.

In both these examples, falling back to the previous affinity is the intended effect. One practical example is if the clusters involved here are part of a private on-prem (A) and public (B) clouds where A is preferred given the higher costs of B and is meant to be used only for bursting or failover. This is the scenario we are attempting to solve at our company by leveraging Karmada.

However, when designing multiple clusterAffinities, there is no good or bad distinction between different clusterAffinities, and the first clusterAffinity is not explicitly specified as most expected. They are all considered good choices, and any one can be chosen, and, they are just different cluster combinations.

The example I cited above goes against the premise posed here that the order of affinities does not indicate which one is most preferred. While that is expected during normal scheduling, shouldn't a fresh reschedule mean scheduling without any regard for the current placement or schedulerObservedAffinity, similar to how first-time scheduling is done?

Is it possible to introduce a flag in the WorkloadRebalancer to enable this fresh-scheduling behaviour?

@chaosi-zju
Member

Hi @bharathguvvala, your point of view gives us great reference value, thanks~

In both these examples, falling back to the previous affinity is the intended effect.

The original intention of those two stories was the case where a single clusterAffinity contains multiple clusters, like:

...
spec:
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
...

Then, if the member1 cluster fails over, all replicas will migrate to the member2 cluster. When member1 recovers, you can use a WorkloadRebalancer to migrate replicas back to member1+member2.
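
For illustration, a minimal WorkloadRebalancer for that case could look roughly like this (the Deployment name and namespace are made up for the example; please check the docs for the exact fields):

apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo-rebalancer
spec:
  workloads:
    # the resources whose replicas should be rescheduled
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: default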

One practical example is if the clusters involved here are part of a private on-prem (A) and public (B) clouds where A is preferred given the higher costs of B and is meant to be used only for bursting or failover. This is the scenario we are attempting to solve at our company by leveraging Karmada.

However, after hearing the scenario you described, I think your usage and request are reasonable. I think we really need to support this ability.

But, as I said about multiple clusterAffinities, we currently only have the ability to choose the next clusterAffinity and no ability to return to a previous one, so it will take us some time to design and evolve this capability of WorkloadRebalancer.

CC @RainbowMango what do you think about this case?

@bharathguvvala
Author

I am willing to contribute. Just wondering, is there a possibility of me participating in the design discussions and contributing to the feature?

@chaosi-zju
Member

Just wondering, is there a possibility of me participating in the design discussions and contributing to the feature?

Of course you can!

Karmada is very happy to welcome new contributors to the community, and the project has always embraced open-source enthusiasts with openness and humility~

@bharathguvvala
Author

So how should I proceed? Should I create an enhancement proposal? I am thinking that this capability can be part of the WorkloadRebalancer, which is supposed to trigger a fresh reschedule.

@chaosi-zju
Member

So how should I proceed?

Hi, I think we can proceed as follows:

  1. First, you can summarize the general direction and ideas, and describe in this issue how you plan to achieve this capability. We will invite the feature owners of WorkloadRebalancer and multiple clusterAffinities to join a pleasant discussion together.
  2. Then, you can share your ideas at the Regular Community Meeting, held on Tuesdays at 08:00 Pacific Time (English, biweekly). When you are ready, you can add your topic to the Meeting Notes and Agenda.
  3. Next, you can submit your complete proposal and try to implement it~

@bharathguvvala
Author

bharathguvvala commented Jun 28, 2024

@chaosi-zju Thanks for the response. In summary, what has been discussed in this thread is the concern, i.e. the lack of an ability to do a fresh reschedule in which workloads are relocated back to the original clusters once the conditions there are satisfied again (either because capacity has become available again or because the cluster has recovered after a failover, etc.). Currently no Karmada construct enables this reverse-migration workflow; almost all scheduling flows take the previous scheduling context into account.

I see WorkloadRebalancer as a natural fit to solve this by giving the user a control to signal a fresh reschedule (preferably through a field such as spec.freshReschedule) that does not honor the previous schedule context. Such a reschedule may or may not cause the workload to be relocated, depending on where it was scheduled before. It is up to the user who triggers it to decide when a fresh reschedule should happen via WorkloadRebalancer, and that user is expected to be fully aware of the implications of such an action. Users may also build workflows to trigger periodic fresh reschedules to suit their needs.
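
As a rough sketch of what I have in mind (spec.freshReschedule is only the field proposed here, not an existing API; the workload reference is illustrative):

apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: fresh-rebalance-demo
spec:
  # proposed field: ignore the previous schedule context
  # (e.g. the observed affinity group) and evaluate all cluster
  # affinity groups from the first one, as in first-time scheduling
  freshReschedule: true
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: default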

I am willing to discuss this in the next community meeting. Should I go ahead and add it to the meeting agenda?

@chaosi-zju
Member

I am willing to discuss this in the next community meeting. Should I go ahead and add it to the meeting agenda?

Yes, I am very glad you can. Please go ahead and add your topic to the meeting agenda; the next English meeting is on 2024-07-09.

By the way, what time zone are you in? Does the 08:00 Pacific Time slot work for you? If there is any difficulty, we will try our best to coordinate the time.

@chaosi-zju
Member

I see WorkloadRebalancer as a natural fit to solve this by giving the user a control to signal a fresh reschedule (preferably through a field such as spec.freshReschedule) that does not honor the previous schedule context. Such a reschedule may or may not cause the workload to be relocated, depending on where it was scheduled before. It is up to the user who triggers it to decide when a fresh reschedule should happen via WorkloadRebalancer, and that user is expected to be fully aware of the implications of such an action. Users may also build workflows to trigger periodic fresh reschedules to suit their needs.

Thank you very much for sharing your opinion. You have given me further insight into the expected feature from a user-demand perspective. And I agree with what you said: "It is up to the user who triggers it to decide when a fresh reschedule should happen via WorkloadRebalancer, and that user is expected to be fully aware of the implications of such an action."

By the way, have you considered, from an implementation perspective, how to achieve "not honoring the previous schedule context"? For example, with multiple clusterAffinities, how can karmada-scheduler drop the current clusterAffinity and go back to the first one? I'm looking forward to your opinions on these questions.

@bharathguvvala
Author

bharathguvvala commented Jun 28, 2024

I was thinking that for a fresh reschedule we could start the evaluation from scratch instead of resuming from the current schedulerObservedAffinityName, by nullifying rb.Status.SchedulerObservedAffinityName of the workload's ResourceBinding from the WorkloadRebalancer controller. I am not sure if this introduces some side effects.
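
To illustrate the idea on the binding itself (a trimmed ResourceBinding sketch; the names are made up, and I am assuming the status field serializes as schedulerObservedAffinityName):

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
status:
  # today: records the affinity group the scheduler last settled on,
  # so later attempts never go back to earlier groups
  schedulerObservedAffinityName: group-b
  # idea: the WorkloadRebalancer controller clears this field before
  # triggering the reschedule, so the scheduler evaluates all affinity
  # groups from the first one, as in first-time scheduling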

@chaosi-zju
Member

chaosi-zju commented Jun 29, 2024

I was thinking that for a fresh reschedule we could start the evaluation from scratch instead of resuming from the current schedulerObservedAffinityName, by nullifying rb.Status.SchedulerObservedAffinityName of the workload's ResourceBinding from the WorkloadRebalancer controller. I am not sure if this introduces some side effects.

CC @XiShanYongYe-Chang what do you think about his thought on refreshing multiple clusterAffinities?

@XiShanYongYe-Chang
Member

Thanks, let me take a look.

@XiShanYongYe-Chang
Member

Sorry for replying late.

I see WorkloadRebalancer as a natural fit to solve this by providing a control to the user to signal a fresh reschedule (preferably through a field spec.freshReschedule) without honoring the previous schedule context.

I think this is a good direction. We did not provide the capability of resetting the scheduling group because there was no user case to support it for multiple-clusterAffinities group scheduling. Your case may be a good start. Thanks @bharathguvvala

@XiShanYongYe-Chang
Member

/kind feature
/remove-kind bug

@karmada-bot karmada-bot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 2, 2024
@chaosi-zju
Member

Hi @bharathguvvala, tomorrow is the community meeting~

Would it be convenient for you to share your topic tomorrow? If so, please add it to the agenda.

Thank you very much, I'm looking forward to your presentation!

@chaosi-zju
Member

Related issue: #4990, just for the record.

@bharathguvvala
Author

Hi @bharathguvvala, tomorrow is the community meeting~

Would it be convenient for you to share your topic tomorrow? If so, please add it to the agenda.

Thank you very much, I'm looking forward to your presentation!

Requested edit access to the document.

@chaosi-zju
Member

Requested edit access to the document.

Hi @bharathguvvala, it seems you can get edit permissions automatically~

By joining the Google group you will be able to edit the meeting notes.
Join the Google Groups mailing list: https://groups.google.com/forum/#!forum/karmada


@bharathguvvala
Author

@chaosi-zju Thanks. Added it to the agenda.

@chaosi-zju
Member

Hello @bharathguvvala, I created issue #5172 to track the progress of your subsequent related work.

You can take charge of this feature and move forward to implement it, come on~
