
Fresh rescheduling not happening via workloadrebalancer #5070

Open
bharathguvvala opened this issue Jun 20, 2024 · 22 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@bharathguvvala

bharathguvvala commented Jun 20, 2024

What happened:

According to the documentation, a fresh rescheduling should happen upon the creation of a WorkloadRebalancer resource. With a PropagationPolicy that has multiple cluster affinities, the scheduling algorithm still honors the previous schedulerObservedAffinity while rescheduling, which means it does not attempt to schedule the workload to cluster affinity groups whose affinityIndex is lower than that of the schedulerObservedAffinity.

What you expected to happen:

A fresh rescheduling should attempt to schedule across all cluster affinity groups, irrespective of the schedulerObservedAffinity.

How to reproduce it (as minimally and precisely as possible):

Have two cluster affinity groups, A and B (an example policy is sketched after the steps).

  1. Deploy a workload which gets scheduled to A.
  2. Make the pods unschedulable, causing the deployment to get descheduled from A and scheduled to B.
  3. Free up the capacity in A.
  4. Create a WorkloadRebalancer to trigger a rescheduling.
  5. The workload is still scheduled on B.
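
For illustration, a minimal PropagationPolicy along these lines (cluster names, affinity group names, and the Deployment name are placeholders, not from my actual setup):

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinities:
      # first affinity group (A)
      - affinityName: group-a
        clusterNames:
          - member-a
      # second affinity group (B), used when A is not schedulable
      - affinityName: group-b
        clusterNames:
          - member-b

Once the scheduler falls through to group-b, the ResourceBinding records that group (as far as I understand, in status.schedulerObservedAffinityName), and later scheduling attempts resume from there instead of reconsidering group-a.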

Anything else we need to know?:

Environment:

  • Karmada version: 1.10.0
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:
@bharathguvvala bharathguvvala added the kind/bug Categorizes issue or PR as related to a bug. label Jun 20, 2024
@RainbowMango
Member

cc @chaosi-zju for help

@chaosi-zju
Member

chaosi-zju commented Jun 21, 2024

Hi @bharathguvvala, thank you for your feedback~

Indeed, you are right about the current implementation: for multiple clusterAffinities, the scheduling algorithm still honors the previous schedulerObservedAffinity.

There were two main considerations behind this, according to the proposal:

1) In the Motivation chapter:

Assuming the user has propagated the workloads to member clusters, in some scenarios the current replicas distribution is not the most expected, such as...

However, when multiple clusterAffinities were designed, no ranking between them was intended: the first clusterAffinity is not explicitly defined as the most expected one. They are all considered equally good choices, any one of them can be chosen, and they are simply different cluster combinations.

Besides, due to the current limitations of multiple clusterAffinities, the scheduler can only move on to the next clusterAffinity; it has no ability to return to a previous one.

2) In the Constraints chapter:

it is only guaranteed that the new schedule result meets current Placement

The stories mentioned in the proposal have one thing in common: the actual distribution of replicas deviates from the expectation expressed in the policy. In your example, however, since the clusterAffinities are not ranked against each other, the current scheduling result still meets the expectations of the policy.


Of course, this was only the earlier consideration, and it may not have been thought through carefully because we had not encountered a real production scenario. Given the description above, do you still think rescheduling needs to switch back to the first clusterAffinity? Why? We can continue to discuss whether your request is reasonable~

@bharathguvvala
Author

bharathguvvala commented Jun 21, 2024

@chaosi-zju Thanks for the response. Please see my responses inline.

This is what is cited as a use case in the motivation:

replicas migrated due to cluster failover, while now cluster recovered.
replicas migrated due to application-level failover, while now each cluster has sufficient resources to run the replicas.

In both these examples, falling back to the previous affinity is the intended effect. One practical example is if the clusters involved here are part of a private on-prem (A) and public (B) clouds where A is preferred given the higher costs of B and is meant to be used only for bursting or failover. This is the scenario we are attempting to solve at our company by leveraging Karmada.

However, when designing multiple clusterAffinities, there is no good or bad distinction between different clusterAffinities, and the first clusterAffinity is not explicitly specified as most expected. They are all considered good choices, and any one can be chosen, and, they are just different cluster combinations.

The example I cited above goes against the premise posed here that the order of affinities does not indicate which one is most preferred. While that is expected during normal scheduling, shouldn't a fresh reschedule mean scheduling without any regard for the current placement or schedulerObservedAffinity, similar to how first-time scheduling is done?

Is it possible to introduce a flag in the WorkloadRebalancer to enable this fresh-scheduling behaviour?

@chaosi-zju
Member

Hi @bharathguvvala, your point of view gives us great reference value, thanks~

In both these examples, falling back to the previous affinity is the intended effect.

The original intention of those two stories was the case where a single clusterAffinity contains multiple clusters, like:

...
spec:
  placement:
    clusterAffinity:
      clusterNames:
        - member1
        - member2
...

Then, if the member1 cluster fails over, all replicas will migrate to the member2 cluster. When member1 recovers, you can use a WorkloadRebalancer to migrate replicas back to member1+member2.
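
For illustration, a minimal WorkloadRebalancer for that case could look roughly like this (the Deployment name and namespace are made up for the example; please check the docs for the exact fields):

apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo-rebalancer
spec:
  workloads:
    # the resources whose replicas should be rescheduled
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: default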

One practical example is if the clusters involved here are part of a private on-prem (A) and public (B) clouds where A is preferred given the higher costs of B and is meant to be used only for bursting or failover. This is the scenario we are attempting to solve at our company by leveraging Karmada.

However, after hearing the scenario you described, I think your usage and request are reasonable. I think we really need to support this ability.

But, as I said about multiple clusterAffinities, we currently only have the ability to choose the next clusterAffinity and no ability to return to a previous one, so it will take us some time to design and evolve this capability of WorkloadRebalancer.

CC @RainbowMango what do you think about this case?

@bharathguvvala
Author

I am willing to contribute. Just wondering, is there a possibility of me participating in the design discussions and contributing to the feature?

@chaosi-zju
Member

Just wondering, is there a possibility of me participating in the design discussions and contributing to the feature?

Of course you can!

Karmada is very happy to welcome new contributors to the community, and the project has always embraced open-source enthusiasts with openness and humility~

@bharathguvvala
Author

So how should I proceed? Should I create an enhancement proposal? I am thinking that this capability can be part of the WorkloadRebalancer, which is supposed to trigger a fresh reschedule.

@chaosi-zju
Member

So how should I proceed?

Hi, I think we can proceed as follows:

  1. First, you can summarize the general direction and ideas, and describe in this issue how you plan to achieve this capability. We will invite the feature owners of WorkloadRebalancer and multiple clusterAffinities to join a pleasant discussion together.
  2. Then, you can share your ideas at the Regular Community Meeting, held on Tuesdays at 08:00 Pacific Time (English, biweekly). When you are ready, you can add your topic to the Meeting Notes and Agenda.
  3. Next, you can submit your complete proposal and try to implement it~

@bharathguvvala
Author

bharathguvvala commented Jun 28, 2024

@chaosi-zju Thanks for the response. In summary, what has been discussed in this thread is the concern, i.e. the lack of an ability to do a fresh reschedule in which workloads are relocated back to the original clusters once the conditions there are satisfied again (either because capacity has become available again or because the cluster has recovered after a failover, etc.). Currently no Karmada construct enables this reverse-migration workflow; almost all scheduling flows take the previous scheduling context into account.

I see WorkloadRebalancer as a natural fit to solve this by giving the user a control to signal a fresh reschedule (preferably through a field such as spec.freshReschedule) that does not honor the previous schedule context. Such a reschedule may or may not cause the workload to be relocated, depending on where it was scheduled before. It is up to the user who triggers it to decide when a fresh reschedule should happen via WorkloadRebalancer, and that user is expected to be fully aware of the implications of such an action. Users may also build workflows to trigger periodic fresh reschedules to suit their needs.
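
As a rough sketch of what I have in mind (spec.freshReschedule is only the field proposed here, not an existing API; the workload reference is illustrative):

apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: fresh-rebalance-demo
spec:
  # proposed field: ignore the previous schedule context
  # (e.g. the observed affinity group) and evaluate all cluster
  # affinity groups from the first one, as in first-time scheduling
  freshReschedule: true
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
      namespace: default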

I am willing to discuss this in the next community meeting. Should I go ahead and add it to the meeting agenda?

@chaosi-zju
Member

I am willing to discuss this in the next community meeting. Should I go ahead and add it to the meeting agenda?

Yes, I am very glad you can. Please go ahead and add your topic to the meeting agenda; the next English meeting is on 2024-07-09.

By the way, what time zone are you in? Does the 08:00 Pacific Time slot work for you? If there is any difficulty, we will try our best to coordinate the time.

@chaosi-zju
Member

I see WorkloadRebalancer as a natural fit to solve this by giving the user a control to signal a fresh reschedule (preferably through a field such as spec.freshReschedule) that does not honor the previous schedule context. Such a reschedule may or may not cause the workload to be relocated, depending on where it was scheduled before. It is up to the user who triggers it to decide when a fresh reschedule should happen via WorkloadRebalancer, and that user is expected to be fully aware of the implications of such an action. Users may also build workflows to trigger periodic fresh reschedules to suit their needs.

Thank you very much for sharing your opinion. You have given me further insight into the expected feature from a user-demand perspective. And I agree with what you said: "It is up to the user who triggers it to decide when a fresh reschedule should happen via WorkloadRebalancer, and that user is expected to be fully aware of the implications of such an action."

By the way, have you considered, from an implementation perspective, how to achieve "not honoring the previous schedule context"? For example, with multiple clusterAffinities, how can karmada-scheduler drop the current clusterAffinity and go back to the first one? I'm looking forward to your opinions on these questions.

@bharathguvvala
Author

bharathguvvala commented Jun 28, 2024

I was thinking that for a fresh reschedule we could start the evaluation from scratch instead of resuming from the current schedulerObservedAffinityName, by nullifying rb.Status.SchedulerObservedAffinityName of the workload's ResourceBinding from the WorkloadRebalancer controller. I am not sure if this introduces some side effects.
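
To illustrate the idea on the binding itself (a trimmed ResourceBinding sketch; the names are made up, and I am assuming the status field serializes as schedulerObservedAffinityName):

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: nginx-deployment
  namespace: default
status:
  # today: records the affinity group the scheduler last settled on,
  # so later attempts never go back to earlier groups
  schedulerObservedAffinityName: group-b
  # idea: the WorkloadRebalancer controller clears this field before
  # triggering the reschedule, so the scheduler evaluates all affinity
  # groups from the first one, as in first-time scheduling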

@chaosi-zju
Member

chaosi-zju commented Jun 29, 2024

I was thinking that for a fresh reschedule we could start the evaluation from scratch instead of resuming from the current schedulerObservedAffinityName, by nullifying rb.Status.SchedulerObservedAffinityName of the workload's ResourceBinding from the WorkloadRebalancer controller. I am not sure if this introduces some side effects.

CC @XiShanYongYe-Chang what do you think about his thought on refreshing multiple clusterAffinities?

@XiShanYongYe-Chang
Member

Thanks, let me take a look.

@XiShanYongYe-Chang
Member

Sorry for replying late.

I see WorkloadRebalancer as a natural fit to solve this by providing a control to the user to signal a fresh reschedule (preferably through a field spec.freshReschedule) without honoring the previous schedule context.

I think this is a good direction. We did not provide the capability of resetting the scheduling group because there was no user case to support it for multiple-clusterAffinities group scheduling. Your case may be a good start. Thanks @bharathguvvala

@XiShanYongYe-Chang
Member

/kind feature
/remove-kind bug

@karmada-bot karmada-bot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 2, 2024
@chaosi-zju
Member

Hi @bharathguvvala, tomorrow is the community meeting~

Would it be convenient for you to share your topic tomorrow? If so, please add it to the agenda.

Thank you very much, I'm looking forward to your presentation!

@chaosi-zju
Member

Related issue: #4990, just for the record.

@bharathguvvala
Author

Hi @bharathguvvala, tomorrow is the community meeting~

Would it be convenient for you to share your topic tomorrow? If so, please add it to the agenda.

Thank you very much, I'm looking forward to your presentation!

Requested edit access to the document.

@chaosi-zju
Member

Requested edit access to the document.

Hi @bharathguvvala, it seems you can get edit permissions automatically~

By joining the Google group you will be able to edit the meeting notes.
Join the Google Groups mailing list: https://groups.google.com/forum/#!forum/karmada


@bharathguvvala
Author

@chaosi-zju Thanks. Added it to the agenda.

@chaosi-zju
Member

Hello @bharathguvvala, I created issue #5172 to track the progress of your subsequent related work.

You can take charge of this feature and move forward to implement it, come on~
