Multiple cluster affinity groups not working as expected #4990
Comments
I think this is a bug. See karmada/pkg/scheduler/scheduler.go, line 530 at commit d676996:
affinityIndex does not always start from zero. Should the design be that the workload always stays on the backup cluster and only moves to the primary cluster when there is a problem with the backup, or should it transfer back to the old (primary) cluster once the primary recovers? Or should the user be allowed to choose how this is handled?
IMO, primary is primary for a reason, and backup is usually intended to hold the workload temporarily until the primary recovers. OTOH, I can see an option, e.g. ...
Hi @vicaya, as you describe, this is the expected behavior. When multiple cluster groups are scheduled, if the current group is not suitable, the next group is enabled and there is no fallback. How about trying this:

```yaml
placement:
  clusterAffinities:
    - affinityName: primary
      clusterNames:
        - c0
    - affinityName: backup
      clusterNames:
        - c0
        - c1
```
Hi @XiShanYongYe-Chang ...
Hi @dominicqi, you can try the rebalance feature. It will be released in v1.10, the day after tomorrow. |
Are you talking about #4840? Are you saying that with the same config as above, the rebalancer will move the workload back to primary? I also tried using staticWeightList along with maxGroups: 1 to make sure all the replicas stay in the primary cluster with the higher weight, but that doesn't work after failover either. I hope the rebalancer would make this work as well, at the expense of verbosity and clarity. The primary/backup scenario is such a common use case that it would be great if multiple cluster affinity groups worked out of the box as intended.
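For context, the staticWeightList / maxGroups: 1 attempt mentioned above might look roughly like the sketch below. This is only an assumed illustration, not the reporter's actual policy: the cluster names c0/c1 come from the thread, while the policy name, the httpbin Deployment selector, and the specific weights are hypothetical.

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: httpbin-weighted        # hypothetical name
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: httpbin             # the simple test workload mentioned in the report
  placement:
    clusterAffinity:
      clusterNames:
        - c0
        - c1
    spreadConstraints:
      - spreadByField: cluster
        maxGroups: 1            # keep all replicas in a single cluster
        minGroups: 1
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames:
                - c0
            weight: 2           # assumed: primary gets the higher weight
          - targetCluster:
              clusterNames:
                - c1
            weight: 1
```

The intent of maxGroups: 1 here is to pin all replicas to a single cluster, with the static weights steering that choice toward c0; per the comment above, this still did not bring the workload back to the primary after failover.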
What happened:
According to https://karmada.io/docs/userguide/scheduling/resource-propagating/#multiple-cluster-affinity-groups ,
there are 2 potential use cases: 1. local bursts to cloud; 2. primary failover to backup. I tested use case 2 with the following policy and a simple httpbin workload:
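The original policy is not preserved in this excerpt. A minimal sketch of a primary-failover-to-backup policy of this shape, assuming the c0/c1 cluster names used elsewhere in the thread and a hypothetical httpbin Deployment, could be:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: httpbin-primary-backup   # hypothetical name
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: httpbin
  placement:
    clusterAffinities:
      - affinityName: primary    # tried first
        clusterNames:
          - c0
      - affinityName: backup     # used once the primary group becomes unschedulable
        clusterNames:
          - c1
```

As discussed in the comments above, the scheduler walks the affinity groups in order and does not fall back to an earlier group once a later one has been selected, which is the behavior this issue is about.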
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
See the above steps to reproduce the problem. It's as minimal as you can get.
Anything else we need to know?:
Please provide a working example policy for the primary-failover-to-backup use case. Make sure the workload moves back to primary from backup when the primary is ready again.
Environment:
kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version): 1.9.1