Improve the user experience of Failover #5150

Open
XiShanYongYe-Chang opened this issue Jul 6, 2024 · 7 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@XiShanYongYe-Chang
Member

What would you like to be added:

Improve the user experience of Failover

Why is this needed:

The Failover and GracefulEviction features are currently in the Beta phase, which means they are enabled by default.

There is a scenario where users propagate configuration resources by directly specifying the cluster names. When a cluster is disconnected from the Karmada control plane for several hours, it is identified as NotReady. Once the cluster recovers, the configuration resources on that cluster are deleted unexpectedly. If this occurs in a production environment, it could lead to serious consequences.

Therefore, we need to optimize the Failover feature for this scenario to provide users with a more stable and reliable experience.

XiShanYongYe-Chang added the kind/feature label Jul 6, 2024
@whitewindmills
Member

Are you saying that resources on unhealthy member clusters are deleted because they are migrated to other member clusters? If so, how can we improve it?

@XiShanYongYe-Chang
Member Author

> Are you saying that resources on unhealthy member clusters are deleted because they are migrated to other member clusters?

That's right. One thing to note: the resource is not migrated to another cluster; it is only removed from the failed cluster.

> If so, how can we improve it?

I hope to hear everyone's opinion.

@whitewindmills
Member

> One thing to note: the resource is not migrated to another cluster; it is only removed from the failed cluster.

Let me guess how it happened: was there no cluster that fit?

@XiShanYongYe-Chang
Member Author

In other words, the target clusters are listed explicitly, and the configuration resources are distributed to exactly those clusters.

@whitewindmills
Member

Does it look like this? If cluster foo becomes unhealthy, the configuration stays on that cluster until it becomes healthy again, and is then deleted. Am I right?

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-pp
  namespace: default
spec:
  resourceSelectors:
    - apiVersion: v1
      kind: ConfigMap
      name: conf
  placement:
    clusterAffinity:
      clusterNames:
      - foo
      - bar
  ...

@XiShanYongYe-Chang
Member Author

Yes, you are right.

@whitewindmills
Member

Yes, this is a noteworthy case: we would prefer not to delete resources when there is no new cluster to migrate to.
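
To make that preference concrete, here is a minimal sketch in Go. It is not Karmada's actual controller or scheduler code: the names `migrationTargets` and `shouldEvict` are hypothetical, and cluster health and placement are modeled as plain maps for illustration. The idea is to evict from a failed cluster only when some healthy cluster named in the policy does not already hold the resource.

```go
package main

import "fmt"

// migrationTargets returns the clusters from the policy's clusterNames that
// are healthy, are not the failed cluster, and do not already hold the
// resource. Hypothetical helper, not part of Karmada.
func migrationTargets(clusterNames []string, placed, ready map[string]bool, failed string) []string {
	var targets []string
	for _, name := range clusterNames {
		if name != failed && ready[name] && !placed[name] {
			targets = append(targets, name)
		}
	}
	return targets
}

// shouldEvict reports whether the resource should be removed from the failed
// cluster: only when there is at least one new cluster to migrate to.
func shouldEvict(clusterNames []string, placed, ready map[string]bool, failed string) bool {
	return len(migrationTargets(clusterNames, placed, ready, failed)) > 0
}

func main() {
	// The case from this thread: the policy names foo and bar, both already
	// hold the ConfigMap, and foo becomes NotReady.
	names := []string{"foo", "bar"}
	placed := map[string]bool{"foo": true, "bar": true}
	ready := map[string]bool{"foo": false, "bar": true}
	fmt.Println(shouldEvict(names, placed, ready, "foo")) // false: nothing new to migrate to

	// A contrasting case: a third healthy cluster baz is listed but unused.
	names = append(names, "baz")
	ready["baz"] = true
	fmt.Println(shouldEvict(names, placed, ready, "foo")) // true: baz can take the resource
}
```

With this rule, the ConfigMap in the example policy would stay on foo while it is unreachable instead of being deleted once the cluster recovers.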
