Does Karmada support removal of dangling resources? #5071
Comments
Yes, Karmada already supports this scenario. After the application (FlinkDeployment) gets rescheduled from cluster X to cluster Y, cluster X will be removed from the relevant ResourceBinding. Something like:

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
spec:
  clusters:
    - name: cluster-Y   # only cluster-Y will be present
      replicas: 1
```

When the ResourceBinding controller syncs the latest ResourceBinding, it will find and remove the dangling Work resources (referred to as "orphan works" in the code) that target the legacy cluster-X. Note that the ResourceBinding controller only triggers the deletion of those orphan works (by setting a non-nil deletionTimestamp); the works will not actually be removed until the network recovers.
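For reference, here is a minimal sketch of what such an orphan Work can look like while the member cluster is still unreachable, assuming Karmada's usual `karmada-es-<cluster>` execution-namespace convention; the object and application names below are hypothetical, not taken from this issue:

```yaml
# Illustrative only: an orphan Work left "Terminating" while cluster-x is unreachable.
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: example-flink-app                    # hypothetical Work name
  namespace: karmada-es-cluster-x            # execution namespace for cluster-x (assumed convention)
  deletionTimestamp: "2024-06-20T00:00:00Z"  # set when the binding controller removes the orphan
  finalizers:
    - karmada.io/execution-controller        # holds the Work until the workload is cleaned up on the member cluster
spec:
  workload:
    manifests:
      - apiVersion: flink.apache.org/v1beta1
        kind: FlinkDeployment
        metadata:
          name: example-flink-app            # hypothetical application name
          namespace: default
```

Once connectivity to cluster-x is restored, the component syncing that Work (the execution controller, or the karmada-agent for pull-mode clusters) can remove the FlinkDeployment from the member cluster and drop the finalizer, at which point the Work is finally deleted.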
Ah, thanks for pointing this out! Did some tests and confirmed this does eventually happen, which is good to know. Out of curiosity, what happens if a ResourceBinding is deleted from a cluster that is in a bad state? Does the orphan work eventually get removed as well in that case?
I guess you mean the behavior when a ResourceBinding is deleted while the cluster is in a bad state. Note that,
Hello!
Our general question is:
Does Karmada support a way to remove dangling resources in the event of a cluster failure?
Please provide an in-depth description of the question you have:
We have been running cluster failover tests for our Flink applications and have run into a few interesting scenarios. One in particular involves a network partition between the Karmada control plane and the control plane of one of the member clusters. Here is an example scenario:

1. A FlinkDeployment (and the other secrets it requires) gets deployed to Cluster X.
2. We shut off the nodes for the control plane of Cluster X to simulate a network partition. Eventually, Karmada applies a NoExecute taint to Cluster X and attempts to reschedule the application elsewhere (a sketch of the tainted Cluster object follows this list).
3. The FlinkDeployment gets rescheduled to Cluster Y. The previously scheduled FlinkDeployment on Cluster X continues to run.
4. We turn on the nodes for Cluster X once again. Karmada is able to reconnect to the cluster, and we end up with a dangling FlinkDeployment on Cluster X, since the ResourceBinding now points to Cluster Y.
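Regarding the NoExecute taint in step 2, a minimal sketch of how the Cluster object gets tainted is shown below; the taint key is Karmada's well-known `cluster.karmada.io/unreachable` constant, but the exact key, timing, and conditions depend on the Karmada version and failover configuration, so treat this as an assumption rather than the exact state from our test:

```yaml
# Illustrative only: a member cluster tainted NoExecute after the control-plane partition.
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: cluster-x
spec:
  taints:
    - key: cluster.karmada.io/unreachable    # well-known Karmada taint for unreachable clusters (assumed)
      effect: NoExecute
      timeAdded: "2024-06-20T00:00:00Z"
status:
  conditions:
    - type: Ready
      status: "Unknown"                      # the member cluster's control plane is not responding
```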
Is there a way we can have Karmada reconcile these types of dangling resources and remove them from the cluster that has recovered? For example, even if the ResourceBinding only points to 1 cluster, there are still multiple Works scheduled across multiple clusters. I would assume that Karmada should be able to reconcile that the (# of works) != (# of replicas), and attempt to remove the dangling work.
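To make the mismatch concrete, here is a minimal sketch of the state described above, again assuming the `karmada-es-<cluster>` execution namespaces and hypothetical object names:

```yaml
# Illustrative only: the ResourceBinding targets a single cluster...
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: example-flink-app-flinkdeployment   # hypothetical binding name
  namespace: default
spec:
  clusters:
    - name: cluster-y
      replicas: 1
---
# ...while a Work for the same FlinkDeployment still exists in both execution namespaces.
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: example-flink-app
  namespace: karmada-es-cluster-x           # dangling: no longer referenced by the binding
---
apiVersion: work.karmada.io/v1alpha1
kind: Work
metadata:
  name: example-flink-app
  namespace: karmada-es-cluster-y           # expected: matches the binding's target cluster
```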