Ability to suspend work #4688
Comments
Created a PR to get feedback: #4689
Related to it: #1859
+1 on this user story.
May I ask for more detailed info about how you do the migration? I don't understand how the workload can be synced from ...

Just a guess, the process of migrating a workload would be something like:

Step 1: Create a PropagationPolicy to take over an application from the blue cluster:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - blue
```

Step 2: Add the green cluster to the placement:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - blue
        - green
```

Step 3: Testing against the green cluster.

Step 4: Remove the application from the blue cluster:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: foo
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:
        - green
```
I totally agree with that. Actually, we already have an annotation for this purpose (see lines 41 to 47 in 9ccc8be).
But this annotation is used internally by Karmada itself, in the scenario where a Work is not managed by a ResourceBinding.
But in your case, even after applying the annotation, the risk remains. Don't worry, though: there are also two approaches to eliminate the risk.
I tend to introduce the functionality to ...
Hi @a7i, we are going to push this feature forward soon. Can you help provide some feedback on @RainbowMango's comments above?
While we have long-term plans to use the karmada-apiserver for handling our multi-cluster / blue-green use-cases, we currently reuse our kube-apiserver (on the blue cluster). We essentially bootstrap the karmada controller-manager, webhook, scheduler, and aggregated-apiserver onto an existing cluster (which reuses the kube-apiserver). We then use policies to migrate workloads gradually to another cluster. I realize that the majority of Karmada use-cases do not follow this pattern.
Ideally, we deprecate this annotation in favor of using the same suspend mechanism.
I agree with your recommendation. Let me pick this back up and have a PR ready this week.
@XiShanYongYe-Chang I have updated the PR to reflect the requested changes. I look forward to your feedback.
Can you elaborate on this step a little bit?
Given that we have two clusters: blue and green.
Let's take a simple workload (a Deployment with 3 replicas and a Service) as an example.
Thanks for the detailed explanation. Do you need to pause the work at each step between steps 3 and 9? If so, what condition is the paused work waiting for?
No, given that we still rely on the karmada-controller-manager for handling propagation syncs. After step 9, however, it needs to be suspended. I think this feature request is generic enough that our use-case does not have to be the only one. I believe that "User Story 1" from the issue description is a valid use-case, and as a cluster-admin, I want to have this capability at my disposal.
Thanks @a7i. I wonder if Karmada can do more to give users out-of-the-box capabilities, such as canary release, in addition to providing APIs for suspending work propagation (which can be considered an atomic capability). I may be wrong; please correct me if so.
Hi @a7i, I would like to know: do you use argo-cd for canary releases? I was wondering if we could do some combination with argo-cd by suspending work.
Hi, we also have a need to suspend the rb, mainly to do global sorting based on priority and to perform capacity admission on the queue associated with the rb.
I raised a similar need in issue #4559; getting the scheduling results of the RB helps to increase flexibility in scheduling.
Hi @tpiperatgod, from issue #4559, I think what you want to pause is the propagation of the Work, which is the stage where you can get the rb scheduling results. But with an rb scheduling pause, there is no scheduling result yet. What would you do with this?
Yes, that's what I meant; there may have been some ambiguity in my presentation.
What would you like to be added:
Ability to suspend work to ensure that changes are not being reconciled.
Why is this needed:
User Story 1: As a cluster admin, we may get stuck in a reconciliation loop where the karmada-controller-manager updates a resource but, due to some unknown reason (perhaps a controller on the member cluster), the resource is reverted. The operator can decide to suspend the work while debugging the issue.
User Story 2: We're using Karmada to migrate workloads from a blue cluster to a green cluster. Once we move a workload, we want to suspend any updates from blue to green until we cut over to green.
Persona
Cluster Admin who is on call and has permission to modify Karmada resources.
Implementation Details
There are three ways that we can go about this:
1. Expose `suspend` on the `Work` CRD

This is the simplest approach but requires the Cluster Admin to identify the `Work` resources in the `karmada-es-${cluster}` namespace before patching them. Once the field is set, the controller will get out early. In the code here https://github.com/karmada-io/karmada/blob/master/pkg/controllers/status/work_status_controller.go#L93-L96, we would add an early return when `suspend` is set (see the sketch below).

Effort: Low
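A minimal sketch of that early return, using simplified stand-in types rather than the real `Work` API and controller; the `Suspend` field, its name, and its shape are assumptions, not the existing API:

```go
// Sketch only: simplified stand-ins showing where a proposed Suspend field
// would short-circuit reconciliation of a Work.
package main

import "fmt"

// WorkSpec stands in for the real work/v1alpha1 WorkSpec; Suspend is the
// hypothetical new field.
type WorkSpec struct {
	Suspend *bool
}

// Work stands in for the real Work object.
type Work struct {
	Namespace string
	Name      string
	Spec      WorkSpec
}

// reconcile sketches the early return that would be added near the top of the
// work status controller's Reconcile, right after the Work is fetched.
func reconcile(w *Work) error {
	if w.Spec.Suspend != nil && *w.Spec.Suspend {
		fmt.Printf("work %s/%s is suspended; skipping reconciliation\n", w.Namespace, w.Name)
		return nil
	}
	// ... normal status collection / sync logic would continue here ...
	fmt.Printf("reconciling work %s/%s\n", w.Namespace, w.Name)
	return nil
}

func main() {
	suspended := true
	_ = reconcile(&Work{Namespace: "karmada-es-blue", Name: "nginx", Spec: WorkSpec{Suspend: &suspended}})
}
```

A `*bool` keeps "unset" distinct from an explicit `false`, which matches the usual convention for optional boolean fields in Kubernetes-style APIs.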
2. Expose `suspend` on the `ResourceBinding` CRD

This requires a `suspend` field on `ResourceBinding`, which will also get updated on the `Work`. The Cluster Admin will have to identify the `ResourceBinding` in the workload namespace before patching it. The name of a resource binding is more predictable than that of a `Work`.

Effort: Medium
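As a rough sketch of the API change this option implies; the field name, type, and JSON tag are assumptions, not the actual Karmada API:

```go
// Sketch only: a Suspend flag on the ResourceBinding spec. The binding
// controller would then copy this value onto every Work it creates or
// updates, so suspending one binding suspends all of its per-cluster Works.
package v1alpha2

// ResourceBindingSpec is a simplified stand-in for the real type in
// Karmada's work/v1alpha2 API group; only the proposed field is shown.
type ResourceBindingSpec struct {
	// Suspend, when true, tells Karmada to stop propagating changes for
	// this binding to member clusters. Hypothetical field.
	// +optional
	Suspend *bool `json:"suspend,omitempty"`
}
```

The medium effort comes from the extra hop: the binding controller has to reconcile this field onto each `Work`, rather than the admin patching the `Work` directly.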
3. Expose `suspend` on the `PropagationPolicy` and `ClusterPropagationPolicy` CRDs

The Cluster Admin will have to know the PP or CPP with the highest priority and decide to suspend them. Changes to `suspend` will have to get updated on `ResourceBinding` and `Work`.

Effort: X-Large
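To illustrate why this option is the largest effort, here is a toy sketch of the cascade it requires; all types are simplified stand-ins, not the real Karmada APIs, and in practice the flow would be spread across several controllers' reconcile loops rather than a single function:

```go
// Sketch only: cascading a policy-level suspend flag down to bindings and Works.
package main

import "fmt"

type PropagationPolicy struct{ Suspend bool }

type ResourceBinding struct {
	Name    string
	Suspend bool
	Works   []*Work
}

type Work struct {
	Name    string
	Suspend bool
}

// cascadeSuspend mirrors the policy's suspend flag onto the binding and onto
// every Work that the binding manages.
func cascadeSuspend(pp PropagationPolicy, rb *ResourceBinding) {
	rb.Suspend = pp.Suspend
	for _, w := range rb.Works {
		w.Suspend = rb.Suspend
	}
}

func main() {
	rb := &ResourceBinding{
		Name:  "nginx-deployment",
		Works: []*Work{{Name: "nginx (blue)"}, {Name: "nginx (green)"}},
	}
	cascadeSuspend(PropagationPolicy{Suspend: true}, rb)
	fmt.Printf("binding suspended: %v; works suspended: %v, %v\n",
		rb.Suspend, rb.Works[0].Suspend, rb.Works[1].Suspend)
}
```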