add doc for workload-rabalancer
Signed-off-by: chaosi-zju <chaosi@zju.edu.cn>
chaosi-zju committed May 23, 2024
1 parent c76b569 commit 58de8b2
Showing 2 changed files with 317 additions and 0 deletions.
316 changes: 316 additions & 0 deletions docs/userguide/scheduling/workload-rebalancer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,316 @@
---
title: Workload Rebalance
---

In the general case, once the replicas of a workload have been scheduled, the scheduling result stays inert
and the replica distribution does not change. Even if rescheduling is triggered by modifying the replicas or the placement,
the scheduler keeps the existing replica distribution as closely as possible and makes only the minimal adjustments needed,
which minimizes disruption and preserves the balance across clusters.

However, in some scenarios users want a way to actively trigger a fresh rescheduling, one that disregards the
previous assignment entirely and establishes a completely new replica distribution across clusters.

## Applicable Scenarios

### Scenario 1

In a cluster failover scenario, replicas are distributed across two clusters, member1 and member2, but they all migrate to
the member2 cluster if the member1 cluster fails.

As a cluster administrator, I want the replicas to be redistributed across both clusters once the member1 cluster recovers, so that
the resources of the member1 cluster are used again and high availability is restored.

### Scenario 2

In application-level failover, low-priority applications may be preempted and shrink from multiple clusters
to a single cluster because cluster resources are in short supply
(refer to [Application-level Failover](https://karmada.io/docs/next/userguide/failover/application-failover#why-application-level-failover-is-required)).

As a user, I want the replicas of low-priority applications to be redistributed across multiple clusters once
cluster resources are sufficient, to ensure the high availability of the application.

### Scenario 3

With the `Aggregated` schedule type, replicas may still be spread across multiple clusters because of resource constraints.

As a user, I want the replicas to be redistributed in an aggregated manner once any single cluster has
sufficient resources to accommodate all replicas, so that the application better meets actual business requirements.


### Scenario 4

In a disaster-recovery scenario, replicas migrate from the primary cluster to the backup cluster when the primary cluster fails.

As a cluster administrator, I want the replicas to migrate back once the primary cluster is restored, so that:

1. the disaster-recovery mode is restored, ensuring the reliability and stability of the cluster federation.
2. the cost of the backup cluster is saved.

## Prerequisites

### Karmada has been installed

You can install Karmada by following the [quick-start](https://github.com/karmada-io/karmada#quick-start), or by directly
running the `hack/local-up-karmada.sh` script, which is also used to run the E2E cases.
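
Before starting the example, it can help to confirm that both member clusters are registered and healthy. The following is a minimal sketch, assuming the `karmada-apiserver` context used throughout this guide and assuming that each Cluster object reports a `Ready` condition in `status.conditions`; the helper name `cluster_ready` is made up for illustration.

```bash
# Return success if the named Karmada cluster reports Ready=True.
# Usage: cluster_ready <cluster-name>
cluster_ready() {
  local status
  # Read the status of the "Ready" condition from the Cluster object.
  status="$(kubectl --context karmada-apiserver get cluster "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" || return 1
  [ "$status" = "True" ]
}
```

For example, `cluster_ready member1 && cluster_ready member2 && echo "both clusters ready"` prints the message only when both checks pass.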

## Example

### Step 1: create a Deployment and a ClusterRole

First, prepare a Deployment named `demo-deploy-1` and a ClusterRole named `demo-role`, together with a
ClusterPropagationPolicy named `default-pp` that propagates them to the member1 and member2 clusters.

To do so, create a new file named `deployments-and-services.yaml` with the following content:

<details>
<summary>deployments-and-services.yaml</summary>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deploy-1
  labels:
    app: test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deploy-1
  template:
    metadata:
      labels:
        app: demo-deploy-1
    spec:
      terminationGracePeriodSeconds: 0
      containers:
        - image: nginx
          name: demo-deploy-1
          resources:
            limits:
              cpu: 10m
              memory: 10Mi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: demo-role
rules:
  - apiGroups:
      - '*'
    resources:
      - '*'
    verbs:
      - '*'
---
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: default-pp
spec:
  placement:
    clusterTolerations:
      - effect: NoSchedule
        key: workload-rebalancer-test
        operator: Exists
        tolerationSeconds: 0
    clusterAffinity:
      clusterNames:
        - member1
        - member2
    replicaScheduling:
      replicaDivisionPreference: Weighted
      replicaSchedulingType: Divided
      weightPreference:
        dynamicWeight: AvailableReplicas
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
```
</details>

Then run the following command to create those resources:
```bash
kubectl --context karmada-apiserver apply -f deployments-and-services.yaml
```

You can check whether this step succeeded as follows:

```bash
$ kubectl --context karmada-apiserver get deploy demo-deploy-1
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
demo-deploy-1   3/3     3            3           3m18s

$ kubectl --context member1 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-dv6xw   1/1     Running   0          3m18s
demo-deploy-1-784cd456bf-fgjn7   1/1     Running   0          3m18s

$ kubectl --context member2 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-856rf   1/1     Running   0          3m18s

$ kubectl --context karmada-apiserver get clusterrole demo-role
NAME        CREATED AT
demo-role   2024-05-22T11:10:29Z
```

Taking `deployment/demo-deploy-1` as an example, 2 replicas were propagated to the member1 cluster and 1 replica to the member2 cluster.
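
The per-cluster replica assignment can also be read from the Karmada control plane instead of listing pods in each member cluster. The sketch below assumes that the ResourceBinding for the Deployment is named `demo-deploy-1-deployment` (the name that appears in the events later in this guide) and that its `spec.clusters` field lists each target cluster with its replica count; the helper name `show_assignment` is hypothetical.

```bash
# Print "<cluster>=<replicas>" for every cluster a ResourceBinding is scheduled to.
# Usage: show_assignment <resourcebinding-name> [namespace]
show_assignment() {
  local rb="$1" ns="${2:-default}"
  # Iterate over spec.clusters and print one line per target cluster.
  kubectl --context karmada-apiserver -n "$ns" get resourcebinding "$rb" \
    -o jsonpath='{range .spec.clusters[*]}{.name}={.replicas}{"\n"}{end}'
}
```

Running `show_assignment demo-deploy-1-deployment` should print one line per cluster that currently holds replicas.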

### Step 2: add a `NoExecute` taint to the member1 cluster to mock cluster failover

* Run the following command to add a `NoExecute` taint to the member1 cluster:

```bash
$ karmadactl --karmada-context=karmada-apiserver taint clusters member1 workload-rebalancer-test:NoExecute
cluster/member1 tainted
```

Rescheduling is then triggered because of the cluster failover, and all replicas are propagated to the member2 cluster,
as you can see:

```bash
$ kubectl --context member1 get po
No resources found in default namespace.

$ kubectl --context member2 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-856rf   1/1     Running   0          5m27s
demo-deploy-1-784cd456bf-b5977   1/1     Running   0          35s
demo-deploy-1-784cd456bf-pqthv   1/1     Running   0          35s
```

* Run the following command to remove the above `NoExecute` taint from the member1 cluster:

```bash
$ karmadactl --karmada-context=karmada-apiserver taint clusters member1 workload-rebalancer-test:NoExecute-
cluster/member1 untainted
```

Removing the taint does not change the replica propagation, because the scheduling result is inert:
all replicas stay in the member2 cluster.
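
To confirm that the distribution really stays the same after the taint is removed, you can count the pods in each member cluster. This is a small sketch that assumes the `member1` and `member2` contexts from this guide and the `app=demo-deploy-1` pod label from the Deployment above; the helper name `pods_per_cluster` is made up for illustration.

```bash
# Count the pods of the demo Deployment in each given member cluster context.
# Usage: pods_per_cluster <context>...
pods_per_cluster() {
  local ctx n
  for ctx in "$@"; do
    # --no-headers leaves one line per pod; "No resources found" goes to stderr.
    n="$(kubectl --context "$ctx" get pods -l app=demo-deploy-1 --no-headers 2>/dev/null | wc -l)"
    # Arithmetic expansion strips any padding that wc may add.
    echo "$ctx: $((n))"
  done
}
```

Calling `pods_per_cluster member1 member2` before and after removing the taint lets you compare the two results directly.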

### Step 3: apply a WorkloadRebalancer to trigger rescheduling

To trigger the rescheduling of the resources above, create a new file named `workload-rebalancer.yaml`
with the following content:

```yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo
spec:
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
```
Then run the following command to apply it:
```bash
kubectl --context karmada-apiserver apply -f workload-rebalancer.yaml
```

You will get a `workloadrebalancer.apps.karmada.io/demo created` result, which means the resource was created successfully.

### Step 4: check the status of WorkloadRebalancer

Run the following command:

```bash
$ kubectl --context karmada-apiserver get workloadrebalancer demo -o yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  ...
  creationTimestamp: "2024-05-22T11:16:10Z"
  name: demo
  ...
spec:
  ...
status:
  finishTime: "2024-05-22T11:16:10Z"
  observedGeneration: 1
  observedWorkloads:
    - result: Successful
      workload:
        apiVersion: apps/v1
        kind: Deployment
        name: demo-deploy-1
        namespace: default
    - result: Successful
      workload:
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRole
        name: demo-role
```

You can thus observe the rescheduling result in the `status.observedWorkloads` field of `workloadrebalancer/demo`.
As you can see, `Deployment/demo-deploy-1` and `ClusterRole/demo-role` were rescheduled successfully.
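
If you script this check, reading the whole object as YAML is not necessary. The sketch below polls `status.finishTime` and then prints one line per entry in `status.observedWorkloads`, using only the fields shown in the output above. The helper name `wait_rebalancer` and the default timeout of 60 seconds are assumptions made for illustration.

```bash
# Poll a WorkloadRebalancer until status.finishTime is set, then print one
# "<kind>/<name>: <result>" line per observed workload.
# Usage: wait_rebalancer <name> [timeout-seconds]
wait_rebalancer() {
  local name="$1" timeout="${2:-60}" waited=0 finish
  while :; do
    finish="$(kubectl --context karmada-apiserver get workloadrebalancer "$name" \
      -o jsonpath='{.status.finishTime}')" || return 1
    # A non-empty finishTime means the rebalancer has finished.
    [ -n "$finish" ] && break
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $name" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  kubectl --context karmada-apiserver get workloadrebalancer "$name" \
    -o jsonpath='{range .status.observedWorkloads[*]}{.workload.kind}/{.workload.name}: {.result}{"\n"}{end}'
}
```

For the example in this guide you would call `wait_rebalancer demo`.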

### Step 5: Observe the real effect of WorkloadRebalancer

Taking `deployment/demo-deploy-1` as an example, you can observe the actual replica propagation status:

```bash
$ kubectl --context member1 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-82kt6   1/1     Running   0          89s
demo-deploy-1-784cd456bf-k9fhl   1/1     Running   0          89s

$ kubectl --context member2 get po
NAME                             READY   STATUS    RESTARTS   AGE
demo-deploy-1-784cd456bf-856rf   1/1     Running   0          9m23s
```

As you can see, rescheduling happened: 2 replicas migrated back to the member1 cluster, while the 1 replica in the member2 cluster remained unchanged.

Besides, you can observe a schedule event emitted by `default-scheduler`, such as:

```bash
$ kubectl --context karmada-apiserver describe deployment demo-deploy-1
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
...
Normal ScheduleBindingSucceed 31s default-scheduler Binding has been scheduled successfully. Result: {member2:2, member1:1}
Normal GetDependenciesSucceed 31s dependencies-distributor Get dependencies([]) succeed.
Normal SyncSucceed 31s execution-controller Successfully applied resource(default/demo-deploy-1) to cluster member1
Normal AggregateStatusSucceed 31s (x4 over 31s) resource-binding-status-controller Update resourceBinding(default/demo-deploy-1-deployment) with AggregatedStatus successfully.
Normal SyncSucceed 31s execution-controller Successfully applied resource(default/demo-deploy-1) to cluster member2
```
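
When the event list is long, you can narrow it down to the scheduling events alone. The following sketch assumes that the events are recorded against the Deployment in the Karmada control plane, as the `describe` output above suggests, and filters them with standard event field selectors; the helper name `schedule_events` is hypothetical.

```bash
# List only the ScheduleBindingSucceed events of a Deployment, oldest first.
# Usage: schedule_events <deployment-name> [namespace]
schedule_events() {
  local name="$1" ns="${2:-default}"
  kubectl --context karmada-apiserver -n "$ns" get events \
    --field-selector "involvedObject.kind=Deployment,involvedObject.name=${name},reason=ScheduleBindingSucceed" \
    --sort-by=.lastTimestamp
}
```

For this guide, the call would be `schedule_events demo-deploy-1`.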

### Step 6: Update and Auto-clean WorkloadRebalancer

If you want the WorkloadRebalancer resource to be cleaned up automatically, edit it and set the
`spec.ttlSecondsAfterFinished` field to `300`, for example:

```yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo
spec:
  ttlSecondsAfterFinished: 300
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
```

After you apply this modification, the WorkloadRebalancer resource will be deleted automatically after 300 seconds.
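
Instead of editing the whole manifest, the same field can be set with a merge patch. This is a minimal sketch using the standard `kubectl patch` command; the helper name `set_rebalancer_ttl` is made up for illustration, and only the `spec.ttlSecondsAfterFinished` field described above is touched.

```bash
# Set spec.ttlSecondsAfterFinished on an existing WorkloadRebalancer.
# Usage: set_rebalancer_ttl <name> <seconds>
set_rebalancer_ttl() {
  local name="$1" ttl="$2"
  # A JSON merge patch changes only the listed field and keeps spec.workloads as is.
  kubectl --context karmada-apiserver patch workloadrebalancer "$name" \
    --type merge -p "{\"spec\":{\"ttlSecondsAfterFinished\":${ttl}}}"
}
```

For the example above, the call would be `set_rebalancer_ttl demo 300`.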
1 change: 1 addition & 0 deletions sidebars.js
Original file line number Diff line number Diff line change
@@ -72,6 +72,7 @@ module.exports = {
"userguide/scheduling/descheduler",
"userguide/scheduling/scheduler-estimator",
"userguide/scheduling/cluster-resources",
"userguide/scheduling/workload-rebalancer",
],
},
{