add end to end canary update doc
Signed-off-by: 守辰 <shouchen.zz@alibaba-inc.com>
furykerry committed Jul 1, 2024
1 parent fd84109 commit 701f1b7
Showing 4 changed files with 327 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/user-manuals/cloneset.md
@@ -219,7 +219,7 @@ The effect of the above configuration is that during scaling up, CloneSet will n
CloneSet provides three update types; the default is `ReCreate`.

- `ReCreate`: controller will delete old Pods and PVCs and create new ones.
- `InPlaceIfPossible`: controller will try to in-place update Pod instead of recreating them if possible. Please read the concept doc below.
- `InPlaceIfPossible`: controller will try to in-place update Pods instead of recreating them if possible. Currently, only the image and certain other fields support in-place update.
- `InPlaceOnly`: controller will in-place update Pods instead of recreating them. With the `InPlaceOnly` policy, users cannot modify any fields other than those that support in-place update.

**You may need to read the [concept doc](../core-concepts/inplace-update) for more details of in-place update.**
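
The difference between the three update types can be pictured as a small decision function. This is only a sketch under stated assumptions, not Kruise's implementation: `plan_update`, the result strings, and the contents of `IN_PLACE_FIELDS` are hypothetical names chosen for this illustration, and the real list of in-place-updatable fields is defined in the concept doc linked above.

```python
# Hypothetical set of changed-field paths that can be updated in place.
# The real list is defined by Kruise; this single entry is an assumption.
IN_PLACE_FIELDS = {"spec.containers[*].image"}


def plan_update(update_type: str, changed_fields: set[str]) -> str:
    """Return "recreate", "in-place", or "rejected" for a Pod template change."""
    # In-place is possible only if every changed field is in the supported set.
    in_place_ok = changed_fields <= IN_PLACE_FIELDS
    if update_type == "ReCreate":
        return "recreate"
    if update_type == "InPlaceIfPossible":
        return "in-place" if in_place_ok else "recreate"
    if update_type == "InPlaceOnly":
        return "in-place" if in_place_ok else "rejected"
    raise ValueError(f"unknown update type: {update_type}")


image_only = {"spec.containers[*].image"}
image_and_command = {"spec.containers[*].image", "spec.containers[*].command"}
```

With these inputs, `plan_update("InPlaceIfPossible", image_and_command)` falls back to recreation, while the same change under `InPlaceOnly` is refused.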
@@ -0,0 +1,164 @@
# End to End Canary Release

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## End to End Canary Release Process
<center><img src={require('/static/img/rollouts/e2e.png').default} width="90%" /></center>

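End to end canary release is a special kind of canary release. In this kind of release, multiple applications in a microservice system can share one traffic gateway, and the canary replicas of an upstream application pass traffic to the canary replicas of the downstream application, so that a request stays within one end to end canary environment as far as possible. Such a canary environment is often called a traffic swimlane. Within a swimlane, if an application has no canary instances, requests are routed to its stable instances; when a downstream application does have canary instances, requests sent downstream are routed back to those canary instances. End to end canary release is typically used for business verification and canary rollout in scenarios that require several applications to cooperate.

A simple sample system demonstrates end to end canary release here. The system consists of (gateway -> spring-cloud-a -> spring-cloud-b): requests enter through the gateway, the gateway forwards the incoming traffic to `spring-cloud-a`, and `spring-cloud-a` then calls the downstream system `spring-cloud-b`.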
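
The swimlane rule described above (prefer a canary instance at each hop, otherwise fall back to stable) can be sketched as follows. This is a minimal illustration, not how Kruise Rollout or any service discovery product implements it: the `Instance` type, its `lane` attribute, and `pick_instance` are hypothetical names introduced only for this example, and the sketch assumes the request's lane tag is carried along on every hop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    app: str
    lane: str  # "gray" for canary replicas, "stable" otherwise (hypothetical naming)


def pick_instance(instances: list[Instance], app: str, request_lane: str) -> Optional[Instance]:
    """Prefer an instance of `app` in the request's lane; fall back to stable."""
    candidates = [i for i in instances if i.app == app]
    same_lane = [i for i in candidates if i.lane == request_lane]
    if same_lane:
        return same_lane[0]
    stable = [i for i in candidates if i.lane == "stable"]
    return stable[0] if stable else None


# spring-cloud-a has no canary replica here, spring-cloud-b does.
registry = [
    Instance("spring-cloud-a", "stable"),
    Instance("spring-cloud-b", "stable"),
    Instance("spring-cloud-b", "gray"),
]
hop_a = pick_instance(registry, "spring-cloud-a", "gray")  # falls back to stable
hop_b = pick_instance(registry, "spring-cloud-b", "gray")  # returns to the gray lane
```

The second hop returns to the gray lane only because the lane tag survived the stable hop, which is why the text notes that support from the service discovery implementation is required.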

## Common Gateway Configuration
Because the gateway configuration is shared by multiple applications, it has to be set outside of the Rollout.

```YAML

apiVersion: rollouts.kruise.io/v1alpha1
kind: TrafficRouting
metadata:
  name: mse-traffic
spec:
  objectRef:
  - service: spring-cloud-a
    ingress:
      classType: mse
      name: spring-cloud-a
  strategy:
    matches:
    # optional A/B Testing setting
    - headers:
      - type: Exact
        name: User-Agent
        value: foo
    # alternatively, send a fixed share of traffic (here 20%) to the gray environment
    # weight: 20
    # optional request header modification
    requestHeaderModifier:
      set:
      - name: x-mse-tag
        value: gray
```
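
The header match and `requestHeaderModifier` in the configuration above can be read as the following per-request decision. This is a rough sketch of the intent only; `route_at_gateway` is a hypothetical function, not part of any gateway API, and for brevity it treats header names as case-sensitive, which real HTTP handling does not.

```python
def route_at_gateway(headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return (target, headers) for one incoming request.

    Mirrors the example config: an Exact match on User-Agent == "foo" sends the
    request to the gray environment and sets the x-mse-tag header to "gray".
    """
    if headers.get("User-Agent") == "foo":
        tagged = dict(headers)          # copy so the caller's dict is untouched
        tagged["x-mse-tag"] = "gray"    # lets later hops recognise gray traffic
        return "gray", tagged
    return "stable", dict(headers)


target, out = route_at_gateway({"User-Agent": "foo"})
other_target, other_out = route_at_gateway({"User-Agent": "curl/8.0"})
```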

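## Rollout Configuration

**Note: v1beta1 available from Kruise Rollout v0.5.0.**

The rollout configurations of multiple applications reference the common gateway configuration, specifically through the `trafficRoutingRef` field of the Rollout resource or the `rollouts.kruise.io/trafficrouting` annotation. In addition, the Rollout's `patchPodTemplateMetadata` field lets canary instances carry different instance metadata from stable instances. A service discovery implementation, such as a microservice engine or a service mesh, can use this metadata difference to direct traffic to different downstream service instances.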

<Tabs>
<TabItem value="v1beta1" label="v1beta1" default>

```YAML
# a rollout configuration
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: rollout-a
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-a
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
      trafficRoutingRef: mse-traffic
---
# b rollout configuration
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: rollout-b
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-b
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
      trafficRoutingRef: mse-traffic
```
</TabItem>
<TabItem value="v1alpha1" label="v1alpha1">

```YAML
# a rollout configuration
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-a
  annotations:
    rollouts.kruise.io/trafficrouting: mse-traffic
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-a
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
---
# b rollout configuration
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-b
  annotations:
    rollouts.kruise.io/trafficrouting: mse-traffic
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-b
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
```

</TabItem>
</Tabs>

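### Behavior Explanation
When a new version of `spring-cloud-a` is applied:
- The workload `spring-cloud-a` is paused, ensuring that no instance is updated.
- A new canary Deployment with 1 instance is created, and this instance carries the labels `alicloud.service.tag: gray` and `opensergo.io/canary-gray: gray`.
- Ingress traffic whose `User-Agent` request header matches foo is routed to this canary instance, and a request header `x-mse-tag=gray` is added so that the microservice engine can identify canary traffic.
- When the canary instance of `spring-cloud-a` calls the downstream service, it prefers the corresponding canary instance of `spring-cloud-b`; when `spring-cloud-b` has no canary instance, it picks a stable instance. Note that this step requires support from the service discovery implementation.

When you consider the canary verification passed and confirm the next step:
- The `spring-cloud-a` workload is upgraded using the native rolling update strategy;
- Traffic is restored to the original load-balancing strategy;
- The canary Deployment and Pods are deleted.


### Gateway implementations currently known to support end to end canary release:
- [MSE](https://help.aliyun.com/zh/mse/user-guide/implement-mse-based-end-to-end-canary-release-by-using-kruise-rollouts) (Alibaba Cloud Microservices Engine)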
@@ -209,10 +209,10 @@ spec:
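CloneSet provides 3 update types; the default is `ReCreate`.

- `ReCreate`: the controller deletes the old Pod and its PVCs, then recreates them with the new version.
- `InPlaceIfPossible`: the controller first tries to update the Pod in place and falls back to recreating it if that is not possible. See the document referenced below for details.
- `InPlaceOnly`: the controller only allows in-place update. Users can therefore only modify the restricted fields mentioned in the previous item; attempts to modify other fields are rejected by Kruise.
- `InPlaceIfPossible`: the controller first tries to update the Pod in place and falls back to recreating it if that is not possible. Currently, only fields such as the container image support in-place update.
- `InPlaceOnly`: the controller only allows in-place update. Users can therefore only modify fields such as the container image; attempts to modify other fields are rejected by Kruise.

**Please read [this document](../core-concepts/inplace-update) for more details of in-place update.**
**Please read [the in-place update concept](../core-concepts/inplace-update) for more details of in-place update.**

We also provide a **graceful period** option for in-place update, as a strategy for graceful in-place upgrades. If the user sets the `gracePeriodSeconds` field, the controller first changes the Pod status to not-ready during an in-place update, then waits for a period (`gracePeriodSeconds`), and finally modifies the image version in the Pod spec.
This leaves controllers such as endpoints-controller enough time to remove the Pod from the endpoints list.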
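
The ordering that `gracePeriodSeconds` enforces (mark not-ready, wait, then change the image) can be sketched like this. It is only an illustration of the sequence: the flat `pod` dict, the `graceful_in_place_update` function, and the injectable `sleep` parameter are hypothetical and do not reflect how the Kruise controller is written.

```python
import time
from typing import Callable


def graceful_in_place_update(
    pod: dict,
    new_image: str,
    grace_period_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Apply the three steps in order and return the names of the steps that ran."""
    steps = []
    pod["ready"] = False            # 1. mark the Pod not-ready
    steps.append("not-ready")
    sleep(grace_period_seconds)     # 2. wait so the Pod can be removed from endpoints
    steps.append("wait")
    pod["image"] = new_image        # 3. only then change the image
    steps.append("update-image")
    return steps


# Inject a fake sleep that records the requested delay instead of waiting.
waited: list[float] = []
pod = {"ready": True, "image": "app:v1"}
order = graceful_in_place_update(pod, "app:v2", 10, sleep=waited.append)
```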
159 changes: 159 additions & 0 deletions rollouts/user-manuals/strategy-end2end-canary-update.md
@@ -0,0 +1,159 @@
# End to End Canary Release

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## End to End Canary Release Process
![ab](../../static/img/rollouts/e2e.png)

End to end canary release is a special kind of canary release. In such a release, multiple applications in a micro-service system share a common traffic gateway, and the canary replicas of an upstream application pass traffic to the canary replicas of the downstream applications, so that a request stays in the canary environment whenever one is available. The end to end canary environment is often called a traffic swimlane. Within a swimlane, a request goes to the stable replicas of an application that has no canary replicas, and returns to the canary environment as soon as a downstream application has canary replicas again. End to end canary release is often used for business evaluation that requires the cooperation of multiple applications.

End to end canary release can be illustrated with a simple sample system (gateway -> spring-cloud-a -> spring-cloud-b): requests are admitted by the gateway, the gateway passes the traffic to `spring-cloud-a`, and `spring-cloud-a` invokes the downstream `spring-cloud-b`.

## Common Gateway Configuration
Since the gateway configuration is shared by multiple applications, the gateway is configured outside of the Rollout.

```YAML

apiVersion: rollouts.kruise.io/v1alpha1
kind: TrafficRouting
metadata:
  name: mse-traffic
spec:
  objectRef:
  - service: spring-cloud-a
    ingress:
      classType: mse
      name: spring-cloud-a
  strategy:
    matches:
    # optional A/B Testing setting
    - headers:
      - type: Exact
        name: User-Agent
        value: foo
    # alternatively, send a fixed share of traffic (here 20%) to the gray environment
    # weight: 20
    # optional request header modification
    requestHeaderModifier:
      set:
      - name: x-mse-tag
        value: gray
```
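
For the commented-out `weight` alternative, the following sketch shows one way to picture what a weight of 20 means, assuming the value is a percentage of requests. The hash-bucket approach and the `in_gray_share` function are illustrative assumptions; an actual gateway may split traffic differently, for example by random selection per request.

```python
import hashlib


def in_gray_share(request_id: str, weight: int) -> bool:
    """Place a request in the gray share when its hash bucket (0-99) is below `weight`."""
    if not 0 <= weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < weight


# Over many distinct request ids, roughly `weight` percent land in the gray share.
sample = [f"req-{n}" for n in range(10_000)]
gray_fraction = sum(in_gray_share(r, 20) for r in sample) / len(sample)
```

Hashing a stable request attribute keeps the decision repeatable for the same input, which is convenient for an illustration; it is not a claim about how MSE or any other gateway applies the weight.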

## Rollout Configuration
**Note: v1beta1 available from Kruise Rollout v0.5.0.**

The rollout configurations of multiple applications can share the same traffic routing by referring to the traffic routing config name through the `trafficRoutingRef` field or the `rollouts.kruise.io/trafficrouting` annotation. In addition, canary replicas can be configured differently from normal replicas by changing their metadata with the `patchPodTemplateMetadata` field. The service discovery implementation, e.g. a micro-service engine or service mesh, can use this metadata to guide traffic to different downstream applications accordingly.

<Tabs>
<TabItem value="v1beta1" label="v1beta1" default>

```YAML
# a rollout configuration
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: rollout-a
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-a
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
      trafficRoutingRef: mse-traffic
---
# b rollout configuration
apiVersion: rollouts.kruise.io/v1beta1
kind: Rollout
metadata:
  name: rollout-b
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-cloud-b
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
      trafficRoutingRef: mse-traffic
```
</TabItem>
<TabItem value="v1alpha1" label="v1alpha1">

```YAML
# a rollout configuration
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-a
  annotations:
    rollouts.kruise.io/trafficrouting: mse-traffic
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-a
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
---
# b rollout configuration
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-b
  annotations:
    rollouts.kruise.io/trafficrouting: mse-traffic
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: spring-cloud-b
  strategy:
    canary:
      steps:
      - pause: {}
        replicas: 1
      patchPodTemplateMetadata:
        labels:
          alicloud.service.tag: gray
          opensergo.io/canary-gray: gray
```

</TabItem>
</Tabs>
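
The effect of `patchPodTemplateMetadata` in the configurations above can be pictured as a label merge into the canary Pod template. The sketch below is a simplified illustration, not the controller's code: `patch_pod_template_metadata` is a hypothetical helper, and the pre-existing `app: spring-cloud-a` label is an assumed example value.

```python
import copy


def patch_pod_template_metadata(template: dict, patch: dict) -> dict:
    """Return a copy of `template` with the patch's labels/annotations merged into its metadata."""
    patched = copy.deepcopy(template)  # the stable template stays unchanged
    metadata = patched.setdefault("metadata", {})
    for key in ("labels", "annotations"):
        if key in patch:
            metadata.setdefault(key, {}).update(patch[key])
    return patched


# The `app` label is an assumed pre-existing label on the stable template.
stable_template = {"metadata": {"labels": {"app": "spring-cloud-a"}}}
canary_template = patch_pod_template_metadata(
    stable_template,
    {"labels": {"alicloud.service.tag": "gray", "opensergo.io/canary-gray": "gray"}},
)
```

Only the canary copy gains the gray labels, which is what allows service discovery to tell canary replicas apart from stable ones.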

### Behavior Explanation
When you apply a new revision for `spring-cloud-a`:
- The workload `spring-cloud-a` will be paused, and no Pod is updated;
- A new canary Deployment will be created with 1 replica, and the replica will have the additional labels `alicloud.service.tag: gray` and `opensergo.io/canary-gray: gray`;
- Traffic with header `User-Agent=foo` will be routed to the new canary Deployment Pod; in addition, an extra header `x-mse-tag=gray` is added to help the micro-service engine identify canary traffic;
- The canary replica of `spring-cloud-a` will invoke the canary replicas of the downstream application `spring-cloud-b` if available, and will invoke the stable replicas if no canary `spring-cloud-b` exists. Note that this step requires the support of the service discovery implementation.

When you consider the canary verification passed and confirm the next step:
- The workload `spring-cloud-a` will be upgraded using the native rolling update strategy;
- The traffic will be restored to the native load balance strategy;
- The canary Deployment and Pods will be deleted.
