Merge pull request #46 from stefanprodan/skip-canary
 Add option to skip the canary analysis
stefanprodan authored Feb 13, 2019
2 parents 2c9c1ad + e565789 commit efd901a
Showing 10 changed files with 190 additions and 348 deletions.
378 changes: 36 additions & 342 deletions README.md

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions artifacts/canaries/canary.yaml
@@ -26,6 +26,9 @@ spec:
    # Istio virtual service host names (optional)
    hosts:
    - app.istio.weavedx.com
  # for emergency cases when you want to ship changes
  # in production without analysing the canary
  skipAnalysis: false
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 10s
2 changes: 2 additions & 0 deletions artifacts/flagger/crd.yaml
@@ -73,6 +73,8 @@ spec:
      properties:
        port:
          type: number
    skipAnalysis:
      type: boolean
    canaryAnalysis:
      properties:
        interval:
2 changes: 2 additions & 0 deletions charts/flagger/templates/crd.yaml
@@ -74,6 +74,8 @@ spec:
      properties:
        port:
          type: number
    skipAnalysis:
      type: boolean
    canaryAnalysis:
      properties:
        interval:
2 changes: 1 addition & 1 deletion docs/gitbook/SUMMARY.md
@@ -16,4 +16,4 @@

## Tutorials

* [Canary Deployments with Helm charts](tutorials/canary-helm-gitops.md)
* [Canaries with Helm charts and GitOps](tutorials/canary-helm-gitops.md)
8 changes: 8 additions & 0 deletions docs/gitbook/how-it-works.md
@@ -39,6 +39,9 @@ spec:
    # Istio virtual service host names (optional)
    hosts:
    - podinfo.example.com
  # for emergency cases when you want to ship changes
  # in production without analysing the canary
  skipAnalysis: false
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 1m
@@ -167,6 +170,11 @@ And the time it takes for a canary to be rolled back when the metrics or webhook ch
interval * threshold
```

In emergency cases, you may want to skip the analysis phase and ship changes directly to production.
You can do this at any time by setting `spec.skipAnalysis: true`.
When skip analysis is enabled, Flagger checks that the canary deployment is healthy and
promotes it without analysing it. If an analysis is underway, Flagger cancels it and runs the promotion.
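
As a minimal sketch, the field can be flipped on a live canary with a JSON merge patch; the resource name `podinfo` and namespace `test` below are placeholder assumptions, not part of this change:

```bash
# enable skip analysis on an existing canary (hypothetical name and namespace)
kubectl -n test patch canary podinfo \
  --type merge \
  -p '{"spec":{"skipAnalysis":true}}'
```

On its next reconciliation Flagger should then promote the deployment, provided the health check passes.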

### HTTP Metrics

The canary analysis uses the following Prometheus queries:
48 changes: 43 additions & 5 deletions docs/gitbook/tutorials/canary-helm-gitops.md
@@ -1,4 +1,4 @@
# Canary Deployments with Helm charts
# Canary Deployments with Helm Charts and GitOps

This guide shows you how to package a web app into a Helm chart, trigger canary deployments on Helm upgrade
and automate the chart release process with Weave Flux.
@@ -230,10 +230,20 @@ If you've enabled the Slack notifications, you'll receive an alert with the reas

### GitOps automation

Instead of using Helm CLI from a CI tool to perform the install and upgrade, you could use a Git based approach.
Instead of using Helm CLI from a CI tool to perform the install and upgrade,
you could use a Git based approach. GitOps is a way to do Continuous Delivery:
it works by using Git as a source of truth for declarative infrastructure and workloads.
In the [GitOps model](https://www.weave.works/technologies/gitops/),
any change to production must be committed in source control
prior to being applied on the cluster. This way, rollback and audit logs are provided by Git.

![Helm GitOps Canary Deployment](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-flux-gitops.png)

In order to apply the GitOps pipeline model to Flagger canary deployments you'll need
a Git repository with your workload definitions in YAML format,
a container registry where your CI system pushes immutable images and
an operator that synchronizes the Git repo with the cluster state.

Create a git repository with the following content:

```
@@ -287,7 +297,34 @@ With the `flux.weave.works` annotations I instruct Flux to automate this release
When an image tag in the semver range of `1.4.0 - 1.4.99` is pushed to Quay,
Flux will upgrade the Helm release and from there Flagger will pick up the change and start a canary deployment.

A CI/CD pipeline for the frontend release could look like this:
Install [Weave Flux](https://github.com/weaveworks/flux) and its Helm Operator by specifying your Git repo URL:

```bash
helm repo add weaveworks https://weaveworks.github.io/flux
helm install --name flux \
--set helmOperator.create=true \
--set git.url=git@github.com:<USERNAME>/<REPOSITORY> \
--namespace flux \
weaveworks/flux
```

At startup, Flux generates an SSH key and logs the public key. Find the SSH public key with:

```bash
kubectl -n flux logs deployment/flux | grep identity.pub | cut -d '"' -f2
```

In order to sync your cluster state with Git you need to copy the public key and create a
deploy key with write access on your GitHub repository.

Open GitHub, navigate to your fork, go to _Settings > Deploy keys_, click on _Add deploy key_,
check _Allow write access_, paste the Flux public key and click _Add key_.
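
If you'd rather script this step, the same deploy key can be created through the GitHub API; a rough sketch, assuming a `GITHUB_TOKEN` with admin rights on the repository and the public key saved to `flux.pub`:

```bash
# create a deploy key with write access via the GitHub API (sketch)
curl -X POST \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  https://api.github.com/repos/<USERNAME>/<REPOSITORY>/keys \
  -d "{\"title\": \"flux\", \"key\": \"$(cat flux.pub)\", \"read_only\": false}"
```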

After a couple of seconds Flux will apply the Kubernetes resources from Git and Flagger will
launch the `frontend` and `backend` apps.
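
To follow along, you can watch the canary objects as Flagger initializes and promotes them; a sketch, assuming the apps land in the `test` namespace:

```bash
# list the canaries and their current phase (the namespace is an assumption)
watch kubectl -n test get canaries
```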

A CI/CD pipeline for the `frontend` release could look like this:

* cut a release from the master branch of the podinfo code repo with the git tag `1.4.1`
* CI builds the image and pushes the `podinfo:1.4.1` image to the container registry
@@ -302,7 +339,7 @@ A CI/CD pipeline for the frontend release could look like this:

If the canary fails, fix the bug, do another patch release e.g. `1.4.2` and the whole process will run again.
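
Cutting such a patch release usually comes down to pushing a new Git tag for CI to pick up; a sketch of that step, assuming the tag alone triggers the pipeline:

```bash
# tag the fix and push it; CI then builds and pushes podinfo:1.4.2 to the registry
git tag 1.4.2
git push origin 1.4.2
```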

There are a couple of reasons why a canary deployment fails:
A canary deployment can fail due to any of the following reasons:

* the container image can't be downloaded
* the deployment replica set is stuck for more than ten minutes (e.g. due to a container crash loop)
Expand All @@ -312,4 +349,5 @@ There are a couple of reasons why a canary deployment fails:
* the Istio telemetry service is unable to collect traffic metrics
* the metrics server (Prometheus) can't be reached


If you want to find out more about managing Helm releases with Flux, there is an in-depth guide at
[github.com/stefanprodan/gitops-helm](https://github.com/stefanprodan/gitops-helm).
4 changes: 4 additions & 0 deletions pkg/apis/flagger/v1alpha3/types.go
@@ -58,6 +58,10 @@ type CanarySpec struct {
	// the maximum time in seconds for a canary deployment to make progress
	// before it is considered to be failed. Defaults to ten minutes.
	ProgressDeadlineSeconds *int32 `json:"progressDeadlineSeconds,omitempty"`

	// promote the canary without analysing it
	// +optional
	SkipAnalysis bool `json:"skipAnalysis,omitempty"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
Expand Down
50 changes: 50 additions & 0 deletions pkg/controller/scheduler.go
@@ -5,6 +5,7 @@ import (
"strings"
"time"

istiov1alpha3 "github.com/knative/pkg/apis/istio/v1alpha3"
flaggerv1 "github.com/stefanprodan/flagger/pkg/apis/flagger/v1alpha3"
"k8s.io/apimachinery/pkg/apis/meta/v1"
)
@@ -170,6 +171,11 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
		}
	}

	// check if analysis should be skipped
	if skip := c.shouldSkipAnalysis(cd, primaryRoute, canaryRoute); skip {
		return
	}

	// check if the number of failed checks reached the threshold
	if cd.Status.Phase == flaggerv1.CanaryProgressing &&
		(!retriable || cd.Status.FailedChecks >= cd.Spec.CanaryAnalysis.Threshold) {
@@ -294,6 +300,50 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
	}
}

func (c *Controller) shouldSkipAnalysis(cd *flaggerv1.Canary, primary istiov1alpha3.DestinationWeight, canary istiov1alpha3.DestinationWeight) bool {
	if !cd.Spec.SkipAnalysis {
		return false
	}

	// route all traffic to primary
	primary.Weight = 100
	canary.Weight = 0
	if err := c.router.SetRoutes(cd, primary, canary); err != nil {
		c.recordEventWarningf(cd, "%v", err)
		return false
	}
	c.recorder.SetWeight(cd, primary.Weight, canary.Weight)

	// copy spec and configs from canary to primary
	c.recordEventInfof(cd, "Copying %s.%s template spec to %s-primary.%s",
		cd.Spec.TargetRef.Name, cd.Namespace, cd.Spec.TargetRef.Name, cd.Namespace)
	if err := c.deployer.Promote(cd); err != nil {
		c.recordEventWarningf(cd, "%v", err)
		return false
	}

	// shutdown canary
	if err := c.deployer.Scale(cd, 0); err != nil {
		c.recordEventWarningf(cd, "%v", err)
		return false
	}

	// update status phase
	if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanarySucceeded); err != nil {
		c.recordEventWarningf(cd, "%v", err)
		return false
	}

	// notify
	c.recorder.SetStatus(cd)
	c.recordEventInfof(cd, "Promotion completed! Canary analysis was skipped for %s.%s",
		cd.Spec.TargetRef.Name, cd.Namespace)
	c.sendNotification(cd, "Canary analysis was skipped, promotion finished.",
		false, false)

	return true
}

func (c *Controller) checkCanaryStatus(cd *flaggerv1.Canary, shouldAdvance bool) bool {
	c.recorder.SetStatus(cd)
	if cd.Status.Phase == flaggerv1.CanaryProgressing {
41 changes: 41 additions & 0 deletions pkg/controller/scheduler_test.go
@@ -64,6 +64,47 @@ func TestScheduler_Rollback(t *testing.T) {
	}
}

func TestScheduler_SkipAnalysis(t *testing.T) {
	mocks := SetupMocks()
	// init
	mocks.ctrl.advanceCanary("podinfo", "default", false)

	// enable skip
	cd, err := mocks.flaggerClient.FlaggerV1alpha3().Canaries("default").Get("podinfo", metav1.GetOptions{})
	if err != nil {
		t.Fatal(err.Error())
	}
	cd.Spec.SkipAnalysis = true
	_, err = mocks.flaggerClient.FlaggerV1alpha3().Canaries("default").Update(cd)
	if err != nil {
		t.Fatal(err.Error())
	}

	// update
	dep2 := newTestDeploymentV2()
	_, err = mocks.kubeClient.AppsV1().Deployments("default").Update(dep2)
	if err != nil {
		t.Fatal(err.Error())
	}

	// detect changes
	mocks.ctrl.advanceCanary("podinfo", "default", true)
	// advance
	mocks.ctrl.advanceCanary("podinfo", "default", true)

	c, err := mocks.flaggerClient.FlaggerV1alpha3().Canaries("default").Get("podinfo", metav1.GetOptions{})
	if err != nil {
		t.Fatal(err.Error())
	}
	if !c.Spec.SkipAnalysis {
		t.Errorf("Got skip analysis %v wanted %v", c.Spec.SkipAnalysis, true)
	}

	if c.Status.Phase != v1alpha3.CanarySucceeded {
		t.Errorf("Got canary state %v wanted %v", c.Status.Phase, v1alpha3.CanarySucceeded)
	}
}

func TestScheduler_NewRevisionReset(t *testing.T) {
	mocks := SetupMocks()
	// init
