Add option to skip the canary analysis #46

Merged 6 commits on Feb 13, 2019
378 changes: 36 additions & 342 deletions README.md

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions artifacts/canaries/canary.yaml
@@ -26,6 +26,9 @@ spec:
# Istio virtual service host names (optional)
hosts:
- app.istio.weavedx.com
# for emergency cases when you want to ship changes
# in production without analysing the canary
skipAnalysis: false
canaryAnalysis:
# schedule interval (default 60s)
interval: 10s
2 changes: 2 additions & 0 deletions artifacts/flagger/crd.yaml
@@ -73,6 +73,8 @@ spec:
properties:
port:
type: number
skipAnalysis:
type: boolean
canaryAnalysis:
properties:
interval:
2 changes: 2 additions & 0 deletions charts/flagger/templates/crd.yaml
@@ -74,6 +74,8 @@ spec:
properties:
port:
type: number
skipAnalysis:
type: boolean
canaryAnalysis:
properties:
interval:
2 changes: 1 addition & 1 deletion docs/gitbook/SUMMARY.md
@@ -16,4 +16,4 @@

## Tutorials

* [Canary Deployments with Helm charts](tutorials/canary-helm-gitops.md)
* [Canaries with Helm charts and GitOps](tutorials/canary-helm-gitops.md)
8 changes: 8 additions & 0 deletions docs/gitbook/how-it-works.md
@@ -39,6 +39,9 @@ spec:
# Istio virtual service host names (optional)
hosts:
- podinfo.example.com
# for emergency cases when you want to ship changes
# in production without analysing the canary
skipAnalysis: false
canaryAnalysis:
# schedule interval (default 60s)
interval: 1m
@@ -167,6 +170,11 @@ And the time it takes for a canary to be rolled back when the metrics or webhook checks fail:

```
interval * threshold
```
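
For example, with `interval: 1m` and `threshold: 10`, a failing canary is rolled back roughly ten minutes after the first failed check.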

In emergency cases, you may want to skip the analysis phase and ship changes directly to production.
You can set `spec.skipAnalysis: true` at any time.
When skip analysis is enabled, Flagger checks that the canary deployment is healthy and
promotes it without analysing it. If an analysis is already underway, Flagger cancels it and runs the promotion.
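
As a sketch, an abbreviated Canary manifest with the analysis disabled could look like this (the target name, namespace and analysis settings are illustrative, not taken from this PR):

```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # promote the canary without analysing it
  skipAnalysis: true
  canaryAnalysis:
    # metric checks and webhooks are skipped while skipAnalysis is true
    interval: 1m
    threshold: 10
```

Since the field can be changed at any time, you could also flip it on a live object, for instance with `kubectl edit canary/podinfo`, and the scheduler should pick up the change on its next pass.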

### HTTP Metrics

The canary analysis is using the following Prometheus queries:
48 changes: 43 additions & 5 deletions docs/gitbook/tutorials/canary-helm-gitops.md
@@ -1,4 +1,4 @@
# Canary Deployments with Helm charts
# Canary Deployments with Helm Charts and GitOps

This guide shows you how to package a web app into a Helm chart, trigger canary deployments on Helm upgrade
and automate the chart release process with Weave Flux.
@@ -230,10 +230,20 @@ If you've enabled the Slack notifications, you'll receive an alert with the reason.

### GitOps automation

Instead of using Helm CLI from a CI tool to perform the install and upgrade, you could use a Git based approach.
Instead of using Helm CLI from a CI tool to perform the install and upgrade,
you could use a Git-based approach. GitOps is a way to do Continuous Delivery:
it works by using Git as the source of truth for declarative infrastructure and workloads.
In the [GitOps model](https://www.weave.works/technologies/gitops/),
any change to production must be committed to source control
prior to being applied on the cluster. This way, rollbacks and audit logs are provided by Git.

![Helm GitOps Canary Deployment](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-flux-gitops.png)

In order to apply the GitOps pipeline model to Flagger canary deployments you'll need:
a Git repository with your workload definitions in YAML format,
a container registry where your CI system pushes immutable images, and
an operator that synchronizes the Git repo with the cluster state.

Create a git repository with the following content:

@@ -287,7 +297,34 @@ With the `flux.weave.works` annotations I instruct Flux to automate this release.
When an image tag in the semver range of `1.4.0 - 1.4.99` is pushed to Quay,
Flux will upgrade the Helm release and from there Flagger will pick up the change and start a canary deployment.
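
For context, a `HelmRelease` carrying these annotations might look like the sketch below, assuming the `flux.weave.works/v1beta1` API of the Flux Helm Operator; the chart location, semver filter and image are illustrative rather than copied from this PR:

```yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: frontend
  namespace: test
  annotations:
    # let Flux apply image updates to this release automatically
    flux.weave.works/automated: "true"
    # only automate upgrades within the 1.4 semver range
    flux.weave.works/tag.chart-image: semver:~1.4
spec:
  releaseName: frontend
  chart:
    git: git@github.com:<USERNAME>/<REPOSITORY>
    ref: master
    path: charts/podinfo
  values:
    image: quay.io/stefanprodan/podinfo:1.4.0
```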

A CI/CD pipeline for the frontend release could look like this:
Install [Weave Flux](https://github.com/weaveworks/flux) and its Helm Operator by specifying your Git repo URL:

```bash
helm repo add weaveworks https://weaveworks.github.io/flux

helm install --name flux \
--set helmOperator.create=true \
--set git.url=git@github.com:<USERNAME>/<REPOSITORY> \
--namespace flux \
weaveworks/flux
```

At startup Flux generates an SSH key and logs the public key. Find the SSH public key with:

```bash
kubectl -n flux logs deployment/flux | grep identity.pub | cut -d '"' -f2
```

In order to sync your cluster state with Git, you need to copy the public key and create a
deploy key with write access on your GitHub repository.

Open GitHub, navigate to your fork, go to _Settings > Deploy keys_, click on _Add deploy key_,
check _Allow write access_, paste the Flux public key and click _Add key_.

After a couple of seconds Flux will apply the Kubernetes resources from Git and Flagger will
launch the `frontend` and `backend` apps.

A CI/CD pipeline for the `frontend` release could look like this:

* cut a release from the master branch of the podinfo code repo with the git tag `1.4.1`
* CI builds the image and pushes the `podinfo:1.4.1` image to the container registry
@@ -302,7 +339,7 @@ A CI/CD pipeline for the frontend release could look like this:

If the canary fails, fix the bug, do another patch release (e.g. `1.4.2`) and the whole process will run again.

There are a couple of reasons why a canary deployment fails:
A canary deployment can fail due to any of the following reasons:

* the container image can't be downloaded
* the deployment replica set is stuck for more than ten minutes (e.g. due to a container crash loop)
@@ -312,4 +349,5 @@ There are a couple of reasons why a canary deployment fails:
* the Istio telemetry service is unable to collect traffic metrics
* the metrics server (Prometheus) can't be reached


If you want to find out more about managing Helm releases with Flux, here is an in-depth guide:
[github.com/stefanprodan/gitops-helm](https://github.com/stefanprodan/gitops-helm).
4 changes: 4 additions & 0 deletions pkg/apis/flagger/v1alpha3/types.go
@@ -58,6 +58,10 @@ type CanarySpec struct {
// the maximum time in seconds for a canary deployment to make progress
// before it is considered to be failed. Defaults to ten minutes.
ProgressDeadlineSeconds *int32 `json:"progressDeadlineSeconds,omitempty"`

// promote the canary without analysing it
// +optional
SkipAnalysis bool `json:"skipAnalysis,omitempty"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
50 changes: 50 additions & 0 deletions pkg/controller/scheduler.go
@@ -5,6 +5,7 @@ import (
"strings"
"time"

istiov1alpha3 "github.com/knative/pkg/apis/istio/v1alpha3"
flaggerv1 "github.com/stefanprodan/flagger/pkg/apis/flagger/v1alpha3"
"k8s.io/apimachinery/pkg/apis/meta/v1"
)
@@ -170,6 +171,11 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessChecks bool) {
}
}

// check if analysis should be skipped
if skip := c.shouldSkipAnalysis(cd, primaryRoute, canaryRoute); skip {
return
}

// check if the number of failed checks reached the threshold
if cd.Status.Phase == flaggerv1.CanaryProgressing &&
(!retriable || cd.Status.FailedChecks >= cd.Spec.CanaryAnalysis.Threshold) {
@@ -294,6 +300,50 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessChecks bool) {
}
}

func (c *Controller) shouldSkipAnalysis(cd *flaggerv1.Canary, primary istiov1alpha3.DestinationWeight, canary istiov1alpha3.DestinationWeight) bool {
if !cd.Spec.SkipAnalysis {
return false
}

// route all traffic to primary
primary.Weight = 100
canary.Weight = 0
if err := c.router.SetRoutes(cd, primary, canary); err != nil {
c.recordEventWarningf(cd, "%v", err)
return false
}
c.recorder.SetWeight(cd, primary.Weight, canary.Weight)

// copy spec and configs from canary to primary
c.recordEventInfof(cd, "Copying %s.%s template spec to %s-primary.%s",
cd.Spec.TargetRef.Name, cd.Namespace, cd.Spec.TargetRef.Name, cd.Namespace)
if err := c.deployer.Promote(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return false
}

// shutdown canary
if err := c.deployer.Scale(cd, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
return false
}

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanarySucceeded); err != nil {
c.recordEventWarningf(cd, "%v", err)
return false
}

// notify
c.recorder.SetStatus(cd)
c.recordEventInfof(cd, "Promotion completed! Canary analysis was skipped for %s.%s",
cd.Spec.TargetRef.Name, cd.Namespace)
c.sendNotification(cd, "Canary analysis was skipped, promotion finished.",
false, false)

return true
}

func (c *Controller) checkCanaryStatus(cd *flaggerv1.Canary, shouldAdvance bool) bool {
c.recorder.SetStatus(cd)
if cd.Status.Phase == flaggerv1.CanaryProgressing {
41 changes: 41 additions & 0 deletions pkg/controller/scheduler_test.go
@@ -64,6 +64,47 @@ func TestScheduler_Rollback(t *testing.T) {
}
}

func TestScheduler_SkipAnalysis(t *testing.T) {
mocks := SetupMocks()
// init
mocks.ctrl.advanceCanary("podinfo", "default", false)

// enable skip
cd, err := mocks.flaggerClient.FlaggerV1alpha3().Canaries("default").Get("podinfo", metav1.GetOptions{})
if err != nil {
t.Fatal(err.Error())
}
cd.Spec.SkipAnalysis = true
_, err = mocks.flaggerClient.FlaggerV1alpha3().Canaries("default").Update(cd)
if err != nil {
t.Fatal(err.Error())
}

// update
dep2 := newTestDeploymentV2()
_, err = mocks.kubeClient.AppsV1().Deployments("default").Update(dep2)
if err != nil {
t.Fatal(err.Error())
}

// detect changes
mocks.ctrl.advanceCanary("podinfo", "default", true)
// advance
mocks.ctrl.advanceCanary("podinfo", "default", true)

c, err := mocks.flaggerClient.FlaggerV1alpha3().Canaries("default").Get("podinfo", metav1.GetOptions{})
if err != nil {
t.Fatal(err.Error())
}
if !c.Spec.SkipAnalysis {
t.Errorf("Got skip analysis %v wanted %v", c.Spec.SkipAnalysis, true)
}

if c.Status.Phase != v1alpha3.CanarySucceeded {
t.Errorf("Got canary state %v wanted %v", c.Status.Phase, v1alpha3.CanarySucceeded)
}
}

func TestScheduler_NewRevisionReset(t *testing.T) {
mocks := SetupMocks()
// init