Add support for SMI #180

Merged: 11 commits from the smi branch into master on May 14, 2019

Conversation

@stefanprodan (Member) commented May 13, 2019

This PR adds support for SMI TrafficSplit.

Test procedure with the Istio SMI adapter:

  1. Install Istio with Prometheus and Mixer telemetry enabled.
  2. Clone this repo and check out the smi branch
  3. Install the SMI Istio adapter with kubectl apply -f ./artifacts/smi/istio-adapter.yaml
  4. Install Flagger:
helm upgrade -i flagger ./charts/flagger \
--wait \
--namespace istio-system \
--set image.tag=smi-8fd3e92 \
--set meshProvider=smi:istio
  5. Follow the canary tutorial https://docs.flagger.app/usage/progressive-delivery
  6. Use the following canary spec:
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    hosts:
    - app.example.com
  canaryAnalysis:
    interval: 15s
    threshold: 15
    maxWeight: 30
    stepWeight: 10
    metrics:
    - name: request-success-rate
      threshold: 99
      interval: 1m
    - name: request-duration
      threshold: 500
      interval: 30s
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 10m -q 10 -c 2 http://podinfo.test:9898/"

@surajssd

The TrafficSplit object was created, but no VirtualService was created from it. I think that's because the operator is deployed in a different namespace (istio-system) than the application.

@stefanprodan (Member, Author) commented May 14, 2019

Have you installed the operator using kubectl apply -f ./artifacts/smi/istio-adapter.yaml? The e2e tests cover this setup: the adapter sits in istio-system while the canary is in the test namespace.
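
A quick way to verify the adapter deployment and its recent logs (names taken from istio-adapter.yaml):

kubectl -n istio-system get deploy smi-adapter-istio
kubectl -n istio-system logs deploy/smi-adapter-istio --tail=20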

@surajssd commented May 14, 2019

After applying the following diff, I redeployed the adapter:

diff --git a/artifacts/smi/istio-adapter.yaml b/artifacts/smi/istio-adapter.yaml
index eaebdcb..66fd04d 100644
--- a/artifacts/smi/istio-adapter.yaml
+++ b/artifacts/smi/istio-adapter.yaml
@@ -27,7 +27,7 @@ apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: smi-adapter-istio
-  namespace: istio-system
+  namespace: test
 ---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
@@ -90,7 +90,7 @@ metadata:
 subjects:
   - kind: ServiceAccount
     name: smi-adapter-istio
-    namespace: istio-system
+    namespace: test
 roleRef:
   kind: ClusterRole
   name: smi-adapter-istio
@@ -100,7 +100,7 @@ apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: smi-adapter-istio
-  namespace: istio-system
+  namespace: test
 spec:
   replicas: 1
   selector:
@@ -122,7 +122,7 @@ spec:
           imagePullPolicy: Always
           env:
             - name: WATCH_NAMESPACE
-              value: ""
+              value: "test"
             - name: POD_NAME
               valueFrom:
                 fieldRef:

I see a lot of errors like the following from the adapter pod:

{"level":"error","ts":1557825936.9886234,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"trafficsplit-controller",
"request":"test/podinfo",
"error":"admission webhook \"pilot.validation.istio.io\" denied the request: configuration is invalid: wildcard host * is not allowed for virtual services bound to the mesh gateway",
"stacktrace":"github.com/deislabs/smi-adapter-istio/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/Users/aleph/go/src/github.com/deislabs/smi-adapter-istio/vendor/github.com/go-logr/zapr/zapr.go:128\ngit.luolix.top/deislabs/smi-adapter-istio/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/Users/aleph/go/src/github.com/deislabs/smi-adapter-istio/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\ngit.luolix.top/deislabs/smi-adapter-istio/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/Users/aleph/go/src/github.com/deislabs/smi-adapter-istio/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngit.luolix.top/deislabs/smi-adapter-istio/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/Users/aleph/go/src/github.com/deislabs/smi-adapter-istio/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngit.luolix.top/deislabs/smi-adapter-istio/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/Users/aleph/go/src/github.com/deislabs/smi-adapter-istio/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngit.luolix.top/deislabs/smi-adapter-istio/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/Users/aleph/go/src/github.com/deislabs/smi-adapter-istio/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

Hence the adapter is not creating any VirtualService object.
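
A quick way to confirm the symptom, assuming both the SMI and Istio CRDs are installed:

kubectl -n test get trafficsplits
kubectl -n test get virtualservices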

@surajssd

Again, my bad. Yes, you are right: there is no need to change the current config.

I still see the same errors though: the TrafficSplit is created successfully, but the VirtualService is not, because of the error mentioned above.

@stefanprodan (Member, Author) commented May 14, 2019

Ah, I assume you're using Istio 1.1.5, where the mesh gateway validation got more restrictive.

Use this Canary spec, which doesn't attach the virtual service to the internal mesh gateway:

apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    hosts:
    - app.example.com
  canaryAnalysis:
    interval: 15s
    threshold: 15
    maxWeight: 30
    stepWeight: 10
    metrics:
    - name: request-success-rate
      threshold: 99
      interval: 1m
    - name: request-duration
      threshold: 500
      interval: 30s
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 10m -q 10 -c 2 http://podinfo.test:9898/"

@surajssd

With that Canary object it worked fine for me. Previously I was using the following:

apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rollback (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # container port
    port: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    - mesh
    # Istio virtual service host names (optional)
    hosts:
    - '*'
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s
    # generate traffic during analysis
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"

Thanks, it seems to work now :-)

@stefanprodan (Member, Author)

Thanks for testing it 🥇

@stefanprodan merged commit b20e017 into master on May 14, 2019
@stefanprodan deleted the smi branch on May 14, 2019 at 10:24