Flux restarts primary deployment before canary analysis begins #928
Comments
I was able to get the expected behavior by updating the Kustomization to apiVersion kustomize.toolkit.fluxcd.io/v1beta1:
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
name: apps
namespace: flux-system
spec:
interval: 30m0s
dependsOn:
- name: istio-system
sourceRef:
kind: GitRepository
name: flux-system
path: ./apps
prune: true
validation: client
healthChecks:
- apiVersion: apps/v1
kind: Deployment
name: flagger-loadtester
namespace: prod
images:
- name: ghcr.io/stefanprodan/podinfo
newName: ghcr.io/stefanprodan/podinfo
newTag: 5.0.1
and then I removed the images section from kustomization.yaml. My guess is that making this change causes Flux to edit the deployment in place instead of overwriting the deployment resource when the kustomization is edited "outside" of Flux. I would appreciate any additional information about this behavior. I know this is more of a Flux-related issue, but since Flagger is designed to be interoperable with Flux, I would appreciate more clarity on how Flux and Flagger interact with each other.
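For reference, a minimal sketch of where the tag ends up after that change, assuming it is then set directly in the workload manifest (names are illustrative, loosely based on the backend/podinfo app from gitops-istio, not the exact manifest); with the images: override gone, a tag bump in git becomes a field-level edit to the Deployment rather than a kustomize-level override:

# Sketch only: illustrative names, not the exact gitops-istio manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: prod
spec:
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          # The tag now lives here instead of in the Kustomization images: section.
          image: ghcr.io/stefanprodan/podinfo:5.0.1
          ports:
            - containerPort: 9898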
I'm running into the same issue. Sadly, the fix by @stefanprodan didn't fix it for me. With Flagger v1.12.0, what I see happening at the moment is the following:
Also using Flux in this case, and images are updated by updating the image tag in
@Whyeasy make sure to use flux v0.15.0
Updated flux, recreated all resources to start from scratch. Still the same sequence.
Ok, can you please post the primary deployment here:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
deployment.kubernetes.io/revision: "2"
kustomize.toolkit.fluxcd.io/checksum: 0a40893bfdc545d62125bd3e74eeb2ebaa7097c2
creationTimestamp: "2021-06-17T12:47:52Z"
generation: 3
labels:
app: booking-service-primary
cluster: flagship-dev
kustomize.toolkit.fluxcd.io/name: starfleet-dev
kustomize.toolkit.fluxcd.io/namespace: flux-system
name: booking-service-primary
namespace: starfleet-dev
ownerReferences:
- apiVersion: flagger.app/v1beta1
blockOwnerDeletion: true
controller: true
kind: Canary
name: booking-service
uid: d1b9b88c-2571-48fb-a037-09613f1998d0
resourceVersion: "828453816"
selfLink: /apis/apps/v1/namespaces/starfleet-dev/deployments/booking-service-primary
uid: 89bddb7b-f6a3-44bb-afa1-fb2470ad7b53
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: booking-service-primary
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
flagger-id: 3e80a49a-4ec7-410b-a471-7e4514f6737a
creationTimestamp: null
labels:
app: booking-service-primary
cluster: flagship-dev
spec:
containers:
- <containers>
@Whyeasy thanks. Just to be sure this bug is in the latest version, can you post the Flagger deployment here:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "3"
meta.helm.sh/release-name: flagger
meta.helm.sh/release-namespace: linkerd
creationTimestamp: "2021-05-17T08:31:03Z"
generation: 3
labels:
app.kubernetes.io/instance: flagger
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: flagger
helm.sh/chart: flagger-1.12.0
helm.toolkit.fluxcd.io/name: flagger
helm.toolkit.fluxcd.io/namespace: linkerd
name: flagger
namespace: linkerd
resourceVersion: "828428857"
selfLink: /apis/apps/v1/namespaces/linkerd/deployments/flagger
uid: be21a5c8-58a9-4721-b6d8-e3105754a668
spec:
progressDeadlineSeconds: 600
replicas: 3
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: flagger
app.kubernetes.io/name: flagger
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
appmesh.k8s.aws/sidecarInjectorWebhook: disabled
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app.kubernetes.io/instance: flagger
app.kubernetes.io/name: flagger
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/instance: flagger
app.kubernetes.io/name: flagger
topologyKey: kubernetes.io/hostname
weight: 100
containers:
- command:
- ./flagger
- -log-level=info
- -mesh-provider=linkerd
- -metrics-server=http://prometheus-kube-prometheus-prometheus.monitoring:9090
- -enable-config-tracking=true
- -slack-user=flagger
- -enable-leader-election=true
- -leader-election-namespace=linkerd
image: ghcr.io/fluxcd/flagger:1.12.0
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=4
- --spider
- http://localhost:8080/healthz
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: flagger
ports:
- containerPort: 8080
name: http
protocol: TCP
readinessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=4
- --spider
- http://localhost:8080/healthz
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
limits:
cpu: "1"
memory: 512Mi
requests:
cpu: 10m
memory: 32Mi
securityContext:
readOnlyRootFilesystem: true
runAsUser: 10001
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
nodeSelector:
node_pool: preemptible
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: flagger
serviceAccountName: flagger
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: preemptible
operator: Equal
value: "true" |
Describe the bug
I have been experimenting with Flagger using the gitops-istio repository. I have found that when updating an image tag, Flux reprovisions the backend-primary deployment immediately on reconciliation, before canary analysis begins. During canary analysis, both the backend and backend-primary deployments are running the new image, which defeats the point of canary analysis. If I update the image tag by editing the deployment manually (i.e. kubectl -n prod edit deployment backend), the canary analysis works as expected: the backend deployment is updated and scaled up -> canary analysis proceeds -> backend-primary is updated if the analysis succeeds -> backend is scaled down.
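For context, a minimal sketch of the Canary resource that ties the two deployments together (field values are illustrative, not the exact gitops-istio manifest). Flagger generates backend-primary from the Deployment named in targetRef and owns it via ownerReferences, so backend-primary is not meant to be edited from git; only the backend deployment should change there:

# Sketch only: illustrative values.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend
  namespace: prod
spec:
  # Flagger creates and owns backend-primary from this target.
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  service:
    port: 9898
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10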
To Reproduce
Run kubectl -n prod get deployment -w to watch the status of the deployments. Then update the image tag in git and wait for reconciliation. You should see that the backend-primary deployment is restarted and its image tag is immediately bumped to the new version.
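The "update the image tag in git" step above corresponds, in this setup, to bumping the kustomize images: override shown earlier in the thread; a sketch of such a change (tag values are illustrative, and the exact layout in gitops-istio may differ):

# Sketch only: bumping the tag via the kustomize images: override.
images:
  - name: ghcr.io/stefanprodan/podinfo
    newName: ghcr.io/stefanprodan/podinfo
    newTag: 5.0.2   # previously 5.0.1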
Expected behavior
The backend-primary image should only be updated after canary analysis completes successfully.
Additional context
Using all versions provided by 570060d on gitops-istio.