HelmRelease file not updated #214

Closed
derrickburns opened this issue Aug 11, 2021 · 11 comments
Labels
bug Something isn't working

Comments

derrickburns commented Aug 11, 2021

@stefanprodan As you know, I have been using your tools for a long time. My general experience is that the tools are rock solid.

Today a colleague reported that an image was not being updated. I inspected our cluster. The Image Automation controller had been running for 4 days. It had made updates as recently as last night. There were no error messages in the logs. In other words, from the perspective of the Image Automation controller, everything was running fine.

I looked for an ImagePolicy for the image in question and found one. The policy had been updated to the new image hours earlier. The marker comment that instructs the Image Automation controller was attached to a HelmRelease. That file had not been updated with the new image tag, even though the controller had modified it within the last 4 days.

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mock-metadata-service
spec:
  interval: 5m
  chart:
    spec:
      chart: .
      version: "0.0.4"
      sourceRef:
        kind: GitRepository
        name: mock-metadata-service
        namespace: shared
      interval: 1m
  values:
    name: mock-metadata-service
    imagePullSecrets:
      - name: ecr-credentials-sync
    images:
      tag: 21.0811.1547.38-70e9909c-master # {"$imagepolicy": "shared:mock-metadata-service:tag"}
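(For reference: the ImagePolicy named by the marker above, shared:mock-metadata-service:tag, would look roughly like the sketch below. The apiVersion and the alphabetical policy are assumptions made to keep the example concrete; they are not taken from the cluster.)

apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: mock-metadata-service
  namespace: shared
spec:
  imageRepositoryRef:
    name: mock-metadata-service
  policy:
    alphabetical:
      order: asc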

I restarted the Image Automation controller, and it quickly updated the GitLab repo with the correct image tag.
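(For reference, such a restart presumably amounts to something like the following; the flux-system namespace is an assumption based on a default Flux installation:)

kubectl -n flux-system rollout restart deployment/image-automation-controller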

So, I am left to conclude that either an error occurred that was not properly logged or there is another bug.

stefanprodan (Member) commented:

Could this be related to #209? Have you seen any timeout logs?

derrickburns (Author) commented Aug 12, 2021

@stefanprodan I think that this just happened again. Following is the data that I was able to collect.

derrickburns (Author) commented Aug 12, 2021

I see no timeout in the logs.

Here is the complete log:
image-automation-log.txt

Here is the current date:
date.txt

Here is the image policy:
imagepolicy.txt

derrickburns (Author) commented Aug 12, 2021

Here is the pod, which lists the deployed image:

mock-metadata-service.txt

derrickburns (Author) commented Aug 12, 2021

Here is the source of the HelmRelease:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mock-metadata-service
spec:
  interval: 5m
  chart:
    spec:
      chart: .
      version: "0.0.4"
      sourceRef:
        kind: GitRepository
        name: mock-metadata-service
        namespace: shared
      interval: 1m
  values:
    name: mock-metadata-service
    imagePullSecrets:
      - name: ecr-credentials-sync
    images:
      tag: 21.0810.1833.45-01134759-master # {"$imagepolicy": "shared:mock-metadata-service:tag"}

Here is the in-cluster representation of the HelmRelease:

helmrelease.txt

derrickburns (Author) commented:

Here is the image automation pod:

image-automation-pod.txt

derrickburns (Author) commented Aug 17, 2021

What would happen if the controller ran on a node that was very low on memory? Could that cause this failure?

squaremo (Member) commented:

👋 @derrickburns

If you set --log-level=debug on the controller deployment, the controller (in recent versions) will record much more about why it does or doesn't make any update. That might reveal if there's some subtle, or mistaken, reason it declines to commit the change you expected.
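(A minimal sketch of one way to set that flag, assuming a standard flux bootstrap layout where flux-system/kustomization.yaml patches the generated manifests; the file layout here is an assumption:)

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      # append the debug flag to the controller's existing args
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --log-level=debug
    target:
      kind: Deployment
      name: image-automation-controller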


bondido commented Dec 13, 2021

Hi,
We are experiencing similar problems.
After quite some time working seamlessly, applications stop being updated automatically by Flux.
A new image is detected, but no changes are committed to the workload repo, and the application is not updated on the cluster.

After a forced restart of the image-automation-controller, changes are committed and pushed to the repo, and the application is updated.

We've found no information in the logs that could tell us anything about the cause or the nature of the problem.

pjbgf added the bug label Mar 22, 2022
pjbgf (Member) commented Mar 22, 2022

The image-automation-controller version v0.21.0 introduces an experimental transport that fixes an issue where the controller stops working in some specific scenarios.

The experimental transport is opt-in: enable it by setting the environment variable EXPERIMENTAL_GIT_TRANSPORT to true in the controller's Deployment.
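(A sketch of one way to set that variable, again via a kustomize patch on the controller's Deployment in flux-system/kustomization.yaml; the file layout is an assumption:)

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      # opt in to the experimental transport; this append assumes the
      # container already defines an env list, as the default manifests do
      - op: add
        path: /spec/template/spec/containers/0/env/-
        value:
          name: EXPERIMENTAL_GIT_TRANSPORT
          value: "true"
    target:
      kind: Deployment
      name: image-automation-controller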

This will require a redeploy of all components, so I would recommend doing it via flux bootstrap using flux CLI version v0.28.0, which will be released tomorrow.

Can you test again with the experimental transport enabled and let us know how you get on, please?

pjbgf (Member) commented Sep 3, 2022

Closing this issue due to inactivity, but happy to reopen if it recurs while using the latest versions of the image-automation-controller.

pjbgf closed this as completed Sep 3, 2022