
helm hooks are deleted right after creation, not after all hooks are created #2737

Closed
3 tasks done
amkartashov opened this issue Nov 18, 2019 · 16 comments · Fixed by argoproj/gitops-engine#92
Labels: bug (Something isn't working) · component:core (Syncing, diffing, cluster state cache) · workaround (There's a workaround, might not be great, but exists)

Comments

@amkartashov

Checklist:

  • I've searched in the docs and FAQ for my answer: http://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

I am deploying prometheus-operator as a helm dependency. It has Helm hook resources, including a serviceaccount and a job that uses this serviceaccount. I see that the serviceaccount gets created and then deleted before the job is created. As a result, the job fails and the sync fails.

To Reproduce

Try to deploy prometheus-operator as a helm dependency with admission webhooks enabled in the values:

prometheus-operator:
  prometheusOperator:
    admissionWebhooks:
      enabled: true

It will try to create the serviceaccount hook https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus-operator/admission-webhooks/job-patch/serviceaccount.yaml, then it will delete it, then it will try to create the job https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml, and the job will fail because the serviceaccount has already been deleted by that point.
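
For reference, here is a rough sketch of the two hook manifests involved (resource names, image and annotation values are approximated from the chart templates, so the exact delete policy in a given chart version may differ):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator-admission            # approximate name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded  # assumed; a ServiceAccount counts as succeeded as soon as it exists
---
apiVersion: batch/v1
kind: Job
metadata:
  name: prometheus-operator-admission-create      # approximate name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: prometheus-operator-admission   # fails if the ServiceAccount hook was already deleted
      restartPolicy: OnFailure
      containers:
        - name: create
          image: jettech/kube-webhook-certgen     # image used by the chart's job-patch hooks (assumed)

Because argocd applies the delete policy per hook as soon as that hook succeeds, the serviceaccount is removed right after creation, before the job ever starts.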

Expected behavior

Helm deletes hooks in one go after all hooks are done. This is not documented, but it's clear from the code: https://github.com/helm/helm/blob/9b42702a4bced339ff424a78ad68dd6be6e1a80a/pkg/action/hooks.go#L98 - I expect the same behavior from argocd: delete hook resources only after all of them have been processed (or after the sync wave, maybe). This makes sense because hooks often need to run in parallel or depend on each other.

Version

argocd: v1.3.0-rc1+8a43840
  BuildDate: 2019-10-16T21:44:49Z
  GitCommit: 8a43840f0baef50946d3eadc1d52d6a2abc162d5
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.3.0+9f8608c
  BuildDate: 2019-11-13T01:51:00Z
  GitCommit: 9f8608c9fcb2a1d8dcc06eeadd57e5c0334c5800
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: Version: {Version:kustomize/v3.2.1 GitCommit:d89b448c745937f0cf1936162f26a5aac688f840 BuildDate:2019-09-27T00:10:52Z GoOs:linux GoArch:amd64}
  Helm Version: v2.15.2
  Kubectl Version: v1.14.0
@amkartashov amkartashov added the bug Something isn't working label Nov 18, 2019
@alexec alexec modified the milestones: v1.4, v1.3 Nov 18, 2019
@alexec alexec self-assigned this Nov 18, 2019
@alexec
Contributor

alexec commented Nov 18, 2019

Thank you for submitting such a detailed bug report.

A PR for this won't be immediate; in the meantime, do you have a work-around that we can share?

Alex

@amkartashov
Author

amkartashov commented Nov 19, 2019

do you have a work-around that we can share?

Unfortunately no, I just disabled admissionWebhooks and tlsProxy in prometheus-operator. One can manually render the hooks during installation/upgrade, but this can be tricky.
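
Roughly what that override looks like, with key names recalled from the chart's values.yaml (worth double-checking against your chart version):

prometheus-operator:
  prometheusOperator:
    admissionWebhooks:
      enabled: false    # disables the admission webhook hook resources
    tlsProxy:
      enabled: false    # disables the TLS proxy that depends on them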

@alexec
Contributor

alexec commented Nov 21, 2019

You can choose one or more defined annotation values:

  • "hook-succeeded" specifies Helm should delete the hook after the hook is successfully executed.
  • "hook-failed" specifies Helm should delete the hook if the hook failed during execution.
  • "before-hook-creation" specifies Helm should delete the previous hook before the new hook is launched.
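
As a possible workaround until a fix lands, the delete policy on the affected hook resources could be overridden so they are only cleaned up before the next hook run instead of immediately on success. A minimal sketch, assuming the annotation is patched onto the chart's ServiceAccount hook (the resource name is hypothetical):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator-admission             # hypothetical hook resource name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation   # keep the hook after success; clean it up before the next run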

@alexec
Contributor

alexec commented Nov 21, 2019

So we've implemented this as per the Helm docs, but not as per their implementation: Helm doesn't delete hooks until the end of the hook phase. We should mimic that rather than deleting each hook as soon as it has succeeded.

@alexec alexec added the workaround There's a workaround, might not be great, but exists label Nov 21, 2019
@alexmt alexmt modified the milestones: v1.3, v1.4 Dec 10, 2019
@alexec alexec removed their assignment Jan 14, 2020
@alexmt alexmt modified the milestones: v1.4, v1.5 Jan 23, 2020
@alexmt alexmt modified the milestones: v1.5, v1.6 Feb 18, 2020
@jdfalk
Contributor

jdfalk commented Apr 30, 2020

Is there any progress on at least a temporary patch to match the expected behavior?

@wrdls

wrdls commented May 7, 2020

I'm seeing the same issue with ingress-nginx:

chart: ingress-nginx
chart version: 2.1.0
chart repo: https://kubernetes.github.io/ingress-nginx
app version: 3.2.0

@jannfis jannfis added the component:core Syncing, diffing, cluster state cache label May 14, 2020
@alexmt alexmt modified the milestones: v1.6 GitOps Engine, v1.7 May 20, 2020
@mayzhang2000 mayzhang2000 self-assigned this May 23, 2020
@ppmathis

ppmathis commented Jun 26, 2020

I've hit the same issue as @wrdls and @estahn when attempting to deploy ingress-nginx version 2.9.0. In my case I'm not even using Helm, only kustomize; however, because the official manifest contains Helm hook annotations, the sync of ingress-nginx always failed.

As ingress-nginx itself does not need these Helm hooks to work properly (regular deployment through kubectl is supported too), I managed to get it working by removing all Helm hook annotations. Unfortunately, ArgoCD does not allow disabling its parsing/handling of Helm hooks (even when only kustomize is being used), so I had to strip them from the original manifest using a few patchesJson6902 statements within kustomize:

kustomization.yaml
patchesJson6902:
  - target:
      group: rbac.authorization.k8s.io
      version: v1
      kind: ClusterRole
      name: ingress-nginx-admission
      namespace: ingress-nginx
    path: no-helm-hooks.json

  - target:
      group: rbac.authorization.k8s.io
      version: v1
      kind: ClusterRoleBinding
      name: ingress-nginx-admission
      namespace: ingress-nginx
    path: no-helm-hooks.json

  - target:
      group: batch
      version: v1
      kind: Job
      name: ingress-nginx-admission-create
      namespace: ingress-nginx
    path: no-helm-hooks.json

  - target:
      group: batch
      version: v1
      kind: Job
      name: ingress-nginx-admission-patch
      namespace: ingress-nginx
    path: no-helm-hooks.json

  - target:
      group: rbac.authorization.k8s.io
      version: v1
      kind: Role
      name: ingress-nginx-admission
      namespace: ingress-nginx
    path: no-helm-hooks.json

  - target:
      group: rbac.authorization.k8s.io
      version: v1
      kind: RoleBinding
      name: ingress-nginx-admission
      namespace: ingress-nginx
    path: no-helm-hooks.json

  - target:
      version: v1
      kind: ServiceAccount
      name: ingress-nginx-admission
      namespace: ingress-nginx
    path: no-helm-hooks.json
no-helm-hooks.json
[
  {
    "op": "remove",
    "path": "/metadata/annotations/helm.sh~1hook"
  },
  {
    "op": "remove",
    "path": "/metadata/annotations/helm.sh~1hook-delete-policy"
  }
]

Maybe this helps someone else as a temporary workaround to sync ingress-nginx when Helm is not being used.

EDIT: Just to clarify, I'm explicitly referring to the automatic mapping of Helm hooks to ArgoCD hooks as described in the manual, which seems to take place even if not using Helm at all.
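
For reference, the mapping described in the manual translates the Helm hook annotations into Argo CD's own hook annotations. A rough sketch of what one of the Jobs above ends up being treated as (annotation names per the Argo CD docs; the delete policy value assumes the chart used hook-succeeded):

apiVersion: batch/v1
kind: Job
metadata:
  name: ingress-nginx-admission-create
  annotations:
    argocd.argoproj.io/hook: PreSync                      # mapped from helm.sh/hook: pre-install,pre-upgrade
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # mapped from helm.sh/hook-delete-policy: hook-succeeded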

@jdfalk
Contributor

jdfalk commented Jul 16, 2020

Any idea what release this will be included in?

@jsolbrig

@jessesuen Is there a timeline for this fix to be released? I am still running into this issue with prometheus-operator.

@andronux

🆙

@foxracle

Is there a timeline for this fix to be released? I am still running into this issue with ingress-nginx.

@alexmt alexmt modified the milestones: v1.7, v1.8 Oct 29, 2020
@alexmt
Collaborator

alexmt commented Oct 29, 2020

It will be in 1.8 - we are working on testing it now. @mayzhang2000 do you think it is safe to cherry-pick the fix into 1.7 as well?

@mayzhang2000
Contributor

I think it is safe to cherry-pick to 1.7. I will take care of that if you approve.

@alexmt
Collaborator

alexmt commented Oct 29, 2020

Thank you @mayzhang2000 ! Please, go ahead.

@mayzhang2000
Contributor

mayzhang2000 commented Oct 30, 2020

When I tried to cherry-pick to 1.7, I realized there are too many merge conflicts. It is quite risky.
