ArgoCD app health does not respect Hooked Job's result, custom LUA check doesn't work either #9861
Comments
👍 to this. I was hoping that when a Job resource becomes unhealthy it would mark the application unhealthy, to trigger a notification for us to action the issue. I also tried the argocd-cm approach, but it doesn't seem to have any effect on the app's health; it does change the health checks for the Job itself though.
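For anyone else trying it, the argocd-cm approach looks roughly like this; a minimal sketch using the documented resource.customizations key for batch/v1 Jobs, with illustrative Lua logic:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Custom health check for batch/v1 Jobs (Lua). Per this thread, it changes
  # the Job's own health status but does not propagate to app health for
  # hooked Jobs, since the controller skips hooked resources entirely.
  resource.customizations.health.batch_Job: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.failed ~= nil and obj.status.failed > 0 then
        hs.status = "Degraded"
        hs.message = "Job has failed pods"
        return hs
      end
      if obj.status.succeeded ~= nil and obj.status.succeeded > 0 then
        hs.status = "Healthy"
        hs.message = "Job completed"
        return hs
      end
    end
    hs.status = "Progressing"
    hs.message = "Job is running"
    return hs
```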
@mdrakiburrahman did you happen to have any luck with this since you posted?
@phyzical I came up with a hack: a "job-deleter" Job. Here's the code and the Kustomize overlay, feel free to use it: https://github.com/mdrakiburrahman/openshift-app-of-apps/blob/main/kube-arc-data-services-installer-job/kustomize/base/job-deleter-job.yaml#L12
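For readers skimming the thread, a minimal sketch of the same idea (the image, names, and target Job are illustrative; the real manifest also needs a ServiceAccount with RBAC permission to delete Jobs):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-deleter
  annotations:
    argocd.argoproj.io/hook: PreSync                      # runs before each sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # cleans itself up
spec:
  template:
    spec:
      serviceAccountName: job-deleter   # assumed SA with RBAC to delete Jobs
      restartPolicy: Never
      containers:
        - name: kubectl
          image: bitnami/kubectl:latest
          # Delete the plain (non-hook) Job so Argo re-creates and re-runs it
          # on sync, while its health still counts toward app health.
          command: ["kubectl", "delete", "job", "my-installer-job", "--ignore-not-found"]
```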
@mdrakiburrahman thanks for the reply :) Sadly I don't think this will serve our use case; it's less about fixing a broken-state job (though maybe this will be useful in the future to be less hands-on 😆) and more about alerting us that this state has occurred in the first place. Looking at your demo gifs, it occurred to me that if we just used plain Job resources it probably would be working, but we use CronJobs, which create Jobs, and the health of the Job doesn't seem to propagate up to the CronJob/App. Thanks again though.
We use ArgoCD version 2.4.11. We deploy a Helm chart which contains a few jobs with Sync hooks, and they can take a long time to process. We get info that the app is healthy and send emails to users, when in fact our app is not ready yet. As a fix for now, we will also check whether a sync operation is in progress before taking any action, since the operation state is Running while sync jobs are running. But it would be nice to fix this properly.
I have just been bitten by this pretty hard. Could someone point me to the source of the decision why hooks don't influence the overall health of the application, even though they block further Sync operations? It seems pretty counterintuitive, especially in the UI: even if your app crashed mid-deployment, you still see a Healthy app and just a failed Sync (with no easy way to filter for this). It looks like only the initial solution, with an ugly job-deleter pod and ditching hooks altogether, would cover all of the needs. Or could my requirement be covered by another tool, like Argo Rollouts or Workflows?
+1, I am also affected by this issue. |
+1. I am also looking to have this! Currently we have an ArgoCD app which deploys a Helm chart containing an Argo Rollout and a StatefulSet. If the Rollout is unhealthy, the app health is marked as degraded. But we also have a Job which runs as a PostSync hook to check the health of the StatefulSet via the metrics it emits, and if that Job returns unhealthy we aren't seeing the app health get degraded! Would love to hear from this forum how to solve this!
Also looking for any kind of answer for this. |
Line 28 in 14a1a55
As part of the health-check design in ArgoCD, the code ignores the health of a resource if there is any hook bound to it. This logic needs to be changed to also check the health of hooked resources.
@kshantaramanUFL this is expected behaviour as per the code: if a job has hooks, the controller ignores the job's health when computing application health. argo-cd/controller/health_test.go Line 63 in 20f9719
@alexmt can we have some sort of flag in the application controller to consider the health of resources that have hooks when computing application health?
I am also facing the same issue. Is there any workaround that I can use?
@jaytulshian1301 in the code itself, if a resource is hooked, the health controller ignores its health.
My use case is different: I have to use a PostSync hook to run a job after the app is synced. Is there something I can do while keeping the hook?
Instead of using hooks, you can create an init container that waits for the desired state before the main container runs; hooks make a Job skip the health check.
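A minimal sketch of that suggestion, assuming the thing you'd otherwise sequence with a PostSync hook is waiting on a Deployment to become available (all names and images here are illustrative, and the pod's ServiceAccount would need RBAC to read the Deployment):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: post-sync-check   # a plain Job with no hook annotations,
                          # so its health counts toward app health
spec:
  template:
    spec:
      restartPolicy: Never
      initContainers:
        - name: wait-for-app
          image: bitnami/kubectl:latest
          # Block until the dependency is rolled out, instead of relying on
          # a PostSync hook to sequence this Job after the sync.
          command: ["kubectl", "wait", "--for=condition=Available", "deployment/my-app", "--timeout=600s"]
      containers:
        - name: main
          image: my-check:latest   # illustrative check/migration image
          command: ["./run-checks.sh"]
```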
Describe the bug

My App-of-apps use case:
- App N deploys a Job. Nothing else.
- App N+1 must respect App N's Job health (Healthy, Degraded, Progressing).
- The Job must run every Sync - so naturally I use Argo Hooks <-- here is the issue

1 minute demo

Here are 2 versions of the app, one running my Job code without hooks (left), and one with hooks (right):

Demo Manifests

As you can see, the "non-hook" version's health is respected by Argo, but then I can't run the Job again because of K8s. And the "hooked" version's Job is rerunnable, because I delete on HookSucceeded thanks to Argo, but the Job health is not respected by Argo. So I'm between a rock and a hard place.
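For context, the hooked variant is wired with the standard Argo CD hook annotations, roughly like this (a sketch; names and image are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: installer-job
  annotations:
    argocd.argoproj.io/hook: Sync                         # run on every sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # delete on success so the next sync can re-create it
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: installer
          image: my-installer:latest
```

The hook-delete-policy is what makes the Job rerunnable, and the hook annotation is what makes the controller stop counting the Job's health toward the app.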
Workarounds

I tried adding Lua health checks that target the batch/v1 Job GVK, no luck: mdrakiburrahman/openshift-app-of-apps@4b5ada0

If I deploy a dummy Deployment at the end of my Sync hook Job, that does the trick for the first sync. But I can't go to production with that, because Argo will respect the Deployment's health before going green (so my App N+1 doesn't fire).

I was also thinking of running a PreSync job with kubectl inside that deletes my Job (which would then run in a non-hook or "vanilla" manner) as a workaround, because Argo will force the sync when it notices the Job is missing, so I can rerun the Job per sync. This is gross.

Finally, if Kubernetes would just let me run the non-hook version of my Job multiple times, I would gladly do so, but it doesn't, because the name is already there. I also can't use generateName because Kustomize doesn't support it yet.

Version

argocd version