Wait container stuck in a loop while main container finished successfully #6115
Comments
Excellent. Thank you. Can I please ask you to test the
What do you suggest I test for? We weren't able to reproduce this every time, it just happens sporadically.
Can you soak it and see?
This issue is still happening; the wait pod is stuck in a loop:
Running the following command manually inside the pod:
Would note that this workflow uses the retry strategy:
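For context, an Argo Workflows `retryStrategy` block looks roughly like the following (a hypothetical sketch — the actual limits and policy used in the affected workflow are not shown in this thread; the image and command here are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-example-
spec:
  entrypoint: main
  templates:
    - name: main
      retryStrategy:
        limit: "3"           # hypothetical: retry a failed step up to 3 times
        retryPolicy: Always  # retry on both errors and failures
      container:
        image: alpine:3
        command: [sh, -c, "date +%s > /tmp/timestamp.txt"]
```

Retries interact with the wait container because each attempt gets its own pod, so a stuck wait container on one attempt can leave the whole step hanging.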
This was not fixed.
@avifried1 can you please attach the full logs? I don't have quite enough logs to diagnose.
I have the same problem.
[root@host1 ~]# kubectl -n argo logs hello-world-xr5lw wait
[root@host1 ~]# kubectl -n argo logs hello-world-xr5lw main
< hello world >
[root@host1 ~]# kubectl -n argo get pod
Normal  Scheduled  14m  default-scheduler  Successfully assigned argo/hello-world-xr5lw to 11.0.2.157
wait container logs are basically an endless loop of:
Had the same issue as others. Did not see this issue on a local minikube cluster, but ran into it on a DigitalOcean cluster. For now, I've switched the executor to
@raviatmaryh thank you, the issue has been solved.
Not sure why this was closed; the issue hasn't been solved - it has been circumvented by using the k8sapi executor. Using the default Docker executor still has this issue.
The Docker executor is being sunset. You should switch to using either PNS or Emissary: |
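For anyone switching off Docker: in Argo Workflows of this era the executor is selected via the workflow-controller ConfigMap. A minimal sketch, assuming the default install's ConfigMap name and namespace:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  containerRuntimeExecutor: emissary  # or: pns, k8sapi, kubelet
```

Note that in later Argo releases Emissary became the default (and eventually only) executor, so this setting matters mainly on 3.x versions like the one reported here.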
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I think this has been fixed. |
Summary
The wait container is occasionally stuck and the workflow keeps running until it times out, while the main container has finished successfully.
wait container log:
The pod is in Completed status, the main container is in terminated - Completed (exit code: 0) status, and the wait container is in running, ready status. The workflow (it's a multi-step DAG workflow) hangs in Running (eventually terminating after exceeding its active deadline). The main container is a very simple bash step that records a timestamp (sh -c 'date +%s > /tmp/timestamp.txt') and takes perhaps seconds to complete.
Diagnostics
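To rule out the step itself, the main container's one-liner can be reproduced locally (a minimal sketch; the command and file path match the step described in the summary):

```shell
# Run the same one-liner the main container runs:
# record the current epoch seconds into a file.
sh -c 'date +%s > /tmp/timestamp.txt'

# The file should contain a single integer timestamp.
cat /tmp/timestamp.txt
```

This completes near-instantly, which is consistent with the report that the hang is in the wait sidecar, not the main container.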
Running self-managed Kubernetes on AWS.
Argo version 3.1.0-rc12
Docker executor
We've had this issue from time to time; we can't pinpoint the version in which we started seeing it.
Examining controller logs with the wf name in them:
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.