Kubernetes mode terminating early or not terminating at all. #3578
Comments
Hello! Thank you for filing an issue. The maintainers will triage your issue shortly. In the meantime, please take a look at the troubleshooting guide for bug reports. If this is a feature request, please review our contribution guidelines.
Hey @Josh-Engle, can you please upgrade to the
@nikola-jokic I duplicated the issue in
Thank you for letting us know!
Hi @nikola-jokic, could you link the PR that fixes this issue? I'm just curious to see what it was.
Hi @nikola-jokic, we are actually seeing this error on From the image above we can see that it exits in 5m 29s instead. Here is a gist of the runner log: https://gist.github.com/genesis-jamin/774d115df441c3afdd755f73a3c499dc. You can grep the logs for "Finished process 170 with exit code 0" to see where the
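A minimal sketch of the log search described above, written in Python instead of `grep`. Only the marker string comes from the comment; the function name `find_marker_lines` and the sample text are hypothetical and not taken from the gist.

```python
from typing import List, Tuple

# Marker string quoted in the comment above.
MARKER = "Finished process 170 with exit code 0"


def find_marker_lines(log_text: str, marker: str = MARKER) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every line containing the marker."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]


if __name__ == "__main__":
    # Placeholder text for illustration only; not real runner output.
    sample = "\n".join([
        "(unrelated line)",
        MARKER,
        "(unrelated line)",
    ])
    for number, line in find_marker_lines(sample):
        print(f"{number}: {line}")
```

Comparing the timestamp on the matching line with the step's start time would show how long the step actually ran before the process was reported as finished.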
I'm re-opening as we are actually still seeing this issue on
Someone else also reported the same issue in the container hooks repo: actions/runner-container-hooks#165. I'm going to copy my comment to that issue as well for posterity's sake.
We were able to root-cause this: it turns out to be related to our k8s cluster setup. Our k8s cluster is hosted on GKE, and we noticed that every time a GitHub step terminated early, it happened right after the cluster scaled down and evicted some We were able to somewhat mitigate this issue by adding taints / tolerations so that Another option for us is to disable autoscaling, but that defeats the purpose of using ARC in the first place 😆
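To illustrate how taints and tolerations keep pods off particular nodes, here is a simplified Python sketch of the Kubernetes matching rule. It is not the scheduler's actual code, and the taint key `example.com/dedicated`, its value, and the toleration shown are hypothetical; the thread does not give the exact taints that were used.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes rule: does this toleration match this taint?"""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists matches every taint key.
        return key == "" or key == taint.get("key")
    # Equal (the default) requires both key and value to match.
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits a node only if every hard taint is tolerated."""
    # PreferNoSchedule is a soft preference, so it never blocks scheduling.
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


if __name__ == "__main__":
    # Hypothetical taint on a dedicated node pool.
    node_taints = [{"key": "example.com/dedicated", "value": "runners", "effect": "NoSchedule"}]
    # Hypothetical toleration carried only by the pods allowed on that pool.
    allowed = [{"key": "example.com/dedicated", "operator": "Equal",
                "value": "runners", "effect": "NoSchedule"}]
    print(can_schedule(allowed, node_taints))  # pod with the toleration
    print(can_schedule([], node_taints))       # pod without it
```

Pods without a matching toleration are kept off the tainted nodes, so a scale-down that evicts pods from other nodes is less likely to touch the workloads placed there.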
Checks
Controller Version
0.9.0
Deployment Method
Helm
Checks
To Reproduce
Describe the bug
Even though the workflow should sleep for 7 minutes, it either completes successfully after 4 minutes or never completes.
Terminating Early:
Never Terminating:
Describe the expected behavior
The workflow should complete only after the sleep command has finished.
Additional Context
Controller Logs
Runner Pod Logs