Flakey sidecar container test #2656
Also #1819
I looked at some build logs where this test failed and noticed that the sidecar test pod has a status of

In logs where the test passed, the pod has a status of

There isn't a lot of information out there about these different reasons and why you might get one vs. another. I tried looking at the Kubernetes code, but I'm not familiar enough with it to understand what might cause one vs. the other. I've not been able to reproduce this problem myself after running the test on my own clusters dozens of times. When I run it on my IBM cluster, the pod has status

If someone is able to reproduce it, it would be interesting to display the pod and confirm the sidecar container is in

P.S. The message

appears when the test passes as well. Somehow Kubernetes is reacting differently to it sometimes.
Will always be there, try any command with
@bobcatfish is this still an issue?
I'm not sure, let's close it and we can re-open if/when it happens again
@jerop: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue. In response to this:
/reopen

There was an instance of this failing in #4380
@sbwsg: Reopened this issue. In response to this:
/remove-lifecycle rotten
Copy/pasting my notes from #4380: Here's a permanent link to the failed test run output.

Failed Test:
Ran for

The test runner reported the following:

Both the sidecar and steps are reported as successful in the captured TaskRun, and both have a non-nil

The creation timestamp on the TaskRun is
I looked up the logs from the test runner and the logs from the temporary boskos cluster. The boskos cluster logs are filtered down to the exact namespace where the failed TaskRuns were reported executing by Prow.

The timestamps of the log messages from Prow are fully ten minutes ahead of the log messages from the Pod in the temporary Boskos cluster:

Prow reports test RUN at: 2021-11-17 16:44:32.717 EST
TaskRun Pod first log at: 2021-11-17 16:34:25.314 EST

I'm not familiar enough with log collection in GCP or our Prow cluster to definitively say that's abnormal, but it strikes me as really weird! Also weird: Prow reports the RUN and FAIL within 8 milliseconds of each other despite reporting a

Theory:

Edit: I looked into this ^ briefly, but our test code uses polling, which doesn't involve comparing against any timestamps in the created resources. At least on first look this doesn't appear to be a likely culprit.
This probably has the same root cause as #4169.
@tekton-robot: Closing this issue. In response to this:
Expected Behavior
TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete should reliably pass unless there is a bug
Actual Behavior
TestSidecarTaskSupport/A_sidecar_that_runs_forever_is_terminated_when_Steps_complete failed on #2652 with this error in the logs:
In the k8s events you can see:
Particularly interesting:
Additional Info
Looks like #1253 had this same problem. It may be some kind of race condition related to switching to the nop image? (Or that could be a red herring, and this is just what happens when we kill the sidecar.)