TestHermeticTaskRun is flakey #4567
Comments
Some more context: it looks like for #4541 it failed 3 times in a row:
It seems like the failing taskrun is timing out:
And then I think we're not getting any logs because, iirc, when a TaskRun times out we have to stop the pod from executing, and I think that might involve deleting the underlying pod? I'm getting rusty though so I'm not sure XD, but if so that might explain why we aren't seeing any logs for the taskrun that is timing out:
Looking at the test that is failing, I'm wondering if it might be that the pipeline/test/hermetic_taskrun_test.go Lines 101 to 102 in 38b9f26
Also use Errorf instead of Fatalf between the two tests (the hermetic test and the non-hermetic tests) so that if one fails the other will still run. In tektoncd#4567 we see that the hermetic end to end test sometimes fails; specifically it seems to be the `not-hermetic-run-as-root` version of the test, and the failure seems to be hitting the 1 minute timeout. Looking at the test, it runs an `apt-get update`, which is an operation in grave danger of sometimes taking a while (especially depending on what version of the latest ubuntu image is running). Although I'm not sure that's what is causing the problem, I want to try doing something that is less likely to take so long but would still require network access, as well as something that would require privileged access (which I assume is why the update was included, to capture the combo of network access and doing something privileged).
Also use Errorf instead of Fatalf between the two tests (the hermetic test and the non-hermetic tests) so that if one fails the other will still run. In tektoncd#4567 we see that the hermetic end to end test sometimes fails; specifically it seems to be the `not-hermetic-run-as-root` version of the test, and the failure seems to be hitting the 1 minute timeout. Looking at the test, it runs an `apt-get update`, which is an operation in grave danger of sometimes taking a while (especially depending on what version of the latest ubuntu image is running). Although I'm not sure that's what is causing the problem, I want to try doing something that is less likely to take so long but would still require network access, as well as something that would require privileged access - which I assume is why the update was included, to capture the combo of network access and doing something privileged. I'm still a bit confused about why both of those elements are present - I assume both are not allowed in hermetic mode, but it would probably make more sense to test them separately to be sure they each fail; otherwise only one is covered (i.e. either the network call or the privileged operation is going to fail and halt things).
Also use Errorf instead of Fatalf between the two tests (the hermetic test and the non-hermetic tests) so that if one fails the other will still run. In tektoncd#4567 we see that the hermetic end to end test sometimes fails; specifically it seems to be the `not-hermetic-run-as-root` version of the test, and the failure seems to be hitting the 1 minute timeout. Looking at the test, it runs an `apt-get update`, which is an operation in grave danger of sometimes taking a while (especially depending on what version of the latest ubuntu image is running). Although I'm not sure that's what is causing the problem, I want to try doing something that is less likely to take so long but would still require network access. I thought maybe that it was also trying to do something that required privileged execution (i.e. running as root), but it seems like that's not something that hermetic mode drops anyway (looking at the TEP, it seems to be scoped just to networking), so it doesn't feel like there is actually any need for that.
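To illustrate the Errorf-versus-Fatalf point from the commit messages above: in Go's `testing` package, `t.Fatalf` stops the current test function immediately, while `t.Errorf` marks the test as failed and lets it continue. The sketch below imitates that difference without the `testing` package so it can run standalone. The `check` type and the `runFailFast`, `runAll`, and `sampleChecks` helpers are hypothetical names made up for this example, not code from the Tekton test suite.

```go
package main

import (
	"errors"
	"fmt"
)

// check is a hypothetical stand-in for one sub-check, e.g. the
// non-hermetic or the hermetic TaskRun assertion.
type check struct {
	name string
	run  func() error
}

// runFailFast behaves like t.Fatalf: it stops at the first failure,
// so later checks never run.
func runFailFast(checks []check) (ran []string, err error) {
	for _, c := range checks {
		ran = append(ran, c.name)
		if e := c.run(); e != nil {
			return ran, fmt.Errorf("%s: %w", c.name, e)
		}
	}
	return ran, nil
}

// runAll behaves like t.Errorf: it records every failure and keeps going,
// so one failing check does not hide the result of the next one.
func runAll(checks []check) (ran []string, errs []error) {
	for _, c := range checks {
		ran = append(ran, c.name)
		if e := c.run(); e != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.name, e))
		}
	}
	return ran, errs
}

// sampleChecks returns two illustrative checks: the first fails
// (simulating a timeout), the second succeeds.
func sampleChecks() []check {
	return []check{
		{name: "not-hermetic", run: func() error { return errors.New("timed out") }},
		{name: "hermetic", run: func() error { return nil }},
	}
}

func main() {
	ran, err := runFailFast(sampleChecks())
	fmt.Println("fail-fast ran:", ran, "error:", err)

	ranAll, errs := runAll(sampleChecks())
	fmt.Println("run-all ran:", ranAll, "errors:", errs)
}
```

With the fail-fast runner only the first check executes, which is the situation the commit is trying to avoid: a flake in the non-hermetic case hides whether the hermetic case passed.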
Hopefully this is fixed by #4567 but plz re-open if it pops up again!
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity. /lifecycle rotten Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity. /close Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
Expected Behavior
TestHermeticTaskRun should only fail due to actual bugs.
Actual Behavior
TestHermeticTaskRun flaked in: