Prowjobs fail with "Pod got deleted unexpectedly" on community AWS infrastructure #9901
Comments
This issue is currently awaiting triage. If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@ameukam I think you mentioned that this was investigated again. Any news? :)

We are still looking into this with the EKS Support team. It will take some time to get to the root cause.

Thank you! Any upstream issue / Slack discussions we can follow?

@ameukam any update on this failure? Also, as @sbueringer asked, is there an upstream issue that we can follow?

Does this still happen? (I just keep hearing that it's fixed :))

Yes, it is fixed; I don't see it on Testgrid or on the triage board. We can close this issue.

/close

@fabriziopandini: Closing this issue. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Which jobs are failing?
Several periodics fail over time. Failures between 12th December and 19th December 2023 (7 days):

- periodic-cluster-api-e2e-dualstack-and-ipv6-main: 5 failures in the last 7 days (runs every 2h)
- capi-e2e-main: 7 failures in the last 7 days (runs every 2h)
- capi-e2e-mink8s-main: 14 failures in the last 7 days (runs every 2h)
- capi-e2e-main-1-24-1-25: 1 failure in the last 7 days (runs every 24h)
- capi-e2e-main-1-26-1-27: 2 failures in the last 7 days (runs every 24h)

So for the affected jobs this is a failure rate of ~10.9% (= 29 / (3*7*12 + 2*7) = 29/266) because of this issue.
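For transparency, here is that arithmetic spelled out as a minimal sketch; the per-day run counts are inferred from the schedules listed above (every 2h means 12 runs/day, every 24h means 1 run/day), nothing else is assumed:

```python
# Sanity-check of the ~10.9% failure rate quoted above.
# Assumes: three jobs run every 2h (12 runs/day), two jobs run
# every 24h (1 run/day), all over the same 7-day window.

jobs = {  # job name: (failures in window, runs per day)
    "periodic-cluster-api-e2e-dualstack-and-ipv6-main": (5, 12),
    "capi-e2e-main": (7, 12),
    "capi-e2e-mink8s-main": (14, 12),
    "capi-e2e-main-1-24-1-25": (1, 1),
    "capi-e2e-main-1-26-1-27": (2, 1),
}

days = 7
failures = sum(f for f, _ in jobs.values())                 # 29
runs = sum(per_day * days for _, per_day in jobs.values())  # 3*7*12 + 2*7 = 266
print(f"{failures}/{runs} = {failures / runs:.1%}")         # 29/266 = 10.9%
```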
Which tests are failing?
No tests or artifacts are reported for the affected jobs.
Since when has it been failing?
Started when moving jobs to the community-owned AWS infrastructure.
The exact start date is unknown, but this flake appears to have been present since Sept 1st, 2023.
Testgrid link
No response
Reason for failure (if possible)
The test job only says "Pod got deleted unexpectedly".
Example: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-mink8s-main/1736768378699255808
Anything else we need to know?
xref the original issue where we started to move v1.3 jobs to the community-owned AWS infrastructure: NodeNotReady test flakes on Release-1.3 test jobs #9379

Refer to: kubernetes/org#4433 (comment)
Refer to: kubernetes.slack.com/archives/C8TSNPY4T/p1694020825316969

This might be related to the thread going on in #sig-k8s-infra on "Nodes are randomly freezing and failing 🧵".
Label(s) to be applied
/kind failing-test
One or more /area labels. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.