Random hangs of booted qemu machines #1511
Comments
One such example of this failure is https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/job/build/1429/ (requires sign in) in which the |
I would say this is definitely the same bug, and the posted patch, which I tested, will fix it.
@rwmjones Thanks a lot for the investigation on this one!
Reportedly fixed by 13bb06f.
This is fixed in There is a kernel update available: https://bodhi.fedoraproject.org/updates/FEDORA-2023-5fdf0dd9fe I've fast-tracked this into Fedora CoreOS: coreos/fedora-coreos-config#2489 and also tagged it into the |
The fix for this went into |
The fix for this went into |
The fix for this went into |
We've noticed a fair number of random timeouts in CI where the machines never fully come up. In some cases our --allow-rerun-success criteria (improved in coreos/coreos-assembler@24df92f) have allowed the tests to pass, since the failure isn't consistent, but that only applies to some tests. Our testISO tests don't benefit from that enhancement and have continued to fail randomly. Not often, but often enough to be annoying. The machines just stop during boot at:

Finally, we think we may understand why. As reported by Richard Jones in https://gitlab.com/qemu-project/qemu/-/issues/1696 and on LKML, other teams have also observed and investigated random hangs in upstream kernels. Richard points to f31dcb1 as the offending commit, and also to a posted patch that is believed to be the fix. That patch mentions that it fixes e9523a0, which is in v6.3.2, v6.2.15, v6.1.28, etc.

Once the proposed patch lands, we can get it into Fedora/FCOS, and hopefully our CI will be happier, as will any users who may have hit this rare boot issue.
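For anyone triaging a hung machine, here is a rough sketch of checking whether a kernel release string falls at or after the stable versions listed above as containing e9523a0. The table only holds the three versions named in this issue (the list ends in "etc."), and the function names are made up for illustration. It says nothing about whether a given kernel also carries the fix, so treat a True result as "possibly affected" at most.

```python
import re

# First stable release in each series that this issue names as containing
# e9523a0. The issue's list ends in "etc.", so this table is incomplete.
FIRST_LISTED = {
    (6, 3): 2,   # v6.3.2
    (6, 2): 15,  # v6.2.15
    (6, 1): 28,  # v6.1.28
}


def parse_kernel_version(release):
    """Extract (major, minor, patch) from e.g. '6.3.4-201.fc38.x86_64'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def may_contain_e9523a0(release):
    """Return True/False for a listed series, or None if the series is unknown.

    True only means the release is at or after the listed version in its
    series; it does not tell you whether the later fix is also present.
    """
    major, minor, patch = parse_kernel_version(release)
    first = FIRST_LISTED.get((major, minor))
    if first is None:
        return None  # series not mentioned in the issue
    return patch >= first
```

On a running Linux system this could be called as `may_contain_e9523a0(platform.release())` after `import platform`. Returning None for unlisted series is deliberate, so that a missing table entry is not mistaken for "not affected".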