Boot issue on dedicated server with talos 1.7.0 since 1.7.0-alpha.1 (1.7.0-alpha.0 worked) #8743
Comments
This is a tough issue to look into. There might be two issues here:
|
P.S. If it's possible, you could try booting in BIOS (non-UEFI) mode to see if that works. |
Thanks for your answers. This is a tough one indeed.
As far as I understand, #8657 also reports compatibility issues, no? |
This issue seems to be two issues actually, one is
There should be an option to boot in "legacy mode" (or something like that), which disables UEFI completely. I'm just curious if this is related to UEFI or not. Talos should work both ways, but still. |
I wonder if https://cateee.net/lkddb/web-lkddb/EFI_DISABLE_PCI_DMA.html might be the issue here, it was enabled in alpha.1 version. So I guess the experiment is to add |
that could be it, since it broke booting on arm64 |
Well, that actually solved the problem. Passing With the Thanks guys, you truly sniped this 🙏 What’s the best way of "fixing" this? Should we add the arguments in our custom images and that’s it? Or do you consider removing the disable because of compatibility issues? |
yes, you can do a custom kernel arg for now, and I believe it would still be fine (ignored) if we disable it by default in the kernel config. |
I will actually remove that kernel option, whoever wants that could do a kernel arg to enforce it, but e.g. Ubuntu doesn't enable it by default. |
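For anyone who wants to confirm whether a given kernel build has this option turned on, here is a minimal sketch that scans kernel config text for it. It assumes the usual Kconfig output format (`CONFIG_<NAME>=y` when set, `# CONFIG_<NAME> is not set` when disabled); the two sample config strings are made up for illustration and are not taken from an actual Talos kernel config.

```python
# Kconfig symbol behind the EFI_DISABLE_PCI_DMA option discussed in this thread.
OPTION = "CONFIG_EFI_DISABLE_PCI_DMA"


def option_state(config_text: str, option: str = OPTION) -> str:
    """Return the option's value (e.g. 'y'), or 'unset' if disabled or absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options are written as a comment in kernel configs.
        if line == f"# {option} is not set":
            return "unset"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "unset"


# Hypothetical config fragments, for illustration only.
sample_enabled = "CONFIG_EFI=y\nCONFIG_EFI_DISABLE_PCI_DMA=y\n"
sample_disabled = "CONFIG_EFI=y\n# CONFIG_EFI_DISABLE_PCI_DMA is not set\n"

print(option_state(sample_enabled))
print(option_state(sample_disabled))
```

To check a real build, read the text from wherever that kernel's config is available and pass it to `option_state`.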
|
That’s good news. Thanks again for your help guys, truly appreciated 🙏 I will let you guys close this issue or keep it open until the fix has landed. |
This effectively reverts siderolabs#899 completely. Fixes siderolabs/talos#8743 Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> (cherry picked from commit f414bbd)
Talos 1.7.2 will have this fix included. |
Thanks for this @smira. This pretty much breaks a ton of EFI boot processes. I saw this on the Mac minis as well and patched it inside my builds. |
Bug Report
Boot issue on dedicated server with talos 1.7.0 since 1.7.0-alpha.1 (1.7.0-alpha.0 worked)
Description
We are encountering boot issues with talos v1.7.0 on an OVH dedicated server. We were previously able to boot talos on this server using the 1.7.0-alpha.0 version.
Logs at boot with v1.7.0:
And nothing after that. It just hangs.
This is similar to #8657, but we decided to open another issue since we are using different hardware/provider and were able to pinpoint versions (see below).
We tried to disable the console kernel extra args. We had the same result (and yes, we checked using GRUB that the console argument was no longer there).
We had the issue with talos 1.7.0 and 1.7.1. Since we had previously worked with alpha and beta versions, we tested those versions to find more information that could help with the issue.
We were able to find the version where the issue starts to happen:
talos 1.7.0-alpha.0 works
talos 1.7.0-alpha.1 and all versions after that do not work
We first supposed that it could be a kernel version issue and a hardware incompatibility.
As far as we understand, v1.7.0-alpha.0 is using kernel v6.6.14 while v1.7.0-alpha.1 is using kernel v6.6.21.
I will be honest, I tried to read through the kernel v6.6 patch version changelogs, but this is too low level for me and I could not find anything useful.
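The version narrowing described above amounts to a bisection over an ordered list of releases. A minimal sketch follows, assuming that once a release fails to boot, every later one fails too. The `boots` callback is hypothetical: in practice it stands for manually installing a release and watching the console. The version list contains only the releases named in this report.

```python
from typing import Callable, Optional, Sequence


def first_failing(versions: Sequence[str],
                  boots: Callable[[str], bool]) -> Optional[str]:
    """Binary-search the earliest release that does not boot.

    Assumes `versions` is ordered oldest to newest and that every release
    after the first failing one also fails.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if boots(versions[mid]):
            lo = mid + 1   # still works, so the regression is later
        else:
            hi = mid       # fails, so the regression is here or earlier
    return versions[lo] if lo < len(versions) else None


# Releases mentioned in this report, oldest first.
versions = ["1.7.0-alpha.0", "1.7.0-alpha.1", "1.7.0", "1.7.1"]

# Stand-in for a real boot test, encoding the results observed on this server.
known_good = {"1.7.0-alpha.0"}
print(first_failing(versions, lambda v: v in known_good))
```

With a longer release list, this keeps the number of manual boot attempts logarithmic rather than linear.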
How can we debug this issue further? Thanks!
Logs
Environment
Working version: v1.7.0-alpha.0
Non-working version: v1.7.0-alpha.1 and onward
Dedicated server specs:
OVH scale a1 server: AMD EPYC GENOA 9124 - 16c/32t - 3 GHz/3.6 GHz, 128 GB RAM
The v1.7.0 version works on another (different) dedicated server that runs on an Intel Xeon-E 2388G - 8c/16t - 3.2 GHz/4.6 GHz