Aarch64 struct init causes error interrupt in bare metal kernel #11859
Comments
Hmm, is […] Also, 16777216 is […] Have you tried actually implementing some of the exception handlers? |
Not yet, but now I'm planning to :D |
So, I haven't started implementing the interrupt handler yet, but I now have more information about the issue. I also did some more thorough debugging, and I think (I'm not 100% certain or anything) it really shows when the kernel is compiled in Debug mode, since it then causes an interrupt as soon as the first struct is initialized (and returned).
gdb log:
In |
So, after experimenting with it, there are a few things that may be interesting. |
So after a short break I decided to implement the interrupt handler. I did that in Zig as well (which in hindsight wasn't the smartest thing to do, but it works, I guess because an interrupt handler is pretty simple in the end: no anonymous structs, no DMA, no nested functions, no memory quirks, just a few exported functions). The exception caused by this issue's bug is signaled via `el1_sync_irq`. The only register (that I found!) that was really worth looking into is the ESR (Exception Syndrome Register), which holds all kinds of information about an exception, including the exception class (EC), which I thought could be of value. In this case the ESR EC is
Edit: After playing around with it and being able to "properly" debug (thanks to the interrupt handler), I can now reproduce an (the?) issue consistently, though still only in ReleaseSmall; all other build modes don't run at all, or only partially, without throwing exceptions. |
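For readers following along, here is a minimal C sketch (not code from this thread) of splitting ESR_EL1 into its fields once a handler has read it with `mrs`. The field positions follow the Armv8-A architecture (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the helper names and the short list of classes are illustrative assumptions. EC `0x07` is the class for trapped FP/Advanced SIMD/SVE accesses, which is what the later comments in this thread point to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESR_EL1 (AArch64): EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
 * Bits [63:32] (RES0, or ISS2 on newer cores) are ignored here. */
static inline unsigned esr_ec(uint64_t esr) {
    return (unsigned)((esr >> 26) & 0x3Fu);
}

/* IL = 1: the trapped instruction was 32 bits wide. */
static inline bool esr_il(uint64_t esr) {
    return ((esr >> 25) & 1u) != 0;
}

/* Class-specific syndrome information. */
static inline uint32_t esr_iss(uint64_t esr) {
    return (uint32_t)(esr & 0x1FFFFFFu);
}

/* A handful of exception classes a bare-metal EL1 kernel is likely to see. */
static inline const char *esr_ec_name(unsigned ec) {
    switch (ec) {
    case 0x00: return "unknown reason";
    case 0x07: return "FP/Advanced SIMD/SVE access trapped";
    case 0x15: return "SVC from AArch64";
    case 0x21: return "instruction abort, same EL";
    case 0x25: return "data abort, same EL";
    case 0x3C: return "BRK instruction";
    default:   return "other (see the Arm ARM entry for ESR_EL1)";
    }
}
```

A handler would typically print `esr_ec(esr)`, `esr_ec_name(...)` and `esr_iss(esr)` together with ELR_EL1 (the faulting PC) before looping.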
So this is interesting. The exception kind of confused me ( […] With

```
mov x0, #3 << 20
msr cpacr_el1, x0
```

most of the issues above were resolved and my code ran just fine. |
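A minimal C sketch of the same fix, assuming it runs at EL1: FPEN occupies bits [21:20] of CPACR_EL1, and `0b11` stops FP/Advanced SIMD instructions from trapping at EL0 and EL1, so `3 << 20` (`0x300000`) is exactly the value written above. Unlike the two-instruction version, this sketch does a read-modify-write and adds an `isb` so the new setting is in effect for the following instructions; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPACR_EL1.FPEN is bits [21:20]; 0b11 = do not trap FP/SIMD at EL0/EL1. */
#define CPACR_EL1_FPEN_SHIFT 20
#define CPACR_EL1_FPEN_MASK  (UINT64_C(3) << CPACR_EL1_FPEN_SHIFT)

/* Return `cpacr` with FPEN set to 0b11, leaving all other bits untouched. */
static inline uint64_t cpacr_enable_fp_simd(uint64_t cpacr) {
    return (cpacr & ~CPACR_EL1_FPEN_MASK) | (UINT64_C(3) << CPACR_EL1_FPEN_SHIFT);
}

#if defined(__aarch64__)
/* Must execute at EL1, before any code that may touch q/d/s/v registers.
 * The isb makes the CPACR_EL1 write visible to later instructions. */
static inline void enable_fp_simd_el1(void) {
    uint64_t v;
    __asm__ volatile("mrs %0, cpacr_el1" : "=r"(v));
    v = cpacr_enable_fp_simd(v);
    __asm__ volatile("msr cpacr_el1, %0\n\tisb" : : "r"(v) : "memory");
}
#endif
```

In practice this belongs in the boot assembly before the first call into compiled code, because the compiler is free to emit SIMD instructions in any function, including the one that would enable them.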
Dropping a comment here for future generations... I ran into exactly the same problem. It's kind of confusing that assigning a struct requires SIMD instructions to be allowed, except that it happens because struct copies are done using the Q registers. These are intended for FP/SIMD operations but are also useful because they allow a lot of bits (128 per register) to be moved in a single instruction. That instruction just happens to be one that traps as an access to SVE or Advanced SIMD. |
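To illustrate the point about Q registers, here is a hypothetical 32-byte C struct built and returned by value. Nothing in the source stops the compiler from zeroing or copying it with 128-bit `q` registers on aarch64; whether it actually does depends on the compiler, target, and optimization level. For freestanding C, GCC and Clang accept `-mgeneral-regs-only` on AArch64 to restrict code generation to general-purpose registers; enabling FPEN, as in the previous comment, is the fix this thread settled on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-byte struct, roughly the shape of a framebuffer config. */
struct fb_cfg {
    uint64_t addr;
    uint32_t fourcc, flags, width, height, stride, reserved;
};

/* Building and returning this by value is an ordinary aggregate copy.
 * On aarch64 the compiler may, for example, zero it with movi + stp of a
 * q register, or copy it with ldp/stp of q registers. If CPACR_EL1.FPEN
 * still traps FP/SIMD at EL1, that single instruction raises a synchronous
 * exception even though no floating-point math is involved. */
static inline struct fb_cfg make_fb_cfg(uint64_t addr, uint32_t fourcc) {
    struct fb_cfg c = {0};  /* zero every field, then fill two of them */
    c.addr = addr;
    c.fourcc = fourcc;
    return c;
}
```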
Zig Version
0.9.1
Steps to Reproduce
To reproduce the issue, some kind of bare-metal aarch64 environment is required. In my case, it's a QEMU aarch64 ELF bootable compiled and linked with the Zig build system. Here is my repository with more context.
The issue is tricky, and I'm still trying to find a consistent pattern. It's not reliably reproducible; it only occurs in certain scenarios. The general pattern seems to be that as soon as "the" struct gets initialized, somewhere within the initialization the CPU jumps to the interrupt vector (`0x200`) and loops, because no error handler is set up yet. If I compile the kernel bootable directly in Debug mode, the CPU also jumps immediately to `0x200`, no matter what code is in `kernel_main`, which is really interesting.

Scenario 1:
This scenario is very consistent in its behavior, but difficult to reproduce. It depends on an MMIO write, `qemu_cfg_write_entry(&ramfb_cfg, select, @sizeOf(qemu_dma.QemuRAMFBCfg));`, in `ramfb_setup`. There, the CPU branches to `0x200` during the `ramfb_cfg` initialization, always at the second field, `fourcc`. Whether it branches depends on multiple factors.

It branches to the interrupt if:
- `qemu_cfg_write_entry` is called afterwards.
- `fourcc` is 0 (and still at exactly that point...).

It does not branch if:
- `qemu_cfg_write_entry` is not called.
- `qemu_cfg_write_entry` […]

If I initialize the struct in the main function and then pass the pointer to `ramfb_setup()`, it still branches at `.fourcc` in the main function.

The code that runs before that does not have any influence on the issue (I still included it for context).

Sadly, there is no minimal reproducible example other than my repository (but that is really simple to set up on any aarch64 system, and all the setup is contained in `build.zig`).
Scenario 2:
The second scenario depends on whether the function `kprint_ui` contains a call to another function that in turn returns a struct (anonymous; it makes no difference whether it's anonymous or […]). If it does contain that call (in this case `uitoa`), the CPU branches to `0x200` somewhere within it, most often at the struct initialization in `uitoa`'s return. If I call `kprint_ui_full` instead (where `uitoa` is not called but its body is pasted in; not inlined, because if I inline it, it still crashes, I guess because the struct is still initialized), it runs properly and the kernel ends up in the loop.

(Edit) Scenario 3:
I tried to find a workaround and found one that works, but it's really weird and I can't make any sense of it.
My approach was to remove as much excess logic as possible, which in this case was the function layout, lol.
Afterwards, I just removed random bits and checked whether it would work. The result is not too different, except that the `barrier()` function and all the function layouts are removed and the code is pasted in directly (again, `inline` does not work). Also, just removing the barrier function while keeping the function layout does not work either. It's really weird. (The code below works and no branch occurs.)
Expected Behavior
It should not branch to `0x200` and should instead continue to the `kernel_main` loop.
Actual Behavior
As already mentioned, it branches to an "error" interrupt handler. If I dereference the interrupt handler address, it returns the opcode 16777216 (`0x01000000`), but I can't find any information on what that means.
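The branch target itself carries information. A short C sketch, assuming VBAR_EL1 was still 0 so that the observed address equals the vector offset: the AArch64 vector table holds 16 entries of `0x80` bytes, grouped by where the exception came from, so offset `0x200` is the synchronous-exception entry for the current EL using SP_ELx. That is consistent with a fault raised by the kernel's own EL1 code, matching the `el1_sync_irq` handler mentioned in the thread. The enum and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* The AArch64 vector table (base in VBAR_EL1, 2 KB aligned) has 16 entries
 * of 0x80 bytes: four groups selected by where the exception came from,
 * each holding synchronous, IRQ, FIQ and SError entries in that order. */
enum vec_group { CUR_EL_SP0, CUR_EL_SPX, LOWER_EL_AARCH64, LOWER_EL_AARCH32 };
enum vec_kind  { VEC_SYNC, VEC_IRQ, VEC_FIQ, VEC_SERROR };

/* `off` is the address the CPU jumped to minus VBAR_EL1; must be < 0x800. */
static inline enum vec_group vector_group(uint32_t off) {
    return (enum vec_group)((off >> 9) & 3u);  /* 0x200 bytes per group */
}

static inline enum vec_kind vector_kind(uint32_t off) {
    return (enum vec_kind)((off >> 7) & 3u);   /* 0x80 bytes per entry */
}
```

For example, `vector_group(0x200)` is `CUR_EL_SPX` and `vector_kind(0x200)` is `VEC_SYNC`, while an IRQ taken at EL1 would land at `0x280` instead.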