implement vectored mtvec
#691
I have synthesized your modifications and compared them to the current version. Interestingly, I can confirm your findings - the hardware is smaller with vectored mode implemented. To me this seems like a good deal: more functionality with less hardware 😎 So, I really like this! But we should think about the alignment of the table entries. I am not sure about the actual use cases, but the current alignment only allows a single jump instruction per table entry, reaching a ~1MB range. What if we want to place the table in RAM, say at address 0x8000_0000, and have all the interrupt handler functions in ROM, say at address 0x0000_0000? Then a single jump instruction is not able to reach the corresponding handler functions. Is this something we should take care of?! 🤔 Can we use several instructions (in the table) to construct larger branches without polluting registers of the interrupted context?
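The ~1MB limit follows from the JAL encoding: its immediate is a signed 21-bit offset in multiples of 2 bytes, so a single `jal` reaches from -1 MiB to +1 MiB - 2 relative to its own address. The following is a minimal host-side sketch in plain C, not code from this PR; the helper name `jal_can_reach` and the macro names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* JAL carries a signed 21-bit immediate in multiples of 2 bytes:
 * reachable offsets are -1 MiB (-0x100000) up to +1 MiB - 2 (0xFFFFE). */
#define JAL_MIN_OFFSET (-0x100000LL)
#define JAL_MAX_OFFSET (0xFFFFELL)

/* Hypothetical helper: returns true if a `jal` placed at entry_addr
 * can jump directly to target_addr. */
static bool jal_can_reach(uint32_t entry_addr, uint32_t target_addr)
{
    /* Use 64-bit arithmetic so the difference of two 32-bit addresses cannot wrap. */
    int64_t offset = (int64_t)target_addr - (int64_t)entry_addr;
    return (offset % 2 == 0)
        && (offset >= JAL_MIN_OFFSET)
        && (offset <= JAL_MAX_OFFSET);
}
```

With the addresses from the comment above, `jal_can_reach(0x80000000u, 0x00000000u)` is false, which is exactly the RAM-table/ROM-handler problem being described.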
So yes! Let's add this! 👍
Perfect, thanks for checking. Let's do this. :)
This is actually how it's designed in the ISA... 🤷♂️
But this makes the control hardware actually simpler. No additional fetching of the address from memory needs to be done.
We could solve this in software by jumping to an intermediate larger table that holds more entries per interrupt source and is located close (+/- 1 MB) to the mtvec table. I.e. the trap handler jumps to
Alternatively we could support the CLINT ISA Draft in the future. It adds with
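To make the "no additional fetching" point concrete: in vectored mode the core only derives a target address from the mtvec base and the interrupt cause and then executes whatever instruction is stored there. Below is a minimal host-side sketch of that calculation as the RISC-V privileged specification describes it (MODE in bits [1:0], interrupts vector to BASE + 4 × cause, synchronous exceptions always go to BASE). The function and macro names are illustrative assumptions, not NEORV32 code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTVEC_MODE_MASK     0x3u  /* mtvec bits [1:0] select the mode */
#define MTVEC_MODE_DIRECT   0x0u  /* all traps jump to BASE */
#define MTVEC_MODE_VECTORED 0x1u  /* interrupts jump to BASE + 4*cause */

/* Hypothetical model of the trap entry address selection. */
static uint32_t trap_target(uint32_t mtvec, bool is_interrupt, uint32_t cause)
{
    uint32_t base = mtvec & ~MTVEC_MODE_MASK;
    uint32_t mode = mtvec & MTVEC_MODE_MASK;

    if (mode == MTVEC_MODE_VECTORED && is_interrupt) {
        /* One 4-byte table slot per interrupt cause. */
        return base + 4u * cause;
    }
    /* Direct mode, or a synchronous exception in vectored mode. */
    return base;
}
```

For example, with mtvec = 0x80000001 a machine timer interrupt (cause 7) lands at 0x8000001C, while the same interrupt in direct mode lands at the base address.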
Great, I'll work out a latency test then. :)
I was thinking about a more relaxed alignment of the individual table entries, maybe something like
That would be a nice workaround - if you are not too concerned about interrupt latency, of course.
Yeah I saw that spec... But let's keep things simple for now 😉
Awesome! We should also think about a simple test case for the processor_check to verify this new hardware.
I have tested the latency of the direct and vectored modes. I got some preliminary latency measurements while simulating various SW configurations:
By not calling any functions, gcc is able to optimize away the saving and restoring of the caller-saved CPU registers in the ISR. The vectored mode seems to reduce the interrupt latency by up to 10 times! 🚀
Sure! I'll probably manually install a table, use the
f9ff5d6 to b3d6845
What am I doing wrong...? I can compile and run the new test just fine...
I have GCC 12.2.0.
Hey @NikLeberg! Looks great so far! I'm in the train right now having
void vectored_irq_table(void) {
  asm volatile(
    ".org vectored_irq_table + 0*4 \n"
    "jal zero,vectored_global_handler \n" // 0
The entire code is compiled first and is linked afterwards. So at compile-time the label
You need to keep an "alias" here until linking. Maybe something like this could work (?):
  asm volatile(
    ".org vectored_irq_table + 0*4 \n"
    "jal zero, %[dst] \n" // 0
    : : [dst] "i" ((uint32_t)&vectored_global_handler)
  );
At least this is what I see from the GitHub actions log. Maybe it is just a version issue - the actions use GCC 12.1.
By the way, thanks for the detailed latency evaluation! Vectored interrupts might be very handy for small-scale real-time systems running at low operating frequencies. I think FreeRTOS (now?) also provides an option to use vectored interrupts... ?! 🤔
Maybe we could put the table into a unique compilation unit - so the labels might remain aliases until linking 🤔
section(".text.vector_table")
b3d6845 to 7ef0ae6
That was worth a try, thanks for the idea. Sadly, that was not it.
7ef0ae6 to 37e8430
In general, link-time optimization seems like a good thing to do. I am not sure if it would be some (dirty) work-around if we disable that... What do you think about giving an explicit clobber list? This should keep an alias for the addresses which the linker should be able to resolve:
  asm volatile(
    ".org vectored_irq_table + 0*4 \n"
    "jal zero, %[dst] \n" // 0
    : : [dst] "i" ((uint32_t)&vectored_global_handler)
  );
Spoke too soon, it was the inline asm alias thing you suggested earlier. Sorry for not trying that out sooner. Edit: Yes, LTO is always a good thing! I remember the STM32 crt0 not allowing itself to be compiled with LTO, which really annoyed me.
37e8430 to 50a7f8a
50a7f8a to dd5ec55
All passing now. Thanks for the help! ❤
Awesome! Thanks for all your work Nik! I think this is a very cool new feature 👍
Actually, I use Notepad++ for everything. I guess I am quite a minimalist 😅 Anyway, asciidoc is quite simple (a bit like Markdown) and I think it is sufficient if we update the mtvec CSR description. Beyond the documentation, we still need to update the version number and make an entry for the change log.
Looks good to me! 👍 Thanks for all your work Nik!
Well hello there!
I stumbled upon this:
Which I found very interesting and thought we could implement that! It could shorten the latency from IRQ to calling the right ISR as no intermediate generic handler is required. A simple jal zero, <function> as mtvec jump table could directly jump to the handler.
Example jump table:
This can then be installed with:
SiFive actually implemented this in plain C using some nice GCC compiler tricks.
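As a rough illustration of what such a jump table contains, here is a host-side sketch in plain C that encodes `jal zero, <handler>` words (J-type, opcode 0x6F) for a table at a given base address and forms the mtvec value with MODE = 1. This is not the PR's code and not SiFive's implementation; `encode_jal`, `build_vector_table`, `mtvec_vectored` and all addresses are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Encode `jal rd, offset`. The J-type immediate is scattered as
 * imm[20|10:1|11|19:12] over instruction bits [31:12]. */
static uint32_t encode_jal(uint32_t rd, int32_t offset)
{
    /* Offset must be even and within the signed 21-bit range. */
    assert(offset % 2 == 0 && offset >= -0x100000 && offset <= 0xFFFFE);
    assert(rd < 32u);

    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u)   << 31)
         | (((imm >> 1)  & 0x3FFu) << 21)
         | (((imm >> 11) & 0x1u)   << 20)
         | (((imm >> 12) & 0xFFu)  << 12)
         | (rd << 7)
         | 0x6Fu;
}

/* Fill table[i] with `jal zero, handlers[i]`, assuming entry i will be
 * placed at address table_base + 4*i. */
static void build_vector_table(uint32_t *table, uint32_t table_base,
                               const uint32_t *handlers, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        uint32_t entry_addr = table_base + 4u * i;
        /* Wrapping 32-bit difference reinterpreted as signed
         * (two's complement assumed); range is checked in encode_jal. */
        int32_t offset = (int32_t)(handlers[i] - entry_addr);
        table[i] = encode_jal(0u, offset);
    }
}

/* mtvec value for vectored mode: BASE in the upper bits, MODE = 1 in bits [1:0]. */
static uint32_t mtvec_vectored(uint32_t table_base)
{
    assert((table_base & 0x3u) == 0u); /* low bits are reserved for MODE */
    return table_base | 0x1u;
}
```

On the target, the resulting value would be written to the mtvec CSR, and the table itself would normally be emitted by the assembler and linker rather than encoded by hand; the sketch only shows what ends up in memory.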
As the direct mode is still available, no immediate change to the current SW framework is required. But those who wish to utilize the lower latency may enable vectored mtvec mode.
I'm not sure about the implications for area and speed. Quartus showed non-deterministic values for LUT usage, and adding the vectored mode actually decreased LUT usage (with PR: 1474; without PR: 1494).
This is a WIP to get your feedback on whether this is generally wanted or not. If so, I can come up with a test to see how much reduction in latency there actually is and also add some documentation.
Cheers,
Nik