implement vectored `mtvec` #691

NikLeberg · 2023-09-17T15:24:42Z

Well hello there!

I stumbled upon this:

An implementation may have different alignment constraints for different modes. In particular, MODE=Vectored may have stricter alignment constraints than MODE=Direct.
[...]
Allowing coarser alignments in Vectored mode enables vectoring to be implemented without a hardware adder circuit.

I found this very interesting and thought we could implement it! It could shorten the latency from IRQ to the right ISR, since no intermediate generic handler is required. With a simple jal zero, <function> per entry, the mtvec jump table can jump directly to the handler.

Example jump table:

.balign 128 // 128 byte alignment i.e. lowest 7 bits all zero
mtvec_jump_table:
  jal zero, trap_handler          //  0 - exception (all)
  nop                             //  1 - n/a
  nop                             //  2 - n/a
  jal zero, msi_handler           //  3 - machine software interrupt
  nop                             //  4 - n/a
  nop                             //  5 - n/a
  nop                             //  6 - n/a
  jal zero, mti_handler           //  7 - machine timer interrupt
  nop                             //  8 - n/a
  nop                             //  9 - n/a
  nop                             // 10 - n/a
  jal zero, mei_handler           // 11 - machine external interrupt
  nop                             // 12 - n/a
  nop                             // 13 - n/a
  nop                             // 14 - n/a
  nop                             // 15 - n/a
  jal zero, firq0_handler         // 16 - fast interrupt  0
  jal zero, firq1_handler         // 17 - fast interrupt  1
  [...]

This can then be installed with:

neorv32_cpu_csr_write(CSR_MTVEC, (uint32_t)&mtvec_jump_table | 0x1); // bit 0 set: MODE = vectored
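
As a rough sketch of what that write encodes: the low two bits of mtvec hold MODE (0 = Direct, 1 = Vectored) and the rest is the table base. The helper below assumes the 128-byte alignment from the table above; mtvec_encode and the MTVEC_* names are made up for illustration and are not part of the NEORV32 software library.

```c
#include <assert.h>
#include <stdint.h>

#define MTVEC_MODE_DIRECT   0x0u /* all traps enter at BASE */
#define MTVEC_MODE_VECTORED 0x1u /* interrupts enter at BASE + 4 * cause */
#define MTVEC_TABLE_ALIGN   128u /* assumption: matches the .balign 128 above */

/* Hypothetical helper: combine a table base address and a mode into an mtvec value. */
static inline uint32_t mtvec_encode(uint32_t table_base, uint32_t mode) {
  /* the lowest 7 bits of the base must be zero for a 128-byte aligned table */
  assert((table_base & (MTVEC_TABLE_ALIGN - 1u)) == 0u);
  /* MODE values of 2 and above are reserved by the privileged spec */
  assert(mode <= MTVEC_MODE_VECTORED);
  return table_base | mode;
}
```

With a table at 0x80000000 this yields 0x80000001 for vectored mode, which is the same value the CSR write above produces.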

SiFive actually implemented this in plain C using some nice GCC compiler tricks.

As the direct mode is still available, no immediate change to the current sw framework is required. But those that wish to utilize the lower latency may enable vectored mtvec mode.

I'm not sure about the implications for area and speed. Quartus showed non-deterministic values for LUT usage, and adding the vectored mode actually decreased LUT usage (with pr: 1474; without pr: 1494).

WIP to get your feedback if this is generally wished for or not. If ok then I can come up with a test to see how much reduction in latency there actually is and also add some documentation.

Cheers,
Nik

stnolting · 2023-09-19T18:30:20Z

I'm not sure about the implications for area and speed. Quartus showed non-deterministic values for LUT usage, and adding the vectored mode actually decreased LUT usage (with pr: 1474; without pr: 1494).

I have synthesized your modifications and compared them to the current version. Interestingly, I can confirm your findings: the hardware is smaller with vectored mode implemented. To me this seems like a good deal: more functionality with less hardware 😎

So, I really like this! But we should think about the alignment of the table entries. I am not sure about the actual use cases, but the current alignment only allows a single jump instruction per table entry, which reaches a range of about ±1 MB. What if we want to place the table in RAM, say at address 0x8000_0000, and have all the interrupt handler functions in ROM, say at address 0x0000_0000? Then a single jump instruction is not able to reach the corresponding handler functions. Is this something we should take care of?! 🤔 Can we use several instructions (in the table) to construct larger branches without polluting registers of the interrupted context?
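
To put numbers on the reach concern: JAL encodes a signed, even byte offset in the range -2^20 to 2^20 - 2, relative to the address of the jal itself. The check below is only a sketch; jal_reaches is a hypothetical name, and address wrap-around at the 4 GB boundary is ignored for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a single `jal` located at entry_addr reach target_addr?
 * JAL offsets are even and lie in [-2^20, 2^20 - 2] bytes (about +/- 1 MiB).
 * Simplification: 32-bit address wrap-around is not considered. */
static inline bool jal_reaches(uint32_t entry_addr, uint32_t target_addr) {
  int64_t offset = (int64_t)target_addr - (int64_t)entry_addr;
  bool even = (offset % 2) == 0;
  bool in_range = (offset >= -(INT64_C(1) << 20)) && (offset < (INT64_C(1) << 20));
  return even && in_range;
}
```

For the RAM/ROM split described above, jal_reaches(0x80000000 + 11 * 4, 0x00000000) is false, while a handler placed 1 KB behind the table, at 0x80000400, is reachable from every entry.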

WIP to get your feedback if this is generally wished for or not.

So yes! Let's add this! 👍

NikLeberg · 2023-09-20T22:45:03Z

To me this seems like a good deal: more functionality with less hardware 😎

Perfect, thanks for checking. Let's do this. :)

Then a single jump instruction is not able to reach the corresponding handler functions. Is this something we should take care of?! 🤔 Can we use several instructions (in the table) to construct larger branches without polluting registers of the interrupted context?

This is actually how it's designed in the ISA... 🤷‍♂️, pc is simply to be put at mtvec + 4 * mcause which gives every interrupt cause only one single 32-bit instruction:

When MODE=Vectored, all synchronous exceptions into machine mode cause the pc to be set to the address in the BASE field, whereas interrupts cause the pc to be set to the address in the BASE field plus four times the interrupt cause number. [source]
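
The quoted rule can be written down as a small model. This is just my reading of the spec text, not the actual RTL: trap_target is a hypothetical name, bit 31 of mcause is taken as the interrupt flag (RV32), and BASE is taken as mtvec with the two MODE bits cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of where the pc goes on a trap (RV32).
 * mtvec[1:0] is MODE, the remaining bits form BASE.
 * mcause[31] is set for interrupts, mcause[30:0] is the cause code. */
static inline uint32_t trap_target(uint32_t mtvec, uint32_t mcause) {
  uint32_t base     = mtvec & ~UINT32_C(0x3);
  bool     vectored = (mtvec & UINT32_C(0x3)) == UINT32_C(0x1);
  bool     is_irq   = (mcause >> 31) != 0u;
  uint32_t code     = mcause & UINT32_C(0x7FFFFFFF);

  if (vectored && is_irq) {
    return base + 4u * code; /* one 32-bit table slot per interrupt cause */
  }
  return base; /* direct mode, or any synchronous exception */
}
```

With the table at 0x80000000 in vectored mode, a machine external interrupt (cause 11) lands at 0x8000002C, while an exception with the same cause code still enters at 0x80000000.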

But this makes the control hardware actually simpler. No additional fetching of the address from memory needs to be done.

We could solve this in software by jumping to an intermediate larger table that holds more entries per interrupt source and is located close (+/- 1 MB) to the mtvec table. I.e. the trap handler jumps to mtvec + 4 * mcause, where a larger table is located from where we can jump to the whole 4 GB address space. The intermediate trampoline table must save or utilize a scratch register and modify mpc though... I fear that with such a setup the latency is not reduced much.
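
To illustrate the idea of a second dispatch stage, here is a sketch that swaps the multi-instruction trampoline for a plain C table of function pointers. It is not the same technique: the indirect call needs a register for the target address, which is exactly the save/restore cost mentioned above. All names (handler_install, dispatch, demo_mei_handler) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CAUSES 32u /* assumption: cause codes 0..31 */

typedef void (*irq_handler_t)(void);

/* Second-stage table: full 32-bit addresses, so handlers may live anywhere. */
static irq_handler_t handler_table[NUM_CAUSES];

/* Demo handler that only counts how often it was called. */
static unsigned demo_mei_count;
static void demo_mei_handler(void) { demo_mei_count++; }

/* Register a handler for one cause code; out-of-range codes are ignored. */
static void handler_install(uint32_t code, irq_handler_t handler) {
  if (code < NUM_CAUSES) {
    handler_table[code] = handler;
  }
}

/* Look up and call the handler for mcause; returns false if none is installed. */
static bool dispatch(uint32_t mcause) {
  uint32_t code = mcause & UINT32_C(0x7FFFFFFF); /* drop the interrupt flag */
  if (code >= NUM_CAUSES || handler_table[code] == NULL) {
    return false;
  }
  handler_table[code]();
  return true;
}
```

After handler_install(11, demo_mei_handler), calling dispatch(0x8000000B) runs the demo handler once; causes without an installed handler are reported as unhandled.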

Alternatively, we could support the CLIC ISA draft in the future. It adds a new CSR, xtvt, that points to an actual vector table. But this is more complex: on an interrupt, the table entry first needs to be fetched from memory before it can be jumped to.

So yes! Let's add this! 👍

Great, I'll work out a latency test then. :)

stnolting · 2023-09-21T16:30:16Z

This is actually how it's designed in the ISA... 🤷‍♂️, pc is simply to be put at mtvec + 4 * mcause which gives every interrupt cause only one single 32-bit instruction:

I was thinking about a more relaxed alignment of the individual table entries, maybe something like mtvec + 16 * mcause providing space for up to 4 32-bit instructions per entry... 🤔 But you are right - let's stick to the RISC-V specs.

We could solve this in software by jumping to an intermediate larger table that holds more entries per interrupt source and is located close (+/- 1 MB) to the mtvec table

That would be a nice workaround - if you are not too concerned about interrupt latency, of course.

Alternatively, we could support the CLIC ISA draft in the future. It adds a new CSR, xtvt, that points to an actual vector table. But this is more complex: on an interrupt, the table entry first needs to be fetched from memory before it can be jumped to.

Yeah I saw that spec... But let's keep things simple for now 😉

Great, I'll work out a latency test then. :)

Awesome! We should also think about a simple test case for the processor_check to verify this new hardware ~~mechanism~~ feature. 😉

NikLeberg · 2023-09-23T13:01:00Z

I have tested the latency of the direct and vectored mtvec modes. It is highly dependent on how the software is set up, though. The basic test setup was: the CPU is in wfi sleep with the mei interrupt enabled. The testbench triggers the interrupt and the installed ISR sets GPIO pin 0 high. The latency is then taken to be the time between mei assertion and the GPIO value change.

I got some preliminary latency measurements while simulating various different SW configurations:

26 cycles: Vectored mode + not calling functions in ISR
121 cycles: Vectored mode + calling functions in ISR
73 cycles: Direct mode + simple switch-case handler + not calling functions in ISR
142 cycles: Direct mode + simple switch-case handler + calling functions in ISR
265 cycles: Direct mode + default neorv32 RTE + not calling functions in ISR
285 cycles: Direct mode + default neorv32 RTE + calling functions in ISR

By not calling any functions, gcc is able to optimize away the saving and restoring of the caller-saved cpu registers in the ISR.

The vectored mode seems to reduce the interrupt latency up to 10 times! 🚀
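
For reference, the ratios behind that claim can be recomputed from the cycle counts listed above. The enum and function names are made up for this sketch; the numbers are the preliminary simulation results from this comment.

```c
#include <assert.h>

/* Preliminary cycle counts from the simulation runs listed above. */
enum {
  CYC_VECTORED_NOCALL = 26,
  CYC_VECTORED_CALL   = 121,
  CYC_SWITCH_NOCALL   = 73,
  CYC_SWITCH_CALL     = 142,
  CYC_RTE_NOCALL      = 265,
  CYC_RTE_CALL        = 285
};

/* Hypothetical helper: how many times faster is `cycles` than `baseline_cycles`? */
static double latency_speedup(unsigned baseline_cycles, unsigned cycles) {
  assert(cycles > 0u); /* guard against division by zero */
  return (double)baseline_cycles / (double)cycles;
}
```

latency_speedup(CYC_RTE_NOCALL, CYC_VECTORED_NOCALL) is about 10.2, which is where the "up to 10 times" comes from. Against the simple switch-case handler the gain is about 2.8x, and once the ISR calls functions it shrinks to about 2.4x (RTE) or 1.2x (switch-case).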

We should also think about a simple test case for the processor_check to verify this new hardware mechanism.

Sure! I'll probably manually install a table, use the sim_irq_trigger() to test mei and msi interrupts, and then re-install the rte with the global_trap_handler() as handler for everything. Do you see any problems with that approach?

NikLeberg · 2023-09-23T15:43:21Z

What am I doing wrong...? I can compile and run the new test just fine locally...

root@neorv32_soc:/workspaces/neorv32_soc/lib/neorv32/sw/example/processor_check# make all
Memory utilization:
   text    data     bss     dec     hex filename
  23636       0     296   23932    5d7c main.elf

root@neorv32_soc:/workspaces/neorv32_soc/lib/neorv32/sw/example/processor_check# make sim
Installing application image to ../../../rtl/core/neorv32_application_image.vhd
Simulating neorv32_application_image.vhd...
[...]
[27] Vectored IRQ (sim) [ok]
[...]

I have GCC 12.2.0.

stnolting · 2023-09-23T16:25:45Z

Hey @NikLeberg!

Looks great so far! I'm in the train right now having ~~bad~~ no internet connection... 🙈
I am no C/software/GCC expert at all, but as far as I can see this might be the problem:

void vectored_irq_table(void) {
  asm volatile(
    ".org  vectored_irq_table + 0*4     \n"
    "jal   zero,vectored_global_handler \n" // 0

The entire code is compiled first and is linked afterwards. So at compile-time the label vectored_global_handler is not known (= the address is not fixed) until linking.

You need to keep an "alias" here until linking. Maybe something like this could work (?):

  asm volatile(
    ".org  vectored_irq_table + 0*4     \n"
    "jal zero, %[dst]                   \n" // 0
  : : [dst] "i" ((uint32_t)&vectored_global_handler)
  );

At least this is what I see from the GitHub actions log. Maybe it is just a version issue - the actions use GCC 12.1.

stnolting · 2023-09-23T16:39:43Z

By the way, thanks for the detailed latency evaluation!

Vectored interrupts might be very handy for small-scale real-time systems running at low operating frequencies. I think FreeRTOS (now?) also provides an option to use vectored interrupts... ?! 🤔

stnolting · 2023-09-24T06:37:31Z

Maybe we could put the table into a unique compilation unit - so the labels might remain aliases until linking 🤔

section(".text.vector_table")

NikLeberg · 2023-09-24T07:58:50Z

That was worth a try, thanks for the idea. Sadly, that was not it...
It's the -flto link-time optimization flag! The run_check.sh script specifically sets that. It seems as if something gets optimized too early, or the inline assembly simply can't be written like that when LTO is enabled...

stnolting · 2023-09-24T08:12:05Z

In general, link-time optimization seems like a good thing to do. I am not sure if it would be some (dirty) work-around if we disable that...

What do you think about passing the handler address as an explicit input operand? This should keep an alias for the addresses which the linker should be able to resolve:

  asm volatile(
    ".org  vectored_irq_table + 0*4     \n"
    "jal zero, %[dst]                   \n" // 0
  : : [dst] "i" ((uint32_t)&vectored_global_handler)
  );

NikLeberg · 2023-09-24T08:16:17Z

Spoke too soon, it was the inline asm alias thing you suggested earlier. Sorry for not trying that out sooner.

Edit: Yes, LTO is always a good thing! I remember the stm32 crt0 not allowing itself to be compiled with LTO, which really annoyed me.

NikLeberg · 2023-09-24T09:47:11Z

All passing now. Thanks for the help! ❤
I rebased on latest main. Now the last step would be to add documentation.
Can you share some insights about your editor setup? I'm not too skilled with AsciiDoc to get it visually right on the first time. And waiting for CI to scream at me is a slow workflow. 😄
Do you edit the .adoc files by hand or do you have a specific VsCode extension to recommend? Especially support for creating / editing tables would be nice to have.

stnolting · 2023-09-24T12:54:44Z

Awesome! Thanks for all your work Nik! I think this is a very cool new feature 👍

Do you edit the .adoc files by hand or do you have a specific VsCode extension to recommend? Especially support for creating / editing tables would be nice to have

Actually, I use Notepad++ for everything. I guess I am quite a minimalist 😅 Anyway, asciidoc is quite simple (a bit like Markdown) and I think it is sufficient if we update the mtvec CSR description.

Beyond the documentation, we still need to update the version number and make an entry for the change log.

stnolting · 2023-09-25T19:17:08Z

Looks good to me! 👍

Thanks for all your work Nik!

NikLeberg marked this pull request as draft September 17, 2023 15:24

stnolting self-assigned this Sep 19, 2023

stnolting added enhancement New feature or request risc-v compliance Modification to comply with official RISC-V specs. HW Hardware-related labels Sep 19, 2023

stnolting assigned NikLeberg Sep 19, 2023

NikLeberg force-pushed the vectored_mtvec branch from f9ff5d6 to b3d6845 Compare September 23, 2023 15:22

stnolting self-requested a review September 23, 2023 16:40

NikLeberg force-pushed the vectored_mtvec branch from b3d6845 to 7ef0ae6 Compare September 24, 2023 07:30

NikLeberg force-pushed the vectored_mtvec branch from 7ef0ae6 to 37e8430 Compare September 24, 2023 08:11

NikLeberg force-pushed the vectored_mtvec branch from 37e8430 to 50a7f8a Compare September 24, 2023 09:25

NikLeberg added 2 commits September 24, 2023 11:29

[rtl] implement vectored mtvec

5ba50a7

[processor_check] add test for vectored mtvec

dd5ec55

NikLeberg force-pushed the vectored_mtvec branch from 50a7f8a to dd5ec55 Compare September 24, 2023 09:45

NikLeberg added 2 commits September 24, 2023 20:54

[doc] document vectored mtvec mode

d3ec072

[CHANGELOG] add v1.8.9.4

004e602

NikLeberg marked this pull request as ready for review September 25, 2023 16:48

stnolting approved these changes Sep 25, 2023

stnolting merged commit 562d838 into stnolting:main Sep 26, 2023
5 checks passed

NikLeberg deleted the vectored_mtvec branch September 27, 2023 19:54
