-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to place more .rodata
in RAM for performance
#2331
Conversation
I'm somewhat concerned that this name will be too non-descriptive in the long run. Also the amount of additional RAM use seems waaaay out of proportions to me for this changeset. |
Yeah - I'm open to suggestions my first version was "ISR_BOOST" - that was even worse |
esp-hal/src/gpio/mod.rs
Outdated
@@ -91,6 +91,7 @@ pub mod rtc_io; | |||
|
|||
/// Convenience constant for `Option::None` pin | |||
|
|||
#[ram] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No way this isn't already in RAM. 🤔 It's supposed to be interior mutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point it looked in the disassembly like there was a pointer in flash pointing to this ... but now it doesn't look like that from the linker-map anymore ... needs double checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand that what this points to may live in flash, but the innards are an atomicptr so it's impossible to live in flash. Unless the compiler deduced that it never changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant I think I have seen something in flash pointing to the instance in RAM but anyways need double checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checked the disassembly .... it's not the case (and I agree it shouldn't) - so this is unnecessary
esp-hal/ld/sections/rwtext.x
Outdated
*(.rodata.*_esp_hal_internal_handler*) | ||
*(.rodata.str.*) | ||
*(.rodata.handle_interrupts) | ||
*(.rodata..Lswitch.table.*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this double ..
correct?
Do we have any more granularity with what we include - is it possible to only include switch tables from the interrupt module perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't handle_interrupts
already be in RAM? If not, why not, since it has the #[ram]
attribute?
What is _esp_hal_internal_handler
, I don't seem to be able to find that string in the source anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those look like this .rodata..Lswitch.table._ZN16embassy_executor3raw20TaskStorage$LT$F$GT$4poll17h8bd1af7ee097c9d5E
, yes
We could try to wild-card match a bit more specific - the naming doesn't make that very easy (and I wonder how much this improves performance beyond ISRs - especially for any ... would be interesting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the naming doesn't make that very easy
I think modern linkers understand the rust demangling scheme; running that output through rustfilt
gives me .rodata..Lswitch.table.embassy_executor::raw::TaskStorage<F>::poll
. I suppose for esp-hal's interrupt module that might look like .rodata..Lswitch.table.esp_hal::interrupt
, can we wildcard that?
and I wonder how much this improves performance beyond ISRs - especially for any ... would be interesting
I do wonder too, but I'm thinking if we can minimize the ram used here, we might be able to enable this config by default - worst case 5~us interrupt latency is really nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure we can limit it here and maybe add another option to place all the switch tables in RAM - might be something which can improve performance in general (I still have no proof for that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't handle_interrupts already be in RAM? If not, why not, since it has the #[ram] attribute?
Yep - should - needs double checking why I added it (maybe by mistake or during desperately trying things)
What is _esp_hal_internal_handler, I don't seem to be able to find that string in the source anywhere.
Created by the #[handler]
macro and it's in DROM
3c010020 3c010020 5c 4 C:\projects\esp\esp-hal\examples\target\riscv32imc-unknown-none-elf\release\deps\gpio_interrupt-d547762d74cfa5b6.gpio_interrupt.a9128762895f52da-cgu.0.rcgu.o:(.rodata._ZN14gpio_interrupt26__esp_hal_internal_handler17h8cf7e5aa7b7ed9b2E)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had another look at the linker map file and (looking at wifi_embassy_dhcp) the amount of switch tables (and the size of those items) is not too much - what seems to be a lot is .Lanon.*
and those are named like .Lanon.fad58de7366495db4650cfefac2fcd61.137
- no indication where those are used unfortunately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok checked it
check sizes "wifi_embassy_dhcp"
- baseline 0000068c 00001138
- all 000009a8 00005e10
- no Lswitch 0000068c 00005e10 (- 796)
- no Lanon 000009a8 00001358 (- 19128)
IMHO further qualifying the swtiches won't give us much other than headaches whenever we rename things - unfortunately we won't be able to further qualify Lanon
esp-hal/build.rs
Outdated
"(ESP32 only) Enables a workaround for the issue where SPI in half-duplex mode incorrectly transmits the address on a single line if the data buffer is empty.", | ||
), | ||
( | ||
"place-more-rodata-in-ram", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"place-more-rodata-in-ram", | |
"place-interrupt-switch-tables-in-ram", |
bikeshedding welcome, but something more descriptive than "more" would be better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree if we can make it specific to the interrupt handling mechanism
Sorry missed the "(for the initial trigger which is what is likely to happen under cache-pressure in a real application)", that's a really big improvement, nice one! |
I decided to split the option into two - so users can now choose to have better interrupt-performance with just a little more RAM usage or sacrifice more RAM to squeeze out the last bit of performance. Results on C6
The little impact of |
Just to make it clear to me, |
Yes! With |
esp-hal/build.rs
Outdated
), | ||
( | ||
"place-switch-tables-in-ram", | ||
Value::Bool(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! With "wifi_embassy_dhcp" ESP_HAL_PLACE_ANON_IN_RAM consumes ~ 19k while the swtich tables only add < 1k
I guess I would be okay with making this opt-out considering the low-ish RAM overhead.
03ed1af
to
294eb06
Compare
This is amazing! Are these in microseconds? I remember reports from the forum that says the IDF can do it in ~2us. Not sure what is the difference here though... |
@@ -2475,6 +2475,7 @@ mod asynch { | |||
|
|||
use super::*; | |||
|
|||
#[ram] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this intentional? (Wondering if it was a temporary change for a local test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was intentionally - while the waker's internals are places in RAM the array was placed in flash otherwise
Yes, the numbers are in microseconds. The numbers include a bit more than just the interrupt invocation so the real interrupt latency is a bit less - would be interesting to have a realistic comparison to what ESP-IDF can do |
Do we know if placing anon tables in RAM scales with the program size? I assume because we don't use any wild carding, in a big firmware, with enough match statements this could include N number of bytes in RAM? |
Unfortunately, yes - more code means more anon data. Would be good to be able to be more selective there but I don't see how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that was the case, I suppose we can't do anything about that yet, and if we figure out something in the future we can improve it.
Thanks!
Thank you for your contribution!
We appreciate the time and effort you've put into this pull request.
To help us review it efficiently, please ensure you've gone through the following checklist:
Submission Checklist 📝
cargo xtask fmt-packages
command to ensure that all changed code is formatted correctly.CHANGELOG.md
in the proper section.Extra:
Pull Request Details 📖
Description
Adds an option
PLACE_MORE_RODATA_IN_RAM
which will improve interrupt latency (and general performance - but lacking benchmarks here)Closes #1747
Testing
Use this and a logic analyzer to see the difference between the option active and not-active (for the initial trigger which is what is likely to happen under cache-pressure in a real application)
While this measures more than the interrupt-latency it gives a good indication of the impact of this new option:
This certainly comes at the cost of more RAM usage - measured the RAM usage increase in bytes for the
wifi_embassy_dhcp
example (the BLE variant for H2)