Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to place more .rodata in RAM for performance #2331

Merged
merged 7 commits into from
Oct 18, 2024

Conversation

bjoernQ
Copy link
Contributor

@bjoernQ bjoernQ commented Oct 10, 2024

Thank you for your contribution!

We appreciate the time and effort you've put into this pull request.
To help us review it efficiently, please ensure you've gone through the following checklist:

Submission Checklist 📝

  • I have updated existing examples or added new ones (if applicable).
  • I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
  • My changes were added to the CHANGELOG.md in the proper section.
  • I have added necessary changes to user code to the Migration Guide.
  • My changes are in accordance to the esp-rs API guidelines

Extra:

Pull Request Details 📖

Description

Adds an option PLACE_MORE_RODATA_IN_RAM which will improve interrupt latency (and general performance - but lacking benchmarks here)

Closes #1747

Testing

Use this and a logic analyzer to see the difference between the option active and not-active (for the initial trigger which is what is likely to happen under cache-pressure in a real application)

//% CHIPS: esp32 esp32c2 esp32c3 esp32c6 esp32h2 esp32s2 esp32s3

#![no_std]
#![no_main]

use core::cell::RefCell;

use critical_section::Mutex;
use esp_backtrace as _;
use esp_hal::{
    gpio::{Event, Input, Io, Level, Output, Pull},
    macros::ram,
    prelude::*,
};

#[ram]
static BUTTON: Mutex<RefCell<Option<Input>>> = Mutex::new(RefCell::new(None));

#[ram]
static LED: Mutex<RefCell<Option<Output>>> = Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    let peripherals = esp_hal::init(esp_hal::Config::default());

    let mut io = Io::new(peripherals.GPIO, peripherals.IO_MUX);
    io.set_interrupt_handler(handler);
    let led = Output::new(io.pins.gpio2, Level::Low);

    cfg_if::cfg_if! {
        if #[cfg(any(feature = "esp32", feature = "esp32s2", feature = "esp32s3"))] {
            let button = io.pins.gpio0;
        } else {
            let button = io.pins.gpio9;
        }
    }

    let mut button = Input::new(button, Pull::Up);

    critical_section::with(|cs| {
        button.listen(Event::FallingEdge);
        BUTTON.borrow_ref_mut(cs).replace(button);
        LED.borrow_ref_mut(cs).replace(led);
    });

    loop {}
}

#[handler]
#[ram]
fn handler() {
    critical_section::with(|cs| {
        LED.borrow_ref_mut(cs).as_mut().unwrap().toggle();

        BUTTON
            .borrow_ref_mut(cs)
            .as_mut()
            .unwrap()
            .clear_interrupt()
    });
}

While this measures more than the interrupt-latency it gives a good indication of the impact of this new option:

  • ESP32 9.71 vs 25.25
  • ESP32-S2 9.07 vs 14.52
  • ESP32-S3 7.55 vs 12.3
  • ESP32-C2 5.02 vs 28.07
  • ESP32-C3 4.15 vs 18.35
  • ESP32-C6 5.19 vs 12.25
  • ESP32-H2 4.75 vs 23.29

This certainly comes at the cost of more RAM usage - measured the RAM usage increase in bytes for the wifi_embassy_dhcp example (the BLE variant for H2)

  • ESP32 +23716
  • ESP32-S2 +21700
  • ESP32-S3 +22932
  • ESP32-C2 +22640
  • ESP32-C3 +20468
  • ESP32-C6 +23388
  • ESP32-H2 +17720

@bugadani
Copy link
Contributor

bugadani commented Oct 10, 2024

I'm somewhat concerned that this name will be too non-descriptive in the long run. Also the amount of additional RAM use seems waaaay out of proportions to me for this changeset.

@bjoernQ
Copy link
Contributor Author

bjoernQ commented Oct 10, 2024

I'm somewhat concerned that this name will be too non-descriptive in the long run.

Yeah - I'm open to suggestions my first version was "ISR_BOOST" - that was even worse

@@ -91,6 +91,7 @@ pub mod rtc_io;

/// Convenience constant for `Option::None` pin

#[ram]
Copy link
Contributor

@bugadani bugadani Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No way this isn't already in RAM. 🤔 It's supposed to be interior mutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point it looked in the disassembly like there was a pointer in flash pointing to this ... but now it doesn't look like that from the linker-map anymore ... needs double checking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand that what this points to may live in flash, but the innards are an atomicptr so it's impossible to live in flash. Unless the compiler deduced that it never changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant I think I have seen something in flash pointing to the instance in RAM but anyways need double checking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checked the disassembly .... it's not the case (and I agree it shouldn't) - so this is unnecessary

*(.rodata.*_esp_hal_internal_handler*)
*(.rodata.str.*)
*(.rodata.handle_interrupts)
*(.rodata..Lswitch.table.*)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this double .. correct?

Do we have any more granularity with what we include - is it possible to only include switch tables from the interrupt module perhaps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't handle_interrupts already be in RAM? If not, why not, since it has the #[ram] attribute?

What is _esp_hal_internal_handler, I don't seem to be able to find that string in the source anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those look like this .rodata..Lswitch.table._ZN16embassy_executor3raw20TaskStorage$LT$F$GT$4poll17h8bd1af7ee097c9d5E, yes

We could try to wild-card match a bit more specific - the naming doesn't make that very easy (and I wonder how much this improves performance beyond ISRs - especially for any ... would be interesting)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the naming doesn't make that very easy

I think modern linkers understand the rust demangling scheme; running that output through rustfilt gives me .rodata..Lswitch.table.embassy_executor::raw::TaskStorage<F>::poll. I suppose for esp-hal's interrupt module that might look like .rodata..Lswitch.table.esp_hal::interrupt, can we wildcard that?

and I wonder how much this improves performance beyond ISRs - especially for any ... would be interesting

I do wonder too, but I'm thinking if we can minimize the ram used here, we might be able to enable this config by default - worst case 5~us interrupt latency is really nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure we can limit it here and maybe add another option to place all the switch tables in RAM - might be something which can improve performance in general (I still have no proof for that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't handle_interrupts already be in RAM? If not, why not, since it has the #[ram] attribute?

Yep - should - needs double checking why I added it (maybe by mistake or during desperately trying things)

What is _esp_hal_internal_handler, I don't seem to be able to find that string in the source anywhere.

Created by the #[handler] macro and it's in DROM

3c010020 3c010020 5c 4 C:\projects\esp\esp-hal\examples\target\riscv32imc-unknown-none-elf\release\deps\gpio_interrupt-d547762d74cfa5b6.gpio_interrupt.a9128762895f52da-cgu.0.rcgu.o:(.rodata._ZN14gpio_interrupt26__esp_hal_internal_handler17h8cf7e5aa7b7ed9b2E)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had another look at the linker map file and (looking at wifi_embassy_dhcp) the amount of switch tables (and the size of those items) is not too much - what seems to be a lot is .Lanon.* and those are named like .Lanon.fad58de7366495db4650cfefac2fcd61.137 - no indication where those are used unfortunately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok checked it

check sizes "wifi_embassy_dhcp"

- baseline	0000068c	00001138
- all		000009a8	00005e10
- no Lswitch	0000068c	00005e10	(- 796)
- no Lanon	000009a8	00001358	(- 19128)

IMHO further qualifying the swtiches won't give us much other than headaches whenever we rename things - unfortunately we won't be able to further qualify Lanon

esp-hal/build.rs Outdated
"(ESP32 only) Enables a workaround for the issue where SPI in half-duplex mode incorrectly transmits the address on a single line if the data buffer is empty.",
),
(
"place-more-rodata-in-ram",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"place-more-rodata-in-ram",
"place-interrupt-switch-tables-in-ram",

bikeshedding welcome, but something more descriptive than "more" would be better here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree if we can make it specific to the interrupt handling mechanism

@MabezDev
Copy link
Member

MabezDev commented Oct 10, 2024

While this measures more than the interrupt-latency it gives a good indication of the impact of this new option:

Very impressive results! Are these first time fire values? Or an average of multiple runs?

Sorry missed the "(for the initial trigger which is what is likely to happen under cache-pressure in a real application)", that's a really big improvement, nice one!

@bjoernQ
Copy link
Contributor Author

bjoernQ commented Oct 11, 2024

I decided to split the option into two - so users can now choose to have better interrupt-performance with just a little more RAM usage or sacrifice more RAM to squeeze out the last bit of performance.

Results on C6

baseline				12.25
ESP_HAL_PLACE_SWITCH_TABLES_IN_RAM	7.25
ESP_HAL_PLACE_ANON_IN_RAM		11.75
both					5.21

The little impact of ESP_HAL_PLACE_ANON_IN_RAM on its own is a bit surprising but in combination we see the same results as before.

@bugadani
Copy link
Contributor

Just to make it clear to me, ESP_HAL_PLACE_ANON_IN_RAM needs (in your measurements) significantly more RAM than ESP_HAL_PLACE_SWITCH_TABLES_IN_RAM, right?

@bjoernQ
Copy link
Contributor Author

bjoernQ commented Oct 11, 2024

Just to make it clear to me, ESP_HAL_PLACE_ANON_IN_RAM needs (in your measurements) significantly more RAM than ESP_HAL_PLACE_SWITCH_TABLES_IN_RAM, right?

Yes! With "wifi_embassy_dhcp" ESP_HAL_PLACE_ANON_IN_RAM consumes ~ 19k while the swtich tables only add < 1k

esp-hal/build.rs Outdated
),
(
"place-switch-tables-in-ram",
Value::Bool(false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! With "wifi_embassy_dhcp" ESP_HAL_PLACE_ANON_IN_RAM consumes ~ 19k while the swtich tables only add < 1k

I guess I would be okay with making this opt-out considering the low-ish RAM overhead.

@bjoernQ bjoernQ force-pushed the esp-hal/interrupt-latency branch from 03ed1af to 294eb06 Compare October 11, 2024 09:43
@bjoernQ bjoernQ marked this pull request as ready for review October 11, 2024 11:55
@ProfFan
Copy link
Contributor

ProfFan commented Oct 14, 2024

baseline 12.25
ESP_HAL_PLACE_SWITCH_TABLES_IN_RAM 7.25
ESP_HAL_PLACE_ANON_IN_RAM 11.75
both 5.21

This is amazing! Are these in microseconds? I remember reports from the forum that says the IDF can do it in ~2us. Not sure what is the difference here though...

@@ -2475,6 +2475,7 @@ mod asynch {

use super::*;

#[ram]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intentional? (Wondering if it was a temporary change for a local test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was intentionally - while the waker's internals are places in RAM the array was placed in flash otherwise

@bjoernQ
Copy link
Contributor Author

bjoernQ commented Oct 15, 2024

baseline 12.25
ESP_HAL_PLACE_SWITCH_TABLES_IN_RAM 7.25
ESP_HAL_PLACE_ANON_IN_RAM 11.75
both 5.21

This is amazing! Are these in microseconds? I remember reports from the forum that says the IDF can do it in ~2us. Not sure what is the difference here though...

Yes, the numbers are in microseconds. The numbers include a bit more than just the interrupt invocation so the real interrupt latency is a bit less - would be interesting to have a realistic comparison to what ESP-IDF can do

@MabezDev
Copy link
Member

Do we know if placing anon tables in RAM scales with the program size? I assume because we don't use any wild carding, in a big firmware, with enough match statements this could include N number of bytes in RAM?

@bjoernQ
Copy link
Contributor Author

bjoernQ commented Oct 18, 2024

Do we know if placing anon tables in RAM scales with the program size? I assume because we don't use any wild carding, in a big firmware, with enough match statements this could include N number of bytes in RAM?

Unfortunately, yes - more code means more anon data. Would be good to be able to be more selective there but I don't see how

Copy link
Member

@MabezDev MabezDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that was the case, I suppose we can't do anything about that yet, and if we figure out something in the future we can improve it.

Thanks!

@MabezDev MabezDev enabled auto-merge October 18, 2024 11:33
@MabezDev MabezDev added this pull request to the merge queue Oct 18, 2024
Merged via the queue into esp-rs:main with commit 358d884 Oct 18, 2024
25 of 28 checks passed
@bjoernQ bjoernQ deleted the esp-hal/interrupt-latency branch November 26, 2024 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce / eliminate cache misses in interrupt handling
5 participants