rpi4's bare-metal hashing performance is poor without caching #155
Replies: 5 comments 17 replies
-
I presume the file is already entirely copied to RAM when your loader does computation on it? Do you have virtual memory and caching enabled?
-
Just in case this wasn't already discussed: on Armv7-A and Armv8-A, caching is
predicated on the MMU being enabled. Cacheability has to be expressed
via page table descriptors.
This is true irrespective of whether address translation is required.
A typical setup for such early boot code is to set up identity maps via a
suitable set of page table entries.
On Fri, 15 Apr 2022 at 18:35, nihalpasham wrote:
Yes, the file (to be hashed) is loaded into RAM.
The MMU is disabled, so no virtual memory. I assume by caching, you mean
d-cache. If yes, that's not enabled either. (one of the goals is to ensure
that the bootloader has the smallest possible trusted computing base)
But that's an interesting point. I assumed the only variable to consider
was the single-core frequency. Would enabling them improve performance?
If yes, I'd be curious to know why?
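To make the point about page table descriptors concrete, here is a minimal sketch of how an identity-mapped page descriptor could carry cacheability. It assumes the 64 KiB granule and the `MAIR_EL1` value `0xff04` that appear in the logs later in this thread, with attribute index 0 as device memory and index 1 as normal cacheable memory. The function names are hypothetical, the shareability field is set to Inner Shareable for every mapping as a simplification, and the code only computes values on the host; it does not touch any system register.

```rust
// Hypothetical helper names; bit positions follow the AArch64 stage 1
// page descriptor layout (64 KiB granule, level 3).
const MAIR_EL1: u64 = 0xff04; // value taken from the logs in this thread

const VALID: u64 = 1 << 0;
const PAGE: u64 = 1 << 1;
const AP_RO: u64 = 1 << 7; // AP[2]: read-only
const SH_INNER: u64 = 0b11 << 8; // Inner Shareable (simplification)
const AF: u64 = 1 << 10; // Access flag
const PXN: u64 = 1 << 53; // Privileged execute-never
const UXN: u64 = 1 << 54; // Unprivileged execute-never
const ADDR_MASK_64K: u64 = 0x0000_FFFF_FFFF_0000; // output address bits [47:16]

const ATTR_DEVICE: u64 = 0; // assumed index into MAIR_EL1
const ATTR_NORMAL: u64 = 1; // assumed index into MAIR_EL1

/// Extract the 8-bit memory attribute stored at `index` in a MAIR value.
fn mair_attr(mair: u64, index: u64) -> u8 {
    ((mair >> (8 * index)) & 0xff) as u8
}

/// Build a descriptor that maps `phys` onto itself (identity map).
fn page_descriptor(phys: u64, attr_index: u64, read_only: bool, pxn: bool) -> u64 {
    let mut d = VALID | PAGE | AF | SH_INNER | UXN;
    d |= phys & ADDR_MASK_64K;
    d |= (attr_index & 0b111) << 2; // AttrIndx selects the MAIR_EL1 entry
    if read_only {
        d |= AP_RO;
    }
    if pxn {
        d |= PXN;
    }
    d
}

fn main() {
    println!("Attr0 = {:#04x}, Attr1 = {:#04x}",
             mair_attr(MAIR_EL1, 0), mair_attr(MAIR_EL1, 1));
    // Kernel code: cacheable, read-only, privileged-executable.
    println!("code:   {:#018x}", page_descriptor(0x0008_0000, ATTR_NORMAL, true, false));
    // Device MMIO: device memory, read-write, never executable.
    println!("device: {:#018x}", page_descriptor(0xfe00_0000, ATTR_DEVICE, false, true));
}
```

The only difference between a cached and an uncached mapping in this sketch is the attribute index, which is why the data cache cannot be used meaningfully until the translation tables are in place.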
-
Ok, I compiled
Note: I moved the MMU activation code so that we're able to log the activation flow.
[ 0.007482] MAIR_EL1: 0xff04
[ 0.075640] Special regions:
[ 0.075698] 0x00080000 - 0x0008ffff | 64 KiB | C RO PX | Kernel code and RO data
[ 0.076652] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 0.077638] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 0.078527] BASE ADDR: 0x120000
[ 0.078905] TTBR0_EL1: 0x120000
[ 0.079285] TCR_EL1: 0x200807520
[ 0.079675] SCTLR_EL1: 0xc50838
[ 0.080054] After enabling MMU, SCTLR_EL1: 0xc5183d
[ 0.080648] mingo version 0.10.0
[ 0.081038] Booting on: Raspberry Pi 4
[ 0.081493] MMU online. Special regions:
[ 0.081970] 0x00080000 - 0x0008ffff | 64 KiB | C RO PX | Kernel code and RO data
[ 0.082988] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 0.083974] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 0.084862] Current privilege level: EL1
[ 0.085339] Exception handling state:
[ 0.085783] Debug: Masked
[ 0.086173] SError: Masked
[ 0.086563] IRQ: Masked
[ 0.086953] FIQ: Masked
[ 0.087343] Architectural timer resolution: 18 ns
[ 0.087917] Drivers loaded:
[ 0.088253] 1. BCM GPIO
[ 0.088610] 2. BCM PL011 UART
[ 0.089033] Timer test, spinning for 1 second
[ !!! ] Writing through the remapped UART at 0x1FFF_1000
[ 1.089900] Echoing input now
However, when I copy and paste the (same) MMU code from
Note:
I plan on getting a hardware debugger. But in the meantime, any thoughts on what I'm doing wrong here?
❯ terminal-s.exe
--- COM3 is connected. Press Ctrl+] to quit ---
......
......
[ 2.211136] MAIR_EL1: 0xff04
[ 2.485412] translation tables populated
[ 2.486370] Special regions:
[ 2.489151] 0x00080000 - 0x000a2fff | 140 KiB | C RO PX | Kernel code and RO data
[ 2.497317] 0x1fff0000 - 0x1fffffff | 64 KiB | Dev RW PXN | Remapped Device MMIO
[ 2.505223] 0xfe000000 - 0xff84ffff | 24 MiB | Dev RW PXN | Device MMIO
[ 2.512347] BASE ADDR: 0x280000
[ 2.515387] TTBR0_EL1: 0x280000
[ 2.518427] TCR_EL1: 0x200807520
[ 2.521555] first isb passed
[ 2.524334] SCTLR_EL1: 0xc50838
[ 2.527375] new SCTLR_EL1: 0xc5183d
[
---- crashes ----- a red LED turns on and stays on
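For readers comparing the two `SCTLR_EL1` values printed in both logs, the following sketch lists which bits the activation code sets. The bit names (M at bit 0, C at bit 2, I at bit 12) come from the Armv8-A register definition; the function name is hypothetical and the code runs on the host, purely as arithmetic on the logged values.

```rust
// Names of the SCTLR_EL1 bits relevant to this thread.
const SCTLR_BITS: [(u32, &str); 3] = [
    (0, "M (MMU enable)"),
    (2, "C (data cache enable)"),
    (12, "I (instruction cache enable)"),
];

/// Return the positions of bits that are set in `after` but not in `before`.
fn newly_set_bits(before: u64, after: u64) -> Vec<u32> {
    let changed = after & !before;
    (0u32..64).filter(|&b| changed & (1u64 << b) != 0).collect()
}

fn main() {
    // Values copied from the logs above.
    let before = 0xc50838u64;
    let after = 0xc5183du64;
    for bit in newly_set_bits(before, after) {
        let name = SCTLR_BITS
            .iter()
            .find(|(pos, _)| *pos == bit)
            .map(|(_, name)| *name)
            .unwrap_or("(not named in this sketch)");
        println!("bit {bit}: {name}");
    }
}
```

Both the working and the crashing run write the same value, so the register contents alone do not explain the difference between them.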
-
This doc outlines the AArch64 Linux boot protocol and its
architectural/micro-architectural expectations:
https://www.kernel.org/doc/Documentation/arm64/booting.txt
The MMU needs to be off, and caching additionally needs to be explicitly
disabled.
On Thu, Apr 21, 2022 at 7:38 PM Andre Richter wrote:
Well, the first thing that Linux will do is to set up its own page tables.
I don't know by heart what the expectation from a previous boot loader
stage is with respect to the architectural state of the memory subsystem.
For starters, I would probably just disable caching again before jumping
to Linux.
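As a rough illustration of "disable caching again", the sketch below computes the `SCTLR_EL1` value a loader might write back before the jump, clearing the MMU-enable and data-cache-enable bits. The function name is hypothetical, and this is only the bit arithmetic, run on the host. On real hardware the write would also need the appropriate barriers, and the loaded image would need to be cleaned from the data cache first; neither is shown here.

```rust
const SCTLR_M: u64 = 1 << 0; // MMU enable
const SCTLR_C: u64 = 1 << 2; // data cache enable

/// Clear the MMU and data-cache enable bits, leaving all other bits as they are.
/// The instruction-cache bit is deliberately left untouched in this sketch.
fn sctlr_for_handoff(sctlr: u64) -> u64 {
    sctlr & !(SCTLR_M | SCTLR_C)
}

fn main() {
    // Value logged earlier in this thread after the MMU was enabled.
    let enabled = 0xc5183du64;
    println!("before handoff: {:#x}", enabled);
    println!("for handoff:    {:#x}", sctlr_for_handoff(enabled));
}
```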
-
@nihalpasham can you do me a favor and check what the speedup is with instruction caching alone? Would be a nice datapoint to have.
-
Disclaimer: I'm assuming this topic can be discussed here. If not, please let me know and I will remove it.
Question: I ran into an odd issue. I'm working on a secure bootloader that's written entirely in Rust. Most of the boot code for the rpi4 is from this repo. I managed to get all the pieces working. However, I've run into a strange performance issue. The gist of it is
OpenSSL and the sha2 crate
running on a standard Linux OS + Raspberry Pi 4, i.e. the hashing speed for OpenSSL is 121 MiB/s and for sha2 is 82 MiB/s, which roughly translates to less than 3 seconds for a 30MB file.
I'm hoping folks here who have more experience with an rpi can offer some insight into what's probably missing/wrong.
A link to the implementation. The boot code is present in
/boards/bootloaders/rpi4/src/boot.rs
Serial output from an rpi4: as you can see from the logs below, computing a hash of the kernel and ramdisk takes an additional 80 secs (give or take).
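The arithmetic behind these figures can be sketched as follows. The two throughput numbers are the ones quoted above; treating the 30MB file as 30 MiB is a simplification. The last calculation assumes that the roughly 80 s covered about 30 MiB of input, which the post does not state, so the resulting rate is only an order-of-magnitude illustration.

```rust
/// Time in seconds to hash `size_mib` at a given throughput.
fn seconds_to_hash(size_mib: f64, rate_mib_per_s: f64) -> f64 {
    size_mib / rate_mib_per_s
}

/// Throughput in MiB/s implied by hashing `size_mib` in `seconds`.
fn implied_rate(size_mib: f64, seconds: f64) -> f64 {
    size_mib / seconds
}

fn main() {
    let size_mib = 30.0; // the 30MB file, treated as 30 MiB

    for (name, rate) in [("openssl", 121.0), ("sha2", 82.0)] {
        println!("{name}: {:.2} s", seconds_to_hash(size_mib, rate));
    }

    // Assumption: ~80 s for ~30 MiB; the combined kernel + ramdisk size is not given.
    println!("bare metal, uncached: {:.3} MiB/s", implied_rate(size_mib, 80.0));
}
```

Under those assumptions, both Linux-hosted figures come out well under the quoted 3 second bound, while the bare-metal run is slower by more than two orders of magnitude.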