early results with the new DMA peripheral #603
-
quick update -- using interrupts instead of polling bumped the speedup from 2X to 2.11X.... while the speed increase is always appreciated, it's the potential power savings that are more important to me....
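for anyone curious, the interrupt-driven wait boils down to something like the sketch below -- this assumes a RISC-V core, and the register layout, base address, and IRQ hookup are all placeholders, not the actual peripheral's API....

```c
#include <stdint.h>

/* hypothetical DMA register layout -- placeholder names, not the real API */
typedef struct {
  volatile uint32_t SRC;   /* source address (FLASH)           */
  volatile uint32_t DST;   /* destination address (IMEM/DMEM)  */
  volatile uint32_t LEN;   /* transfer length in bytes         */
  volatile uint32_t CTRL;  /* bit0 = start, bit1 = irq enable  */
} dma_t;

#define DMA ((dma_t *)0x40000000u)   /* placeholder base address */

static volatile int dma_done = 0;

/* hooked into the platform's DMA interrupt vector (setup not shown) */
void dma_irq_handler(void) {
  dma_done = 1;
}

void dma_copy(uint32_t src, uint32_t dst, uint32_t len) {
  dma_done  = 0;
  DMA->SRC  = src;
  DMA->DST  = dst;
  DMA->LEN  = len;
  DMA->CTRL = 0x3u;                 /* start, with interrupt enabled */
  while (!dma_done) {
    __asm__ volatile ("wfi");       /* idle the CPU until the DMA IRQ fires */
  }
}
```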
-
Hey @biosbob! This is really interesting research you're doing!
So you use a custom "bootloader", right?
Those memory slots for executable code are basically software-managed "caches", correct? Do you have some kind of runtime environment or RTOS on the software side that is responsible for this SW cache handling?
The nice thing about the iCEBreaker / the Lattice iCE40 FPGA is that it is optimized for low power. You could track the power consumption by measuring the current on the FPGA core supply line. Furthermore, the large memory blocks of the FPGA (32kB each) provide several power-down modes that could be used to further shrink energy consumption while transferring new data from the external flash.
Putting the CPU into a power-down mode while the DMA runs would save even more. Again, this is very interesting! It would be great if you could keep us (me) updated 😉
-
in my ideal SoC, there is no more than 32K of IMEM and 32K of DMEM -- both implemented as SRAM that is as "tightly-coupled" to the CPU as possible (to support a true harvard architecture).... more importantly, i have no need for an icache/dcache!!!!
at reset, my FLASH-based bootloader simply copies the "main" program into IMEM and then jumps to its entry-point.... and while many of the applications of interest to me can comfortably fit within these 32K SRAM banks, there are times when (at runtime) additional code or data must be "swapped in" from FLASH....
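(for the curious, the copy-and-jump amounts to something like this -- the addresses and sizes are placeholders for what the linker script actually provides, and it assumes the FLASH is memory-mapped so a plain copy works)....

```c
#include <stdint.h>
#include <string.h>

/* placeholder addresses -- the real values come from the linker script */
#define FLASH_APP  0x00100000u       /* "main" program image in FLASH      */
#define IMEM_BASE  0x00000000u       /* 32K tightly-coupled instruction RAM */
#define APP_SIZE   (32u * 1024u)     /* image size (or read from a header)  */

void boot(void) {
  /* copy the application image from memory-mapped FLASH into IMEM... */
  memcpy((void *)IMEM_BASE, (const void *)FLASH_APP, APP_SIZE);

  /* ...then jump to its entry point (assumed here to be the start of IMEM) */
  ((void (*)(void))IMEM_BASE)();
}
```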
while it is entirely possible (through XIP) to directly access code or constants held in FLASH, it is generally more efficient to copy this information into IMEM/DMEM.... as such, i'm using these tightly-coupled IMEM/DMEM blocks as an "application-specific" cache whose contents i completely control....
equally important (though hard to measure on an FPGA) is the impact this design has on power: the FLASH can be powered off when we're executing out of IMEM; and using IMEM/DMEM as tightly-coupled memories enables more efficient execution (which means i can put myself to sleep sooner).... not having the [id]cache also reduces the silicon footprint, which itself reduces power....
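in code, this "application-specific cache" is just a tiny overlay manager along the following lines -- the descriptor values are made-up placeholders (in practice the build system would generate them from the link map), and dma_copy is the sketch from my earlier comment....

```c
#include <stdint.h>

/* DMA copy helper (see the interrupt-driven sketch earlier in the thread) */
extern void dma_copy(uint32_t src, uint32_t dst, uint32_t len);

/* one descriptor per swappable block -- placeholder values */
typedef struct {
  uint32_t flash_addr;   /* where the block lives in FLASH           */
  uint32_t slot_addr;    /* IMEM/DMEM slot the block was linked for  */
  uint32_t size;         /* block size in bytes                      */
} overlay_t;

static const overlay_t overlays[] = {
  { 0x00110000u, 0x00006000u, 5u * 1024u },   /* e.g., a 5K code block */
};

static int resident = -1;   /* overlay currently loaded (single-slot model) */

/* make sure overlay 'id' is resident before calling into it */
void overlay_ensure(int id) {
  if (resident != id) {
    const overlay_t *o = &overlays[id];
    dma_copy(o->flash_addr, o->slot_addr, o->size);
    resident = id;
  }
}
```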
in those situations where a block of code/data needs to be "swapped-in" at runtime, we obviously want this to be as efficient as possible.... it's here that the DMA shows its advantage over the CPU simply copying words in a loop....
as an experiment, assume we need to copy (say) 5K bytes of code/data from FLASH to SRAM.... using the DMA, this is effectively 2X faster than using the CPU.... the absolute time is ultimately influenced by the clock-speed as well as whether the FLASH implementation supports QSPI; but still, DMA is better than CPU....
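for reference, the measurement itself is just cycle-counting around the two copies -- assuming a RISC-V core with a readable mcycle CSR; dma_copy_poll and the addresses are placeholders....

```c
#include <stdint.h>

#define SRC   0x00100000u        /* placeholder FLASH source address */
#define DST   0x00006000u        /* placeholder IMEM destination     */
#define BYTES (5u * 1024u)       /* the 5K block from the experiment */

static inline uint32_t rd_cycle(void) {
  uint32_t c;
  __asm__ volatile ("csrr %0, mcycle" : "=r"(c));  /* RISC-V cycle counter */
  return c;
}

/* plain CPU loop, one word at a time */
static void cpu_copy(uint32_t *dst, const uint32_t *src, uint32_t words) {
  for (uint32_t i = 0; i < words; i++) {
    dst[i] = src[i];
  }
}

/* placeholder: start the DMA and busy-poll its done flag */
extern void dma_copy_poll(uint32_t src, uint32_t dst, uint32_t len);

void bench(void) {
  /* volatile so the compiler can't optimize the measurements away */
  volatile uint32_t t_cpu, t_dma;
  uint32_t t0;

  t0    = rd_cycle();
  cpu_copy((uint32_t *)DST, (const uint32_t *)SRC, BYTES / 4u);
  t_cpu = rd_cycle() - t0;

  t0    = rd_cycle();
  dma_copy_poll(SRC, DST, BYTES);
  t_dma = rd_cycle() - t0;

  /* on my setup, t_cpu comes out at roughly 2x t_dma */
}
```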
right now, my DMA copy is polling for completion -- which means there is likely some bus contention, especially when i'm copying into the same IMEM from which i'm executing!!!! i'm hoping to see further improvement by awaiting a DMA interrupt -- not only in speed beyond the current 2X, but also in power savings from idling the CPU while it waits....
all good, so far 👍