early results with the new DMA peripheral #603
-
quick update -- using interrupts instead of polling bumped the speedup from 2X to 2.11X.... while the speed increase is always appreciated, it's the potential power savings that are more important to me....
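for anyone curious, the interrupt-driven wait boils down to something like the sketch below -- this assumes a RISC-V core, and the register layout, base address, and IRQ hookup are all placeholders, not the actual peripheral's API....

```c
#include <stdint.h>

/* hypothetical DMA register layout -- placeholder names, not the real API */
typedef struct {
  volatile uint32_t SRC;   /* source address (FLASH)           */
  volatile uint32_t DST;   /* destination address (IMEM/DMEM)  */
  volatile uint32_t LEN;   /* transfer length in bytes         */
  volatile uint32_t CTRL;  /* bit0 = start, bit1 = irq enable  */
} dma_t;

#define DMA ((dma_t *)0x40000000u)   /* placeholder base address */

static volatile int dma_done = 0;

/* hooked into the platform's DMA interrupt vector (setup not shown) */
void dma_irq_handler(void) {
  dma_done = 1;
}

void dma_copy(uint32_t src, uint32_t dst, uint32_t len) {
  dma_done  = 0;
  DMA->SRC  = src;
  DMA->DST  = dst;
  DMA->LEN  = len;
  DMA->CTRL = 0x3u;                 /* start, with interrupt enabled */
  while (!dma_done) {
    __asm__ volatile ("wfi");       /* idle the CPU until the DMA IRQ fires */
  }
}
```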
-
Hey @biosbob! This is really interesting research you're doing!
So you use a custom "bootloader", right?
Those memory slots for executable code are basically software-managed "caches", correct? Do you have some kind of runtime environment or RTOS on the software side that is responsible for this SW cache handling?
The nice thing about the iCEBreaker / the Lattice iCE40 FPGA is that it is optimized for low power. You could track the power consumption by measuring the current on the FPGA core supply line. Furthermore, the large memory blocks of the FPGA (32kB each) provide several power-down modes that could be used to further shrink energy consumption while transferring new data from the external flash.
Putting the CPU into a power-down mode while the DMA runs would save even more. Again, this is very interesting! It would be great if you could keep us (me) updated 😉
-
in my ideal SoC, there is no more than 32K of IMEM and 32K of DMEM -- both implemented as SRAM that is as "tightly-coupled" to the CPU as possible (to support a true harvard architecture).... more importantly, i have no need for an icache/dcache!!!!
at reset, my FLASH-based bootloader simply copies the "main" program into IMEM and then jumps to its entry-point.... and while many of the applications of interest to me can comfortably fit within these 32K SRAM banks, there are times when (at runtime) additional code or data must be "swapped in" from FLASH....
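(for the curious, the copy-and-jump amounts to something like this -- the addresses and sizes are placeholders for what the linker script actually provides, and it assumes the FLASH is memory-mapped so a plain copy works)....

```c
#include <stdint.h>
#include <string.h>

/* placeholder addresses -- the real values come from the linker script */
#define FLASH_APP  0x00100000u       /* "main" program image in FLASH      */
#define IMEM_BASE  0x00000000u       /* 32K tightly-coupled instruction RAM */
#define APP_SIZE   (32u * 1024u)     /* image size (or read from a header)  */

void boot(void) {
  /* copy the application image from memory-mapped FLASH into IMEM... */
  memcpy((void *)IMEM_BASE, (const void *)FLASH_APP, APP_SIZE);

  /* ...then jump to its entry point (assumed here to be the start of IMEM) */
  ((void (*)(void))IMEM_BASE)();
}
```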
while it is entirely possible (through XIP) to directly access code or constants held in FLASH, it is generally more efficient to copy this information into IMEM/DMEM.... as such, i'm using these tightly-coupled IMEM/DMEM blocks as an "application-specific" cache whose contents i completely control....
equally important (though hard to measure on an FPGA) is the impact this design has on power: the FLASH can be powered off when we're executing out of IMEM; and using IMEM/DMEM as tightly-coupled memories enables more efficient execution (which means i can put myself to sleep sooner).... not having the [id]cache also reduces the silicon footprint, which itself reduces power....
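in code, this "application-specific cache" is just a tiny overlay manager along the following lines -- the descriptor values are made-up placeholders (in practice the build system would generate them from the link map), and dma_copy is the sketch from my earlier comment....

```c
#include <stdint.h>

/* DMA copy helper (see the interrupt-driven sketch earlier in the thread) */
extern void dma_copy(uint32_t src, uint32_t dst, uint32_t len);

/* one descriptor per swappable block -- placeholder values */
typedef struct {
  uint32_t flash_addr;   /* where the block lives in FLASH           */
  uint32_t slot_addr;    /* IMEM/DMEM slot the block was linked for  */
  uint32_t size;         /* block size in bytes                      */
} overlay_t;

static const overlay_t overlays[] = {
  { 0x00110000u, 0x00006000u, 5u * 1024u },   /* e.g., a 5K code block */
};

static int resident = -1;   /* overlay currently loaded (single-slot model) */

/* make sure overlay 'id' is resident before calling into it */
void overlay_ensure(int id) {
  if (resident != id) {
    const overlay_t *o = &overlays[id];
    dma_copy(o->flash_addr, o->slot_addr, o->size);
    resident = id;
  }
}
```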
in those situations where a block of code/data needs to be "swapped-in" at runtime, we obviously want this to be as efficient as possible.... it's here that the DMA shows its advantage over the CPU simply copying words in a loop....
as an experiment, assume we need to copy (say) 5K bytes of code/data from FLASH to SRAM.... using the DMA, this is effectively 2X faster than using the CPU.... the absolute time is ultimately influenced by the clock-speed as well as whether the FLASH implementation supports QSPI; but still, DMA is better than CPU....
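for reference, the measurement itself is just cycle-counting around the two copies -- assuming a RISC-V core with a readable mcycle CSR; dma_copy_poll and the addresses are placeholders....

```c
#include <stdint.h>

#define SRC   0x00100000u        /* placeholder FLASH source address */
#define DST   0x00006000u        /* placeholder IMEM destination     */
#define BYTES (5u * 1024u)       /* the 5K block from the experiment */

static inline uint32_t rd_cycle(void) {
  uint32_t c;
  __asm__ volatile ("csrr %0, mcycle" : "=r"(c));  /* RISC-V cycle counter */
  return c;
}

/* plain CPU loop, one word at a time */
static void cpu_copy(uint32_t *dst, const uint32_t *src, uint32_t words) {
  for (uint32_t i = 0; i < words; i++) {
    dst[i] = src[i];
  }
}

/* placeholder: start the DMA and busy-poll its done flag */
extern void dma_copy_poll(uint32_t src, uint32_t dst, uint32_t len);

void bench(void) {
  /* volatile so the compiler can't optimize the measurements away */
  volatile uint32_t t_cpu, t_dma;
  uint32_t t0;

  t0    = rd_cycle();
  cpu_copy((uint32_t *)DST, (const uint32_t *)SRC, BYTES / 4u);
  t_cpu = rd_cycle() - t0;

  t0    = rd_cycle();
  dma_copy_poll(SRC, DST, BYTES);
  t_dma = rd_cycle() - t0;

  /* on my setup, t_cpu comes out at roughly 2x t_dma */
}
```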
right now, my DMA copy is polling for completion -- which means there is likely some bus contention, especially when i'm copying into the same IMEM from which i'm executing!!!! i'm hoping to see further improvement by awaiting a DMA interrupt -- not only in speed beyond the current 2X, but also in power savings from idling the CPU while it waits....
all good, so far 👍