DMA

ARTIQ Direct Memory Access (DMA) #553

API sketch

# obtain a handle to a named DMA sequence
my_burst = DMA("my_burst")
# record events into it
with my_burst:
    delay(10*ns)
    ttl0.pulse(20*ns)
    for i in range(100):
        dds2.pulse(300*MHz + i*1*MHz, 220*ns)
# timeline is unaltered and rewound to before the `with`

# potentially in a new experiment, new kernel:

# retrieve a reference to a previously recorded DMA sequence
my_pulse = DMA("my_burst")
t = now_mu()
for i in range(100):
    ttl2.pulse(3*us)
    # trigger one playback of the sequence,
    # and wait until the DMA engine has finished
    my_pulse.play()
    # timeline advanced by length of my_pulse
# my_burst lasts 10 ns + 20 ns + 100*220 ns = 22.03 us
assert t + seconds_to_mu(100*(3*us + 10*ns + 20*ns + 100*220*ns)) == now_mu()

# release the DMA sequence
my_pulse.free()
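The timeline semantics of the sketch can be checked with a host-side model that needs no ARTIQ hardware or compiler. Recording does not move the timeline cursor. Playback shifts the recorded relative timestamps to the current time and advances the cursor by the sequence length. `MockTimeline`, `record`, `output` and `pulse` are hypothetical stand-ins, not ARTIQ's API. The model assumes one machine unit equals 1 ns and represents a pulse as two output events. It ignores the DDS frequency argument.

```python
from contextlib import contextmanager


class MockTimeline:
    """Hypothetical model of the proposed record/playback timeline semantics."""

    def __init__(self):
        self.now = 0            # timeline cursor in machine units (1 mu = 1 ns assumed)
        self.emitted = []       # (timestamp, channel, value) events sent to RTIO
        self.sequences = {}     # name -> (relative events, duration)
        self._recording = None  # (start, events) while inside a `record` block

    def delay(self, dt):
        self.now += dt

    def output(self, channel, value):
        if self._recording is not None:
            start, events = self._recording
            events.append((self.now - start, channel, value))  # relative timestamp
        else:
            self.emitted.append((self.now, channel, value))

    def pulse(self, channel, duration):
        # Modelled as a rising and a falling edge.
        self.output(channel, 1)
        self.delay(duration)
        self.output(channel, 0)

    @contextmanager
    def record(self, name):
        start, events = self.now, []
        self._recording = (start, events)
        try:
            yield
        finally:
            self._recording = None
            self.sequences[name] = (events, self.now - start)
            self.now = start    # timeline rewound to before the `with`

    def play(self, name):
        events, duration = self.sequences[name]
        for ts, channel, value in events:
            self.emitted.append((self.now + ts, channel, value))
        self.now += duration    # timeline advanced by the sequence length


tl = MockTimeline()
with tl.record("my_burst"):
    tl.delay(10)
    tl.pulse("ttl0", 20)
    for i in range(100):
        tl.pulse("dds2", 220)
assert tl.now == 0 and tl.emitted == []  # nothing emitted, cursor rewound

t = tl.now
for i in range(100):
    tl.pulse("ttl2", 3000)
    tl.play("my_burst")
assert tl.now == t + 100 * (3000 + 10 + 20 + 100 * 220)
```

The final assertion in the model matches the corrected assertion in the sketch: each loop iteration lasts 3 µs plus the 22.03 µs burst.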

Features

DMA sequences persist across kernels/experiments. Otherwise, the traffic and CPU time spent re-recording them would reduce the benefit.
There are two use cases: generating the RTIO event sequences at compile time or at runtime. Runtime seems more powerful and generic, especially if sequences persist.
DMA takes 3 system (CPU) clock cycles per RTIO event.
Large DMA sequences will mostly be stalled by the RTIO FIFO depth rather than limited by DMA throughput. In other cases, they will compete with the CPU for memory bandwidth and slow it down.
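The figure of 3 clock cycles per event sets a peak rate for each engine. The helpers below turn it into concrete numbers. They assume a 125 MHz system clock, which is a common ARTIQ configuration but an assumption here, so substitute the actual clock frequency. The function names are hypothetical.

```python
def dma_event_rate(sys_clk_hz: float, cycles_per_event: int = 3) -> float:
    """Peak number of RTIO events per second a single DMA engine can issue."""
    return sys_clk_hz / cycles_per_event


def dma_time_ns(n_events: int, sys_clk_hz: float, cycles_per_event: int = 3) -> float:
    """Time the engine needs to read out n_events, ignoring FIFO stalls."""
    return n_events * cycles_per_event / sys_clk_hz * 1e9


# Assumed 125 MHz system clock: about 41.7 M events/s, i.e. 24 ns per event.
rate = dma_event_rate(125e6)
per_event_ns = dma_time_ns(1, 125e6)
```

If the events in a sequence are spaced further apart than this, playback is paced by their timestamps and the RTIO FIFOs, not by the DMA engine.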

Format

An event in a DMA sequence should be serialized as [length, channel_number, timestamp, address, data]. A DMA sequence is a concatenation of events.
Timestamps in a DMA sequence are relative to the beginning of the sequence and purely additive: playback adds the start time as an offset and never scales them.
The first timestamp is the delay of the first event relative to the logical start of the sequence.
RTIO events in DMA are variable-length, with up to 512 bits of data.
DMA sequences can be stored sequentially in DRAM. Old sequences are marked as unused explicitly by the experiment (DMA().free()). Avoiding leaked DMA sequences is the user's responsibility.
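The format can be sketched as a serializer and decoder for the [length, channel_number, timestamp, address, data] layout. The page does not fix field widths or byte order, so everything below is an assumption for illustration: a 1-byte total length, a 24-bit channel, a signed 64-bit timestamp, a 1-byte address, non-negative data of 0 to 64 bytes (up to 512 bits), and little-endian encoding. `Event`, `encode_event`, `encode_sequence` and `play` are hypothetical names.

```python
import struct
from typing import Iterator, List, NamedTuple

# Assumed header layout, little-endian, no padding (13 bytes):
# length (1 byte, whole event), channel (3 bytes), timestamp (8 bytes), address (1 byte)
HEADER = struct.Struct("<B3sqB")
MAX_DATA_BITS = 512


class Event(NamedTuple):
    channel: int
    timestamp: int  # relative to the logical start of the sequence
    address: int
    data: int       # non-negative payload


def encode_event(ev: Event) -> bytes:
    """Serialize one event as [length, channel_number, timestamp, address, data]."""
    if ev.data.bit_length() > MAX_DATA_BITS:
        raise ValueError("data exceeds 512 bits")
    # Variable length: only as many data bytes as the payload needs.
    data = ev.data.to_bytes((ev.data.bit_length() + 7) // 8, "little")
    length = HEADER.size + len(data)
    header = HEADER.pack(length, ev.channel.to_bytes(3, "little"), ev.timestamp, ev.address)
    return header + data


def encode_sequence(events: List[Event]) -> bytes:
    """A DMA sequence is a plain concatenation of serialized events."""
    return b"".join(encode_event(ev) for ev in events)


def play(sequence: bytes, start_mu: int) -> Iterator[Event]:
    """Decode a sequence; each timestamp is shifted by start_mu, never scaled."""
    offset = 0
    while offset < len(sequence):
        length, channel, timestamp, address = HEADER.unpack_from(sequence, offset)
        data = int.from_bytes(sequence[offset + HEADER.size:offset + length], "little")
        yield Event(int.from_bytes(channel, "little"), start_mu + timestamp, address, data)
        offset += length
```

Each event carries its own length, so the decoder can walk a concatenated sequence without a separate index.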

Multiple engines

There should be multiple (4?) DMA engines. Otherwise only one DMA sequence can be replayed at any given time.
With multiple engines DMA sequences can be "composed" and would interleave automatically through arbitration at the RTIO interface.
It might be good to support DMA sequences triggering replay of other DMA sequences (separate/same engine(s)?).
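Interleaving at the RTIO interface can be modelled as an arbiter that grants one engine per cycle. This page states the arbitration policy only for the memory arbiter, so a round-robin policy at the RTIO interface is an assumption here. Events are reduced to (channel, timestamp) pairs, and `round_robin_arbitrate` is a hypothetical name.

```python
from collections import deque
from typing import Deque, List, Tuple

Event = Tuple[str, int]  # (channel, timestamp in mu), simplified


def round_robin_arbitrate(engines: List[Deque[Event]]) -> List[Tuple[int, Event]]:
    """Grant the RTIO interface to one engine per cycle, rotating priority so that
    no engine with pending events is starved. Consumes the input queues and returns
    (engine index, event) pairs in the order they reach the RTIO core."""
    order: List[Tuple[int, Event]] = []
    last = len(engines) - 1  # start the rotation at engine 0
    while any(engines):
        for step in range(1, len(engines) + 1):
            i = (last + step) % len(engines)
            if engines[i]:
                order.append((i, engines[i].popleft()))
                last = i
                break
    return order
```

With two engines, a three-event sequence on `ttl0` and a one-event sequence on `ttl1` interleave as engine 0, 1, 0, 0. The composition happens without any engine knowing about the others.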

Arbitration

There needs to be arbitration between the kernel accessing RTIO channels and the DMA engine doing the same.
The arbitration does not need to be fine-grained: the claim can be on the "entire" RTIO API.
The memory arbiter is round-robin. There will be a slow-down due to sharing of the bandwidth and DRAM dynamics but no (inherent) starvation. The slow-down seems irrelevant since the DRAM can easily outpace the CPU.
If several DMA engines, the kernel and a DMA engine, or DDMA and upstream DRTIO access the same channel, they risk generating out-of-sequence events.
This design also needs to anticipate distributed DMA, which makes concurrency trickier still. The arbitration would then happen at the remote end and would need to handle the FIFO status traffic and "FIFO full" events.
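A toy model shows both points. A single coarse claim on the whole RTIO interface keeps each burst atomic, but it does not order timestamps across sources that share a channel. The model is a simplification: it assumes a channel rejects any event whose timestamp is earlier than the last one accepted on that channel. `RtioInterface`, `submit` and `sequence_errors` are hypothetical names.

```python
import threading


class RtioInterface:
    """Toy model: one coarse lock over the whole RTIO API, plus per-channel
    detection of timestamps that go backwards (out-of-sequence events)."""

    def __init__(self):
        self.claim = threading.Lock()   # held by the CPU or by one DMA engine
        self.last_ts = {}               # channel -> last accepted timestamp
        self.sequence_errors = []       # (source, channel, timestamp) rejected

    def write(self, source, channel, timestamp):
        # The caller must hold the claim; it covers all channels, not just this one.
        assert self.claim.locked()
        if timestamp < self.last_ts.get(channel, timestamp):
            self.sequence_errors.append((source, channel, timestamp))
            return False
        self.last_ts[channel] = timestamp
        return True

    def submit(self, source, events):
        """Claim the entire RTIO API for the duration of one burst."""
        with self.claim:
            return [self.write(source, channel, ts) for channel, ts in events]
```

In the usage below, a DMA engine writes `ttl0` at 100 and 200. A kernel that then writes `ttl0` at 150 is rejected even though it held the claim, while its write to `ttl1` goes through. This is the out-of-sequence risk described above.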

DMA for input events

Input DMA segments need to know which channels to collect.
Input DMA segments have pre-allocated memory. Their max length is specified at creation time.
They poll the channels round-robin. The polling rate should be maximized, but polling one channel per cycle seems acceptable. This reduces the maximum event rate of a single channel in a larger collection group; users can split channels across multiple engines or simultaneous DMA sequences.
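The polling trade-off can be made concrete. With one channel polled per cycle, a group of N channels visits each channel only once every N cycles, which caps the per-channel rate at f_sys / N. The sketch below models a segment with a fixed channel group and a buffer allocated up front to its maximum length. `InputDmaSegment`, `cycle` and `max_rate_per_channel` are hypothetical names, and behaviour on a full buffer is left unspecified: events simply stay in the FIFO.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class InputDmaSegment:
    """Hypothetical input DMA segment: fixed channel group, buffer pre-allocated
    to its maximum length, exactly one channel polled per system clock cycle."""

    def __init__(self, channels: List[int], max_events: int):
        self.channels = channels
        self.buffer: List[Optional[Tuple[int, int]]] = [None] * max_events
        self.count = 0
        self._next = 0  # round-robin pointer into self.channels

    def cycle(self, fifos: Dict[int, Deque[int]]) -> None:
        """One clock cycle: poll one channel's input FIFO and store at most one event."""
        channel = self.channels[self._next]
        self._next = (self._next + 1) % len(self.channels)
        if fifos[channel] and self.count < len(self.buffer):
            self.buffer[self.count] = (channel, fifos[channel].popleft())
            self.count += 1


def max_rate_per_channel(sys_clk_hz: float, n_channels: int) -> float:
    """Each channel is visited once every n_channels cycles."""
    return sys_clk_hz / n_channels
```

If only channel 0 of a four-channel group is busy, eight cycles collect just two of its events. Splitting a busy channel into its own segment restores the full polling rate for it.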

Distributed DMA (DDMA)

DMA data should be distributed, i.e. stored on the node that replays it: DRTIO links are slower than the local fabric and DRAM bandwidth.
The (DRTIO) gateware should determine where to intercept DRTIO events in DMA record mode and which DMA sequence to record into.
The gateware should trigger (broadcast) the execution of a DMA sequence.
Output DDMA for remote channels should be supported (e.g. the DMA engine on Metlino and the channel on Kasli).
Input DDMA for remote channels should not be supported. The polling traffic would kill the advantage.
TODO: interaction with the analyzer
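Record-time routing and broadcast playback can be sketched as two steps. First, split a recorded sequence by the node that owns each channel. Then, on playback, trigger every node with the same start time so each one replays its own slice without streaming the events over DRTIO. The channel-to-destination mapping (the upper 8 bits of a 24-bit channel number) is an assumption for illustration, as are all function names. Real gateware would do this splitting in the DRTIO path rather than in software.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[int, int, int]  # (channel, relative timestamp, data)


def destination(channel: int) -> int:
    # Assumption: upper 8 bits of a 24-bit channel number select the DRTIO
    # destination (0 = local, >0 = a satellite).
    return channel >> 16


def split_by_destination(events: List[Event]) -> Dict[int, List[Event]]:
    """Record mode: route each event into the DMA store of the node owning its channel."""
    parts: Dict[int, List[Event]] = defaultdict(list)
    for ev in events:
        parts[destination(ev[0])].append(ev)
    return dict(parts)


def broadcast_play(parts: Dict[int, List[Event]], start_mu: int,
                   play_on: Callable[[int, List[Event], int], None]) -> None:
    """Playback: one trigger per destination, all with the same start time."""
    for dest, evs in parts.items():
        play_on(dest, evs, start_mu)
```

Only the trigger crosses the link at playback time, which is the point of storing DMA data close to the channels.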

Non-blocking DMA

DMA().play() should not block.
There should be a DMA().wait() method to wait for DMA completion.
Re-recording an in-use output DMA sequence is forbidden.
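The three rules above can be expressed as a small state machine. `play()` marks the sequence busy and returns immediately, `wait()` blocks until the engine is done, and `record()` refuses to overwrite a sequence that is still playing. This host-side sketch uses a Python thread as a stand-in for the DMA engine. `DmaSequence` and its methods are illustrative names, not ARTIQ's API.

```python
import threading
import time
from typing import Any, Callable, List, Optional


class DmaSequence:
    """Hypothetical non-blocking playback handle (illustrative, not ARTIQ's API)."""

    def __init__(self, name: str):
        self.name = name
        self.events: List[Any] = []
        self._idle = threading.Event()
        self._idle.set()  # nothing in flight yet

    def record(self, events: List[Any]) -> None:
        # Re-recording an output sequence that is still being played is forbidden.
        if not self._idle.is_set():
            raise RuntimeError(f"DMA sequence {self.name!r} is in use")
        self.events = list(events)

    def play(self, sink: Callable[[Any], None]) -> None:
        """Start playback on a background 'engine' and return immediately."""
        if not self._idle.is_set():
            raise RuntimeError(f"DMA sequence {self.name!r} is already playing")
        self._idle.clear()  # marked busy before play() returns
        events = list(self.events)

        def engine() -> None:
            try:
                for ev in events:
                    sink(ev)
                    time.sleep(0.001)  # stand-in for per-event DMA/RTIO latency
            finally:
                self._idle.set()

        threading.Thread(target=engine, daemon=True).start()

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block until playback has finished; True if it completed in time."""
        return self._idle.wait(timeout)
```

Because the busy flag is set synchronously inside `play()`, a `record()` issued right after `play()` is always rejected, regardless of thread scheduling.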

TODO (needs specification/funding)

Input DMA API
non-blocking DMA
distributed DMA
multiple engines
complex arbitration (multiple DMA engines, incoming DRTIO, and CPU accessing simultaneously)
