Sébastien Bourdeauducq edited this page Mar 16, 2017 · 9 revisions
ARTIQ Direct Memory Access (DMA) #553
# obtain a handle to a named DMA sequence
my_burst = DMA("my_burst")
# record events into it
with my_burst:
    delay(10*ns)
    ttl0.pulse(20*ns)
    for i in range(100):
        dds2.pulse(300*MHz + i*1*MHz, 220*ns)
# timeline is unaltered and rewound to before the `with`
# potentially in a new experiment, new kernel:
# retrieve a reference to a previously recorded DMA sequence
my_pulse = DMA("my_burst")
t = now_mu()
for i in range(100):
    ttl2.pulse(3*us)
    # trigger one playback of the sequence,
    # and wait until the DMA engine has finished
    my_pulse.play()
    # timeline advanced by length of my_pulse
# each playback lasts 10*ns + 20*ns + 100*220*ns = 22.03*us
assert t + seconds_to_mu(100*(3*us + 30*ns + 100*220*ns)) == now_mu()
# release the DMA sequence
my_pulse.free()
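
The timeline behaviour in the example (recording leaves the cursor where it was, playback advances it by the recorded length) can be checked with a small pure-Python model. This is only a sketch under stated assumptions: `ToyTimeline` and all of its methods are hypothetical names that are not part of ARTIQ, times are plain integer nanoseconds, and each pulse is modelled as one "on" and one "off" event.

```python
from contextlib import contextmanager


class ToyTimeline:
    """Hypothetical stand-in for the kernel timeline (integer nanoseconds)."""

    def __init__(self):
        self.now = 0         # timeline cursor
        self.sequences = {}  # name -> (relative events, length)
        self.output = []     # (absolute time, label) events submitted
        self._sink = None    # (start, events) while recording

    def delay(self, duration):
        self.now += duration

    def pulse(self, channel, duration):
        self._emit(channel + " on")
        self.delay(duration)
        self._emit(channel + " off")

    def _emit(self, label):
        if self._sink is None:
            self.output.append((self.now, label))
        else:
            # while recording, store timestamps relative to the sequence start
            start, events = self._sink
            events.append((self.now - start, label))

    @contextmanager
    def record(self, name):
        start = self.now
        events = []
        self._sink = (start, events)
        try:
            yield
        finally:
            self._sink = None
            self.sequences[name] = (events, self.now - start)
            self.now = start  # rewind: recording leaves the timeline unaltered

    def play(self, name):
        events, length = self.sequences[name]
        for relative_time, label in events:
            # purely additive: offset by the current cursor, no scaling
            self.output.append((self.now + relative_time, label))
        self.now += length  # playback advances by the sequence length


tl = ToyTimeline()
with tl.record("my_burst"):
    tl.delay(10)
    tl.pulse("ttl0", 20)
    for i in range(100):
        tl.pulse("dds2", 220)
rewound_to = tl.now                          # 0: cursor is back at the start
burst_length = tl.sequences["my_burst"][1]   # 10 + 20 + 100*220 = 22030 ns

t = tl.now
for i in range(100):
    tl.pulse("ttl2", 3000)
    tl.play("my_burst")
total = tl.now - t                           # 100 * (3000 + 22030) ns
```

In this model the first recorded timestamp is 10 ns, the delay of the first event relative to the logical start of the sequence.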
- DMA sequences persist across kernels/experiments. Otherwise the traffic and CPU time for recording would reduce the benefit.
- There are two use cases: generating the RTIO event sequences at compile time or at runtime. Runtime seems more powerful and generic, especially if sequences persist.
- DMA takes 3 system (CPU) clock cycles per RTIO event.
- Large DMA sequences will mostly stall on FIFO depth rather than be limited by DMA throughput. In other cases they will compete with the CPU for memory bandwidth and slow it down.
- An event in a DMA sequence should be serialized as `[length, channel_number, timestamp, address, data]`. A DMA sequence is a concatenation of events.
- Timestamps in a DMA sequence are relative to the beginning of the sequence, but purely additive: no scaling of timestamps.
- The first timestamp is the delay of the first event with respect to the logical start of the sequence.
- RTIO events in DMA are variable-length, with up to 512 bits of data.
- DMA sequences can be stored sequentially in DRAM. Old sequences are marked as unused explicitly by the experiment (`DMA().free()`). Leaking DMA sequences is the user's fault.
- There should be multiple (4?) DMA engines. Otherwise only one DMA sequence can be replayed at any given time.
- With multiple engines DMA sequences can be "composed" and would interleave automatically through arbitration at the RTIO interface.
- It might be good to support DMA sequences triggering replay of other DMA sequences (separate/same engine(s)?).
- There needs to be arbitration between the kernel accessing RTIO channels and the DMA engine doing the same.
- The arbitration does not need to be granular. The claim can be on the "entire" RTIO API.
- The memory arbiter is round-robin. There will be a slow-down due to sharing of the bandwidth and DRAM dynamics but no (inherent) starvation. The slow-down seems irrelevant since the DRAM can easily outpace the CPU.
- If multiple engines, or the kernel and a DMA engine, or DDMA and DRTIO upstream access the same channel, they risk generating out-of-sequence events.
- This design also needs to anticipate distributed DMA, where concurrency becomes trickier still. The arbitration would be at the remote end and would need to handle the FIFO status traffic and "FIFO full" events.
- Input DMA segments need to know which channels to collect.
- Input DMA segments have pre-allocated memory. Their max length is specified at creation time.
- They poll the channels round-robin. Polling rate should be maximized but it seems ok to poll one channel per cycle. This reduces the maximum event rate for a single channel in a larger collection group. Users can allocate/split channels to multiple engines/simultaneous DMA sequences.
- DMA data should be distributed: the DRTIO links are slower than local fabric and DRAM bandwidth.
- The (DRTIO) gateware should determine where to intercept DRTIO events in DMA record mode and which DMA sequence to record into.
- The gateware should trigger (broadcast) the execution of a DMA sequence.
- Output DDMA for remote channels should be supported (i.e. the DMA engine on Metlino and the channel on Kasli).
- Input DDMA for remote channels should not be supported. The polling traffic would kill the advantage.
- TODO: interaction with the analyzer
- `DMA().play()` should not be blocking; there should be a `DMA().wait()` method to wait for DMA completion.
- Re-recording an in-use output DMA sequence is forbidden.
- Input DMA API
- non-blocking DMA
- distributed DMA
- multiple engines
- complex arbitration (multiple DMA engines, incoming DRTIO, and CPU accessing simultaneously)
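
The event serialization proposed in the notes (`[length, channel_number, timestamp, address, data]`, with a sequence being a plain concatenation of events) can be sketched as follows. The notes do not fix field widths or byte order, so everything about the layout here is an assumption made for illustration: little-endian, a 1-byte total length, a 4-byte channel number, an 8-byte timestamp and a 2-byte address. The function names are hypothetical as well.

```python
import struct

# Assumed header layout (not specified in the notes): little-endian,
# 1-byte total length, 4-byte channel, 8-byte timestamp, 2-byte address.
HEADER = struct.Struct("<BIQH")
MAX_DATA_BYTES = 512 // 8  # events carry up to 512 bits of data


def pack_event(channel, timestamp, address, data):
    """Serialize one event; `length` covers the header and the data."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError("RTIO event data is limited to 512 bits")
    length = HEADER.size + len(data)
    return HEADER.pack(length, channel, timestamp, address) + data


def pack_sequence(events):
    """A DMA sequence is the concatenation of its serialized events."""
    return b"".join(pack_event(*event) for event in events)


def unpack_sequence(blob):
    """Walk the blob using each event's length field."""
    events = []
    offset = 0
    while offset < len(blob):
        length, channel, timestamp, address = HEADER.unpack_from(blob, offset)
        data = blob[offset + HEADER.size:offset + length]
        events.append((channel, timestamp, address, data))
        offset += length
    return events


# Timestamps are relative to the start of the sequence.
events = [
    (0, 10, 0, b"\x01"),   # channel 0 at +10
    (0, 30, 0, b"\x00"),   # channel 0 at +30
    (5, 30, 1, bytes(8)),  # channel 5 at +30, 8 bytes of data
]
blob = pack_sequence(events)
roundtrip = unpack_sequence(blob)
```

Because each event carries its own length, a reader can step through variable-length events without any separate index, which fits storing sequences sequentially in DRAM.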