memory problems with CUDA-based rings #138

jaycedowell · 2020-04-01T21:00:00Z

A couple of times now I have run into problems passing data between blocks using CUDA-based rings. If I don't force a bifrost.device.synchronize_stream() within the reserve context for the ring, I get inconsistent results when reading from the ring in another block. I think what is happening is that the ring doesn't know about the asynchronous copies and happily marks the reserved segment as good to go when the reserve is released. Is there a better way to deal with this than sprinkling synchronize_stream() calls around?
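The race described above can be reproduced without a GPU. This is a toy sketch in plain Python, not Bifrost code. A single-worker `concurrent.futures.ThreadPoolExecutor` stands in for a CUDA stream: work is enqueued, the call returns immediately, and the work runs in order. A Python list stands in for the reserved ring segment. The function name `read_after_release` and the `allow_copy` gate are hypothetical. The gate only keeps the copy "in flight" so that the unsynchronized case is reproducible.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def read_after_release(sync_before_release):
    """Return what a downstream block would read right after the reserve is released."""
    segment = [0, 0, 0, 0]          # reserved ring segment, not yet filled
    src = [1, 2, 3, 4]              # data the block is copying in
    allow_copy = threading.Event()  # hypothetical gate: keeps the copy in flight

    def async_copy():
        allow_copy.wait()
        segment[:] = src            # fill the segment in place

    # One in-order worker plays the role of this thread's CUDA stream.
    with ThreadPoolExecutor(max_workers=1) as stream:
        pending = stream.submit(async_copy)  # returns at once, like an async copy
        if sync_before_release:
            allow_copy.set()
            pending.result()                 # wait for the copy (the stream-synchronize step)
        seen = list(segment)                 # reserve released; another block reads now
        allow_copy.set()                     # let any straggling copy finish before exit
    return seen


print(read_after_release(False))  # [0, 0, 0, 0]  stale data
print(read_after_release(True))   # [1, 2, 3, 4]
```

Waiting on the copy before the segment is released makes the downstream read consistent. In the issue, `synchronize_stream()` inside the reserve context plays that role.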


benbarsdell · 2020-04-01T21:48:31Z

Bifrost's asynchronicity is built around each CPU thread having its own CUDA stream. All GPU work issued from a CPU thread must be synchronous with respect to that thread, so it must be followed by a stream synchronize before anything is released to other threads. (Using async CUDA APIs and then synchronizing on a per-CPU-thread stream makes GPU work synchronous within a CPU thread but asynchronous between threads.)
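The per-thread-stream rule can be modelled the same way in plain Python. This is an illustration under stated assumptions, not Bifrost's implementation:

- A thread-local, single-worker `ThreadPoolExecutor` stands in for each CPU thread's CUDA stream.
- `synchronize()` plays the role of a stream synchronize.
- A `threading.Event` stands in for releasing the ring segment.
- `current_stream`, `synchronize`, `producer_block` and `consumer_block` are hypothetical names.

The producer enqueues its work asynchronously but synchronizes its own stream before releasing, so the consumer thread always sees complete data.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()


def current_stream():
    """Hypothetical helper: give each CPU thread its own in-order 'stream'."""
    if not hasattr(_tls, "stream"):
        _tls.stream = ThreadPoolExecutor(max_workers=1)
    return _tls.stream


def synchronize(stream):
    """Block until everything already enqueued on `stream` has run.

    A single worker drains its queue in FIFO order, so a no-op submitted now
    completes only after all earlier work.
    """
    stream.submit(lambda: None).result()


def producer_block(segment, src, released):
    stream = current_stream()
    for i, value in enumerate(src):
        stream.submit(segment.__setitem__, i, value)  # async: returns immediately
    synchronize(stream)  # queued work must finish before the release
    released.set()       # only now may other threads consume the segment


def consumer_block(segment, released, out):
    released.wait()      # wait for the producer to release the segment
    out.extend(segment)


segment = [0] * 8
released = threading.Event()
out = []
threads = [
    threading.Thread(target=producer_block, args=(segment, list(range(1, 9)), released)),
    threading.Thread(target=consumer_block, args=(segment, released, out)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each thread only ever waits on its own stream, so the two blocks can still run concurrently. The synchronize sits at the one point where data passes from one thread to another. Per the comment, this is also where the pipeline infrastructure synchronizes for every block.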

E.g., the pipeline infrastructure does this for all blocks here:
https://github.com/ledatelescope/bifrost/blob/8a059b3/python/bifrost/pipeline.py#L462

jaycedowell · 2020-04-01T21:51:08Z

Ok, thanks.

Dummysource replaces the ethernet input for throughput testing and is enabled with the command-line switch --fakesource. Add xGPU averaging and subselection. The former has been "tested" in that it outputs appropriate data when the pipeline is fed all ones. With all threads active, the pipeline runs at ~40 Gb/s on my old Xeon machine, seemingly limited by processing on my RTX 2060 GPU. NB: Some synchronization barriers are probably needed, certainly on the block that copies data to the GPU. See ledatelescope/bifrost#138

When blocks write to a ring across the CPU/GPU boundary, the copy is [I think] asynchronous and needs to be synchronized before the destination buffer is marked as ready for downstream consumers. See ledatelescope/bifrost#138

jaycedowell mentioned this issue Apr 1, 2020

Add synchronize call between MOFF and Save epic-astronomy/LWA_EPIC#4

Merged

jaycedowell added bug question and removed bug labels Nov 2, 2021
