Can I turn on “per-thread default stream” in libcudf? [QST] #5596

chenrui17 · 2020-06-28T07:19:36Z

I found that cuDF's default build option for streams is the legacy default stream, so I rebuilt libcudf as follows to support multiple streams for GPU operations:
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/rapids -DCMAKE_CXX11_ABI=ON -DPER_THREAD_DEFAULT_STREAM=ON
I have 3 questions about streams:

1. When I use per-thread default stream, read_parquet performs better, with a speedup of up to 1.2x on average. Can I simply change the build option from the default stream to per-thread default stream so that each CPU thread gets its own stream? Is this guaranteed to work and to improve performance?
2. When I use per-thread default stream, the performance of read_parquet + groupby_aggregate is not as expected: it is worse than with read_parquet only. In addition, about 0.7% of the work still shows up on the default stream, and I don't know why. The Nsight profile is: [image not preserved]
3. When I use per-thread default stream, the kernel overlap is not bad, but the overlap of the HostToDevice and DeviceToHost copies is not as expected, like this: [image not preserved]

Please give me some advice.

chenrui17 · 2020-06-28T07:27:22Z

These are the qdrep files for per-thread default stream and legacy default stream.
I use 12 CPU threads and 3 GPU threads, and I use a semaphore to control when a CPU thread acquires the device.

nsight-profile.zip
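
The threading setup described above (12 CPU threads, at most 3 on the device at a time, gated by a semaphore) can be sketched roughly as follows. This is only an illustration, not the poster's actual code: `DeviceSemaphore`, `RunStats`, and `run_workers` are hypothetical names, and a short sleep stands in for the real GPU work (for example read_parquet followed by a groupby). It uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counting semaphore built on C++11 primitives (no C++20 needed).
class DeviceSemaphore {
public:
    explicit DeviceSemaphore(int permits) : permits_(permits) {
        assert(permits > 0);
    }

    // Block until a permit is free, then take it.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return permits_ > 0; });
        --permits_;
    }

    // Return a permit and wake one waiting thread.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++permits_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int permits_;
};

struct RunStats {
    int completed = 0;       // workers that finished their device section
    int peak_on_device = 0;  // most workers inside the device section at once
};

// Hypothetical driver: starts `num_threads` CPU threads and lets at most
// `max_on_device` of them into the device section at the same time.
RunStats run_workers(int num_threads, int max_on_device) {
    DeviceSemaphore device(max_on_device);
    std::mutex stats_mutex;
    RunStats stats;
    int on_device = 0;

    auto worker = [&] {
        device.acquire();
        {
            std::lock_guard<std::mutex> lock(stats_mutex);
            ++on_device;
            stats.peak_on_device = std::max(stats.peak_on_device, on_device);
        }
        // Placeholder for the GPU work; with a per-thread default stream,
        // each of these threads would issue work on its own stream.
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        {
            std::lock_guard<std::mutex> lock(stats_mutex);
            --on_device;
            ++stats.completed;
        }
        device.release();
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return stats;
}
```

Calling `run_workers(12, 3)` mirrors the numbers in this comment: all 12 workers complete, and `peak_on_device` never exceeds 3. With this kind of gating, at most three streams can have work in flight at once, which bounds how much kernel and copy overlap a profile can show.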

jrhemstad · 2020-06-28T19:12:46Z

PTDS is largely untested and should be considered largely experimental.

You will find answers to a few of your questions about per-thread default stream here: https://developer.nvidia.com/blog/gpu-pro-tip-cuda-7-streams-simplify-concurrency/

harrism · 2020-07-20T01:38:02Z

PTDS support is in progress. It should work successfully for cuDF now in 0.15 (current development branch), as long as you use the RMM default memory resource, cnmem_memory_resource, or pool_memory_resource. You will probably get better overlap with pool_memory_resource since it synchronizes the device less.

kkraus14 · 2020-08-27T18:57:30Z

Closing as this was implemented in #4995

chenrui17 added the Needs Triage (Need team to review and classify) and question (Further information is requested) labels Jun 28, 2020

kkraus14 added the CMake (CMake build issue) and libcudf (Affects libcudf (C++/CUDA) code.) labels and removed the Needs Triage (Need team to review and classify) label Jun 30, 2020

kkraus14 closed this as completed Aug 27, 2020
