I found that the default cuDF build option for streams is the legacy "default stream", so I rebuilt libcudf as follows to support multiple streams for GPU operations:
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/rapids -DCMAKE_CXX11_ABI=ON -DPER_THREAD_DEFAULT_STREAM=ON
I have three questions about streams:
1. With "per-thread-stream", read_parquet performs better: the average speedup is up to 1.2x. Can I simply switch the option from "default stream" to "per-thread-stream" so that each CPU thread gets its own stream? Is this approach sure to work and improve performance?
2. With "per-thread-stream", the performance of "read_parquet + groupby_aggregate" is not as expected. It is worse than with read_parquet alone. In addition, about 0.7% of the activity still shows up on the default stream, and I don't know why. The Nsight profile is:
3. With "per-thread-stream", the kernel overlap is not bad, but the overlap of the HostToDevice and DeviceToHost copies is not as expected, like this:

Please give me some advice.
These are the qdrep files for per-thread-stream and legacy-default-stream.
My CPU thread count is 12 and my GPU thread count is 3, and I use a semaphore to control which CPU threads can acquire the device.
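The throttling setup described above can be sketched in plain C++. This is only an illustration, not the code from the issue: it assumes that "GPU thread num is 3" means at most 3 CPU threads may hold the device at once, and the names `GpuSlots` and `run_workers` are hypothetical. The GPU work is replaced by a short sleep, so the sketch shows only the semaphore logic, not any cuDF or CUDA call.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a counting semaphore built from C++11 primitives.
// Each "slot" stands for permission to submit work to the GPU.
class GpuSlots {
 public:
  explicit GpuSlots(int slots) : free_(slots) {}

  // Block until a slot is free, then take it.
  void acquire() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return free_ > 0; });
    --free_;
  }

  // Return a slot and wake one waiting thread.
  void release() {
    {
      std::lock_guard<std::mutex> lock(m_);
      ++free_;
    }
    cv_.notify_one();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int free_;
};

// Start num_threads workers, allow at most max_concurrent of them inside the
// guarded section at once, and return the peak concurrency that was observed.
int run_workers(int num_threads, int max_concurrent) {
  assert(num_threads > 0 && max_concurrent > 0);
  GpuSlots slots(max_concurrent);
  std::atomic<int> in_flight{0};
  std::atomic<int> peak{0};

  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&] {
      slots.acquire();
      int now = ++in_flight;
      // Record the highest number of simultaneous holders seen so far.
      int seen = peak.load();
      while (now > seen && !peak.compare_exchange_weak(seen, now)) {
      }
      // Placeholder for the real GPU work (assumption: e.g. a read followed
      // by an aggregation issued on this thread's stream).
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
      --in_flight;
      slots.release();
    });
  }
  for (auto& t : workers) t.join();
  return peak.load();
}
```

Calling `run_workers(12, 3)` mirrors the 12-thread, 3-slot configuration: the returned peak never exceeds 3. With a per-thread default stream build, each thread that holds a slot would issue its work on its own stream, so the slot count is an upper bound on how many streams can be active at the same time.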
PTDS (per-thread default stream) support is in progress. It should work successfully for cuDF now in 0.15 (current development branch), as long as you use the RMM default memory resource, cnmem_memory_resource, or pool_memory_resource. You will probably get better overlap with pool_memory_resource since it synchronizes the device less.