Commit

Merge pull request #4995 from rongou/per-thread-default-stream
[REVIEW] Add CMake option for per-thread default stream
kkraus14 authored May 6, 2020
2 parents 31a595a + 0a7f79c commit f610f14
Showing 5 changed files with 50 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@

## Improvements

- PR #4995 Add CMake option for per-thread default stream
- PR #5033 Fix Numba deprecations warnings with Numba 0.49+
- PR #4950 Fix import errors with Numba 0.49+
- PR #4825 Update the iloc exp in dataframe.py
10 changes: 10 additions & 0 deletions cpp/CMakeLists.txt
@@ -275,6 +275,16 @@ if(JITIFY_USE_CACHE)
    add_definitions("-DJITIFY_USE_CACHE -DCUDF_VERSION=${CMAKE_PROJECT_VERSION}")
endif(JITIFY_USE_CACHE)

###################################################################################################
# - per-thread default stream option --------------------------------------------------------------
# This needs to be defined first so tests and benchmarks can inherit it.

option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
if(PER_THREAD_DEFAULT_STREAM)
    message(STATUS "Using per-thread default stream")
    add_compile_definitions(CUDA_API_PER_THREAD_DEFAULT_STREAM)
endif(PER_THREAD_DEFAULT_STREAM)

###################################################################################################
# - add gtest -------------------------------------------------------------------------------------

31 changes: 31 additions & 0 deletions java/README.md
@@ -77,3 +77,34 @@ You will get errors if you don't do it consistently. We tried to detect these u

If you have a compatible GPU on your build system the tests will use it. If not you will see a
lot of skipped tests.

## Per-thread Default Stream

The JNI code can be built with *per-thread default stream* (PTDS), which gives each host thread its
own default CUDA stream, and can potentially increase the overlap of data copying and compute
between different threads (see
[blog post](https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/)).

Because PTDS is selected per compilation unit, it must be enabled consistently across the whole
codebase, so every component has to be built with the same setting. To enable PTDS, first build RMM:
```shell script
conda activate cudf_dev
cd src/rmm/build
cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DPER_THREAD_DEFAULT_STREAM=ON
make -j`nproc`
make install
```

then build cuDF:
```shell script
cd src/cudf/cpp/build
cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DPER_THREAD_DEFAULT_STREAM=ON
make -j`nproc`
make install
```

and finally build the jar:
```shell script
cd src/cudf/java
mvn clean install -DPER_THREAD_DEFAULT_STREAM=ON
```
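
Because the same setting has to reach all three builds, the steps above can be driven from a single variable. The following is a minimal sketch, not part of this commit: `PTDS`, `ptds_flag`, and `print_build_plan` are hypothetical names, and the plan is only printed (a dry run), never executed. The directories and commands are the ones from the steps above, wrapped in subshells so each `cd` starts from the same place.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): keep the PTDS setting in one place.
PTDS="${PTDS:-ON}"

# Print the define that is passed to both cmake invocations and to mvn.
ptds_flag() {
    printf '%s\n' "-DPER_THREAD_DEFAULT_STREAM=${PTDS}"
}

# Print the three build steps in order without running them (dry run).
# $CONDA_PREFIX and `nproc` are printed literally, not expanded here.
print_build_plan() {
    flag="$(ptds_flag)"
    for dir in src/rmm/build src/cudf/cpp/build; do
        printf '(cd %s && cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX %s && make -j`nproc` && make install)\n' "$dir" "$flag"
    done
    printf '(cd src/cudf/java && mvn clean install %s)\n' "$flag"
}
```

Calling `print_build_plan` after sourcing this prints one command line per build step, each carrying the same `-DPER_THREAD_DEFAULT_STREAM` value, which makes a mismatch between RMM, cuDF, and the jar harder to introduce by hand.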
2 changes: 2 additions & 0 deletions java/pom.xml
@@ -144,6 +144,7 @@
<skipNativeCopy>false</skipNativeCopy>
<cxx11.abi.value>ON</cxx11.abi.value>
<CUDA_STATIC_RUNTIME>OFF</CUDA_STATIC_RUNTIME>
<PER_THREAD_DEFAULT_STREAM>OFF</PER_THREAD_DEFAULT_STREAM>
</properties>

<profiles>
@@ -333,6 +334,7 @@
executable="cmake">
<arg value="${basedir}/src/main/native"/>
<arg value="-DCUDA_STATIC_RUNTIME=${CUDA_STATIC_RUNTIME}" />
<arg value="-DPER_THREAD_DEFAULT_STREAM=${PER_THREAD_DEFAULT_STREAM}" />
<arg value="-DCMAKE_CXX11_ABI=${cxx11.abi.value}"/>
</exec>
<exec dir="${native.build.path}"
6 changes: 6 additions & 0 deletions java/src/main/native/CMakeLists.txt
@@ -234,6 +234,12 @@ if(USE_NVTX)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_NVTX")
endif(USE_NVTX)

option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
if(PER_THREAD_DEFAULT_STREAM)
    message(STATUS "Using per-thread default stream")
    add_compile_definitions(CUDA_API_PER_THREAD_DEFAULT_STREAM)
endif(PER_THREAD_DEFAULT_STREAM)

###################################################################################################
# - link libraries --------------------------------------------------------------------------------
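
Both CMake projects touched by this commit declare the same `PER_THREAD_DEFAULT_STREAM` option, and a configured build tree records the chosen value in its `CMakeCache.txt`. As a rough sketch for checking that several build trees agree, the hypothetical helpers below (`ptds_cache_value` and `ptds_consistent` are not part of this commit) read that entry. They assume the usual `NAME:TYPE=VALUE` cache line format, and the build directories passed in are up to the caller.

```shell
#!/bin/sh
# Hypothetical helper: print the cached PER_THREAD_DEFAULT_STREAM value for a
# build directory, or nothing if the cache file or the entry is missing.
ptds_cache_value() {
    sed -n 's/^PER_THREAD_DEFAULT_STREAM:[A-Z]*=//p' "$1/CMakeCache.txt" 2>/dev/null
}

# Succeed only if every given build directory recorded the same value.
# Values are compared as raw strings, so ON and TRUE count as different.
ptds_consistent() {
    first="$(ptds_cache_value "$1")"
    [ -n "$first" ] || return 1
    for dir in "$@"; do
        [ "$(ptds_cache_value "$dir")" = "$first" ] || return 1
    done
}
```

For instance, `ptds_consistent src/rmm/build src/cudf/cpp/build` would return a non-zero status if one tree was configured with the option ON and the other with it OFF, or if either tree has not been configured yet.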
