Merge pull request #1488 from pxLi/merge-23.10

Merge branch-23.10 to main [skip ci]

pxLi authored Oct 12, 2023
2 parents 73fcd5c + 62ab1a1 commit e5fb14e
Showing 33 changed files with 1,099 additions and 75 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/auto-merge.yml
@@ -18,12 +18,12 @@ name: auto-merge HEAD to BASE
on:
pull_request_target:
branches:
- branch-23.08
- branch-23.10
types: [closed]

env:
HEAD: branch-23.08
BASE: branch-23.10
HEAD: branch-23.10
BASE: branch-23.12

jobs:
auto-merge:
5 changes: 5 additions & 0 deletions .gitignore
@@ -47,3 +47,8 @@ target/

## VSCode IDE
.vscode

#Generated files
cufile.log
rmm_log.txt
sanitizer_for_pid_*.log
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,4 +1,4 @@
[submodule "thirdparty/cudf"]
path = thirdparty/cudf
url = https://github.com/rapidsai/cudf.git
branch = branch-23.08
branch = branch-23.10
61 changes: 48 additions & 13 deletions CONTRIBUTING.md
@@ -71,18 +71,19 @@ settings. If an explicit reconfigure of libcudf is needed (e.g.: when changing c
The following build properties can be set on the Maven command-line (e.g.: `-DCPP_PARALLEL_LEVEL=4`)
to control aspects of the build:

|Property Name |Description | Default |
|------------------------------------|---------------------------------------|---------|
|`CPP_PARALLEL_LEVEL` |Parallelism of the C++ builds | 10 |
|`GPU_ARCHS` |CUDA architectures to target | RAPIDS |
|`CUDF_USE_PER_THREAD_DEFAULT_STREAM`|CUDA per-thread default stream | ON |
|`RMM_LOGGING_LEVEL` |RMM logging control | OFF |
|`USE_GDS` |Compile with GPU Direct Storage support| OFF |
|`BUILD_TESTS` |Compile tests | OFF |
|`BUILD_BENCHMARKS` |Compile benchmarks | OFF |
|`libcudf.build.configure` |Force libcudf build to configure | false |
|`libcudf.clean.skip` |Whether to skip cleaning libcudf build | true |
|`submodule.check.skip` |Whether to skip checking git submodules| false |
| Property Name | Description | Default |
|--------------------------------------|-----------------------------------------|---------|
| `CPP_PARALLEL_LEVEL` | Parallelism of the C++ builds | 10 |
| `GPU_ARCHS` | CUDA architectures to target | RAPIDS |
| `CUDF_USE_PER_THREAD_DEFAULT_STREAM` | CUDA per-thread default stream | ON |
| `RMM_LOGGING_LEVEL` | RMM logging control | OFF |
| `USE_GDS` | Compile with GPU Direct Storage support | OFF |
| `BUILD_TESTS` | Compile tests | OFF |
| `BUILD_BENCHMARKS` | Compile benchmarks | OFF |
| `BUILD_FAULTINJ` | Compile fault injection | ON |
| `libcudf.build.configure` | Force libcudf build to configure | false |
| `libcudf.clean.skip` | Whether to skip cleaning libcudf build | true |
| `submodule.check.skip` | Whether to skip checking git submodules | false |
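
For instance, several of these properties can be combined on one build command; the sketch below is illustrative only and reuses the `build/build-in-docker` script referenced later in this guide:

```bash
# illustrative only: combine several of the properties above in a single build
./build/build-in-docker clean package -DCPP_PARALLEL_LEVEL=4 -DBUILD_TESTS=ON -DBUILD_FAULTINJ=OFF
```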


### Local testing of cross-repo contributions cudf, spark-rapids-jni, and spark-rapids
@@ -148,7 +149,7 @@ $ ./build/build-in-docker install ...
```

Now cd to ~/repos/NVIDIA/spark-rapids and build with one of the options from
[spark-rapids instructions](https://github.com/NVIDIA/spark-rapids/blob/branch-23.08/CONTRIBUTING.md#building-from-source).
[spark-rapids instructions](https://github.com/NVIDIA/spark-rapids/blob/branch-23.10/CONTRIBUTING.md#building-from-source).

```bash
$ ./build/buildall
@@ -224,6 +225,40 @@ in errors finding libraries. The script `build/run-in-docker` was created to hel
situation. A test can be run directly using this script or the script can be run without any
arguments to get into an interactive shell inside the container.
```build/run-in-docker target/cmake-build/gtests/ROW_CONVERSION```

#### Testing with Compute Sanitizer
[Compute Sanitizer](https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html) is a
functional correctness checking suite included in the CUDA toolkit. The RAPIDS Accelerator JNI can
run its unit tests under Compute Sanitizer in memcheck mode to help catch kernels that are doing
something incorrectly. To run the unit tests with Compute Sanitizer, append `-DUSE_SANITIZER=ON`
to the build command, e.g.:
```
> ./build/build-in-docker clean package -DUSE_SANITIZER=ON
```

Compute Sanitizer writes its reports to one or more log files named
`sanitizer_for_pid_<pid number>.log` under the current workspace root path.
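
A quick way to skim the results is to grep the logs; the `ERROR SUMMARY` line used below is memcheck's usual per-process summary and is assumed here rather than defined by this repository:

```bash
# sketch: print each log's memcheck summary line (assumes memcheck's usual output format)
grep -H "ERROR SUMMARY" sanitizer_for_pid_*.log
```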

Please note that not all unit tests can run with Compute Sanitizer. For example,
`RmmTest#testEventHandler` intentionally attempts an oversized, illegal allocation as part of the
test, but Compute Sanitizer still reports the resulting errors and fails the whole build process.
`UnsafeMemoryAccessorTest` exercises host memory only, so there is no need to run it with
Compute Sanitizer either.

If your tests are not suitable for Compute Sanitizer, add the JUnit 5 tag (`@Tag("noSanitizer")`)
to the test methods or to the test class, as shown below.
```java
@Tag("noSanitizer")
class ExceptionCaseTest { ... }

// or for a single test
class NormalCaseTest {
  @Tag("noSanitizer")
  @Test
  public void testOneErrorCase() { ... }
}
```

### Benchmarks
C++ benchmarks, written with NVBench, are in the `src/main/cpp/benchmarks` directory.
Building them requires the `-DBUILD_BENCHMARKS` build option. Once built, the benchmarks
25 changes: 25 additions & 0 deletions build/sanitizer-java/bin/java
@@ -0,0 +1,25 @@
#!/bin/bash
#
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This special Java executable is specified via the "jvm" configuration of
# the surefire plugin to intercept the forking of test processes, so that
# the tests run under the compute-sanitizer tool.
exec compute-sanitizer --tool memcheck \
--launch-timeout 600 \
--error-exitcode -2 \
--log-file "./sanitizer_for_pid_%p.log" \
java "$@"
76 changes: 76 additions & 0 deletions ci/Dockerfile.multi
@@ -0,0 +1,76 @@
#
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

###
# JNI CI image for multi-platform build
#
# Arguments: CUDA_VERSION=11.8.0
#
###
ARG CUDA_VERSION=11.8.0
ARG OS_RELEASE=8
# multi-platform build with: docker buildx build --platform linux/arm64,linux/amd64 <ARGS> on either amd64 or arm64 host
# check available official arm-based docker images at https://hub.docker.com/r/nvidia/cuda/tags (OS/ARCH)
FROM --platform=$TARGETPLATFORM nvidia/cuda:$CUDA_VERSION-devel-rockylinux$OS_RELEASE
ARG TOOLSET_VERSION=11
### Install basic requirements
RUN dnf install -y scl-utils
RUN dnf install -y gcc-toolset-${TOOLSET_VERSION} python39
RUN dnf --enablerepo=powertools install -y zlib-devel maven tar wget patch ninja-build
# require git 2.18+ to keep consistent submodule operations
RUN dnf install -y git
## pre-create the CMAKE_INSTALL_PREFIX folder, set writable by any user for Jenkins
RUN mkdir /usr/local/rapids && mkdir /rapids && chmod 777 /usr/local/rapids && chmod 777 /rapids

# 3.22.3+: CUDA architecture 'native' support + flexible CMAKE_<LANG>_*_LAUNCHER for ccache
ARG CMAKE_VERSION=3.26.4
# default as arm64 release
ARG CMAKE_ARCH=aarch64
# aarch64 cmake for arm build
RUN cd /usr/local && wget --quiet https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}.tar.gz && \
tar zxf cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}.tar.gz && \
rm cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}.tar.gz
ENV PATH /usr/local/cmake-${CMAKE_VERSION}-linux-${CMAKE_ARCH}/bin:$PATH

# ccache for interactive builds
ARG CCACHE_VERSION=4.6
RUN cd /tmp && wget --quiet https://github.com/ccache/ccache/releases/download/v${CCACHE_VERSION}/ccache-${CCACHE_VERSION}.tar.gz && \
tar zxf ccache-${CCACHE_VERSION}.tar.gz && \
rm ccache-${CCACHE_VERSION}.tar.gz && \
cd ccache-${CCACHE_VERSION} && \
mkdir build && \
cd build && \
scl enable gcc-toolset-${TOOLSET_VERSION} \
"cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DZSTD_FROM_INTERNET=ON \
-DREDIS_STORAGE_BACKEND=OFF && \
cmake --build . --parallel 4 --target install" && \
cd ../.. && \
rm -rf ccache-${CCACHE_VERSION}

## install a version of boost that is needed for arrow/parquet to work
RUN cd /usr/local && wget --quiet https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz && \
tar -xzf boost_1_79_0.tar.gz && \
rm boost_1_79_0.tar.gz && \
cd boost_1_79_0 && \
./bootstrap.sh --prefix=/usr/local && \
./b2 install --prefix=/usr/local --with-filesystem --with-system && \
cd /usr/local && \
rm -rf boost_1_79_0

# disable cuda container constraints to allow running with older drivers on data-center GPUs
ENV NVIDIA_DISABLE_REQUIRE="true"
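
Based on the comment at the top of this Dockerfile, an image build might look roughly like the following; the image tag is hypothetical and `CMAKE_ARCH` is left at its arm64 default:

```bash
# illustrative sketch: build the arm64 variant of the CI image (tag name is hypothetical)
docker buildx build --platform linux/arm64 \
  --build-arg CUDA_VERSION=11.8.0 --build-arg OS_RELEASE=8 \
  -f ci/Dockerfile.multi -t spark-rapids-jni-ci:cuda11.8.0-arm64 .
```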
7 changes: 4 additions & 3 deletions ci/Jenkinsfile.premerge
@@ -141,9 +141,10 @@ pipeline {
container('cpu') {
// check if pre-merge dockerfile modified
def dockerfileModified = sh(returnStdout: true,
script: 'BASE=$(git --no-pager log --oneline -1 | awk \'{ print $NF }\'); ' +
'git --no-pager diff --name-only HEAD $(git merge-base HEAD $BASE) ' +
"-- ${PREMERGE_DOCKERFILE} || true")
script: """BASE=\$(git --no-pager log --oneline -1 | awk \'{ print \$NF }\')
git --no-pager diff --name-only HEAD \$BASE -- ${PREMERGE_DOCKERFILE} || true""").trim()
echo "$dockerfileModified"

if (!dockerfileModified?.trim()) {
TEMP_IMAGE_BUILD = false
}
17 changes: 15 additions & 2 deletions ci/nightly-build.sh
@@ -26,9 +26,22 @@ MVN="mvn -Dmaven.wagon.http.retryHandler.count=3 -B"
CUDA_VER=${CUDA_VER:-cuda`nvcc --version | sed -n 's/^.*release \([0-9]\+\)\..*$/\1/p'`}
PARALLEL_LEVEL=${PARALLEL_LEVEL:-4}
USE_GDS=${USE_GDS:-ON}
USE_SANITIZER=${USE_SANITIZER:-ON}
BUILD_FAULTINJ=${BUILD_FAULTINJ:-ON}
ARM64=${ARM64:-false}

profiles="source-javadoc"
if [ "${ARM64}" == "true" ]; then
profiles="${profiles},arm64"
USE_GDS="OFF"
USE_SANITIZER="OFF"
BUILD_FAULTINJ="OFF"
fi

${MVN} clean package ${MVN_MIRROR} \
-Psource-javadoc \
-P${profiles} \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=${USE_GDS} -Dtest=*,!CuFileTest,!CudaFatalTest,!ColumnViewNonEmptyNullsTest \
-DBUILD_TESTS=ON -Dcuda.version=$CUDA_VER
-DBUILD_TESTS=ON -DBUILD_FAULTINJ=${BUILD_FAULTINJ} -Dcuda.version=$CUDA_VER \
-DUSE_SANITIZER=${USE_SANITIZER}
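
As a hedged illustration, an arm64 nightly run could drive this script through the `ARM64` switch, which also turns off GDS, the sanitizer, and fault injection; the `CUDA_VER` value is just an example:

```bash
# illustrative: environment-driven arm64 nightly build
ARM64=true CUDA_VER=cuda11 ./ci/nightly-build.sh
```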
2 changes: 1 addition & 1 deletion ci/premerge-build.sh
@@ -1,6 +1,6 @@
#!/bin/bash
#
# Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2022-2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
3 changes: 2 additions & 1 deletion ci/submodule-sync.sh
@@ -70,7 +70,8 @@ ${MVN} verify ${MVN_MIRROR} \
-DCPP_PARALLEL_LEVEL=${PARALLEL_LEVEL} \
-Dlibcudf.build.configure=true \
-DUSE_GDS=ON -Dtest=*,!CuFileTest,!CudaFatalTest,!ColumnViewNonEmptyNullsTest \
-DBUILD_TESTS=ON
-DBUILD_TESTS=ON \
-DUSE_SANITIZER=ON
verify_status=$?
set -e

62 changes: 57 additions & 5 deletions pom.xml
@@ -21,7 +21,7 @@

<groupId>com.nvidia</groupId>
<artifactId>spark-rapids-jni</artifactId>
<version>23.08.0</version>
<version>23.10.0</version>
<packaging>jar</packaging>
<name>RAPIDS Accelerator JNI for Apache Spark</name>
<description>
@@ -83,9 +83,11 @@
<USE_GDS>OFF</USE_GDS>
<BUILD_TESTS>OFF</BUILD_TESTS>
<BUILD_BENCHMARKS>OFF</BUILD_BENCHMARKS>
<BUILD_FAULTINJ>ON</BUILD_FAULTINJ>
<ai.rapids.cudf.nvtx.enabled>false</ai.rapids.cudf.nvtx.enabled>
<ai.rapids.refcount.debug>false</ai.rapids.refcount.debug>
<cuda.version>cuda11</cuda.version>
<jni.classifier>${cuda.version}</jni.classifier>
<cudf.path>${project.basedir}/thirdparty/cudf</cudf.path>
<hadoop.version>3.2.4</hadoop.version>
<junit.version>5.8.1</junit.version>
@@ -141,6 +143,12 @@
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-params</artifactId>
@@ -199,14 +207,51 @@
</excludes>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
<profile>
<id>test-with-sanitizer</id>
<activation>
<property>
<name>USE_SANITIZER</name>
<value>ON</value>
</property>
</activation>
<build>
<plugins>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<executions>
<execution>
<id>default-test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<groups>!noSanitizer</groups>
<jvm>${project.basedir}/build/sanitizer-java/bin/java</jvm>
</configuration>
</execution>
<execution>
<!-- Some tests (e.g. error cases) are not suitable to run with sanitizer, so run them separately here -->
<id>sanitizer-excluded-cases-test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<groups>noSanitizer</groups>
</configuration>
</execution>
<execution>
<id>non-empty-null-test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<argLine>-da:ai.rapids.cudf.AssertEmptyNulls</argLine>
<test>ColumnViewNonEmptyNullsTest</test>
<jvm>${project.basedir}/build/sanitizer-java/bin/java</jvm>
</configuration>
</execution>
</executions>
@@ -250,7 +295,7 @@
</plugins>
</build>
</profile>
<profile>
<profile>
<id>test-cpp</id>
<activation>
<property>
@@ -289,6 +334,12 @@
</plugins>
</build>
</profile>
<profile>
<id>arm64</id>
<properties>
<jni.classifier>${cuda.version}-arm64</jni.classifier>
</properties>
</profile>
</profiles>

<build>
@@ -398,6 +449,7 @@
<arg value="-DUSE_GDS=${USE_GDS}"/>
<arg value="-DBUILD_TESTS=${BUILD_TESTS}"/>
<arg value="-DBUILD_BENCHMARKS=${BUILD_BENCHMARKS}"/>
<arg value="-DBUILD_FAULTINJ=${BUILD_FAULTINJ}"/>
</exec>
<exec dir="${native.build.path}"
failonerror="true"
@@ -450,7 +502,7 @@
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
<configuration>
<classifier>${cuda.version}</classifier>
<classifier>${jni.classifier}</classifier>
</configuration>
<executions>
<execution>