Additional unit testing for run-to-run bitwise reproducibility #435

Merged Aug 9, 2024 (1 commit)
3 changes: 2 additions & 1 deletion CMakeLists.txt
@@ -12,7 +12,8 @@ else()
endif()

# Thrust project
project(rocthrust LANGUAGES CXX)
# Note: C is required here for dependencies
project(rocthrust LANGUAGES CXX C)

#Adding CMAKE_PREFIX_PATH
list( APPEND CMAKE_PREFIX_PATH /opt/rocm/llvm /opt/rocm ${ROCM_PATH} )
46 changes: 46 additions & 0 deletions cmake/Dependencies.cmake
@@ -62,6 +62,52 @@ if(BUILD_TEST)
)
find_package(GTest REQUIRED CONFIG PATHS ${GTEST_ROOT})
endif()

# SQLite (for run-to-run bitwise-reproducibility tests)
# Note: SQLite 3.36.0 enabled the backup API by default, which we need
# for cache serialization. We also want to use a static SQLite,
# and distro static libraries aren't typically built
# position-independent.
include( FetchContent )

if(DEFINED ENV{SQLITE_3_43_2_SRC_URL})
set(SQLITE_3_43_2_SRC_URL_INIT $ENV{SQLITE_3_43_2_SRC_URL})
else()
set(SQLITE_3_43_2_SRC_URL_INIT https://www.sqlite.org/2023/sqlite-amalgamation-3430200.zip)
endif()
set(SQLITE_3_43_2_SRC_URL ${SQLITE_3_43_2_SRC_URL_INIT} CACHE STRING "Location of SQLite source code")
set(SQLITE_SRC_3_43_2_SHA3_256 af02b88cc922e7506c6659737560c0756deee24e4e7741d4b315af341edd8b40 CACHE STRING "SHA3-256 hash of SQLite source code")

# embed SQLite
if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.24)
# use extract timestamp for fetched files instead of timestamps in the archive
cmake_policy(SET CMP0135 NEW)
endif()

message("Downloading SQLite.")
FetchContent_Declare(sqlite_local
URL ${SQLITE_3_43_2_SRC_URL}
URL_HASH SHA3_256=${SQLITE_SRC_3_43_2_SHA3_256}
)
FetchContent_MakeAvailable(sqlite_local)

add_library(sqlite3 OBJECT ${sqlite_local_SOURCE_DIR}/sqlite3.c)
target_include_directories(sqlite3 PUBLIC ${sqlite_local_SOURCE_DIR})
set_target_properties( sqlite3 PROPERTIES
C_VISIBILITY_PRESET "hidden"
VISIBILITY_INLINES_HIDDEN ON
POSITION_INDEPENDENT_CODE ON
LINKER_LANGUAGE CXX
)

# We don't need extensions, and omitting them from SQLite removes the
# need for dlopen/dlclose from within rocThrust.
# We also don't need the shared cache, and omitting it yields some performance improvements.
target_compile_options(
sqlite3
PRIVATE -DSQLITE_OMIT_LOAD_EXTENSION
PRIVATE -DSQLITE_OMIT_SHARED_CACHE
)
endif()

# Benchmark dependencies
43 changes: 43 additions & 0 deletions docs/bitwise-repro.rst
@@ -18,3 +18,46 @@ The default device execution policy, ``thrust::device`` (``thrust::hip::par``) d
In particular, the above operations are only bitwise reproducible for **associative scan and reduce operators**. Notably, this does not include the pseudo-associative floating point operators.

An alternative version of the above operations that *is* bitwise reproducible with non-associative operators may be selected by using the *deterministic parallel* execution policy, ``thrust::hip::par_det``. This policy carries a performance overhead, which is the cost of making the results run-to-run reproducible. There is no automatic detection of operator and input type pairs for which the default execution policy, ``thrust::hip::par``, is already bitwise reproducible, so it is advised to use ``thrust::hip::par_det`` only for non-associative operators. ``thrust::hip::par_det`` may also be used with any of the other rocThrust API functions that are already bitwise reproducible. In this case the behavior is the same as with ``thrust::hip::par``.

=====
Tests
=====
To run the bitwise reproducibility tests, build the ``reproducibility.hip`` target.
This target provides bitwise reproducibility test coverage in two forms:

1. The first form runs tests by issuing multiple calls to the bitwise-reproducible versions of the algorithms mentioned in the section above using the deterministic parallel execution policy.
A special scan operator that inserts a random amount of delay into calculations is used to create variation in the internal timing of operations within the algorithm.
We then check to make sure the results for each call are the same. In this approach, calls are all issued within a single run of the test program.

2. The second form tests bitwise reproducibility across runs of the test program. On the initial run, information about the calls being made to the deterministic algorithms (all inputs and outputs)
is stored in a database file. On subsequent runs, when a deterministic algorithm is called, we look for a corresponding entry in the database (a call to the same algorithm with the same inputs that
produced the same output). If such an entry is found, the test succeeds. If no entry is found, the test fails.

Because the second form of the tests requires disk access, it can be very time-consuming to run. For this reason, it is disabled by default. To enable it, define an environment variable called
``ROCTHRUST_BWR_PATH`` and set it to the path to the database file (or the path where you'd like it created if it doesn't already exist).

It is also necessary to distinguish between the initial run (in which information about calls is inserted into the database), and subsequent runs (in which the output of calls is compared
against existing entries in the database). You can use the ``ROCTHRUST_BWR_GENERATE`` environment variable to do this.
A value of:

* ``1`` indicates that this is the initial test run, and information about calls should be inserted into the database. In this mode, bitwise reproducibility tests will not fail.
* ``0`` (or leaving the variable undefined) indicates that this is a subsequent run, and the results of calls should be compared to existing database entries. In this mode, no information is inserted into the database, and tests will fail if no matching database entry is found.

Note that bitwise reproducibility is only guaranteed within a given combination of ROCm version, rocThrust version, and GPU architecture.
This means that if any of these factors changes, additional database entries need to be generated. To do this, run the tests again with ``ROCTHRUST_BWR_GENERATE=1``; entries for the new environment are appended to the existing database.

For example, suppose we are running tests on gfx1030. On the first run, we use the environment variables like this to generate the database file:

``ROCTHRUST_BWR_PATH=/path/to/repro.db ROCTHRUST_BWR_GENERATE=1 reproducibility.hip``

As long as the ROCm version, rocThrust version, and GPU architecture remain the same, we can now run the tests using the database file like this:

``ROCTHRUST_BWR_PATH=/path/to/repro.db reproducibility.hip``

If one or more of the three factors changes (suppose we now want to run on gfx1100) and we want to keep using the same database file, we must do another initial run with ``ROCTHRUST_BWR_GENERATE=1`` to append new entries to the database for the new environment:

``ROCTHRUST_BWR_PATH=/path/to/repro.db ROCTHRUST_BWR_GENERATE=1 reproducibility.hip``

After that we can test in the same manner as before:

``ROCTHRUST_BWR_PATH=/path/to/repro.db reproducibility.hip``
3 changes: 3 additions & 0 deletions test/CMakeLists.txt
@@ -54,11 +54,14 @@ function(add_rocthrust_test TEST)
target_include_directories(${TEST_TARGET} SYSTEM BEFORE
PUBLIC
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
${sqlite_local_SOURCE_DIR}
)
target_link_libraries(${TEST_TARGET}
PRIVATE
rocthrust
roc::rocprim_hip
PUBLIC
sqlite3
)
if (TARGET GTest::GTest)
target_link_libraries(${TEST_TARGET}