Additional unit testing for run-to-run bitwise reproducibility #435
This seems way too over-engineered to me.
- Why do we need to store the hashes for all ROCm versions if the hash is allowed to change between versions? Can this not be a simple file that's committed to the repository and regenerated when the implementation changes? That way any changes can be reviewed, and we don't need this over-engineered database.
- Why do we need an entire dependency for computing a simple CRC? A custom CRC-32 implementation is about 20 lines: https://rosettacode.org/wiki/CRC-32
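For reference, the bit-at-a-time form does fit comfortably in those ~20 lines. A minimal sketch of the common reflected CRC-32 variant (polynomial 0xEDB88320, the same variant shown on the linked Rosetta Code page; the function name here is illustrative, not from this PR):

```cpp
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC-32 (reflected variant, polynomial 0xEDB88320).
// Tiny but slow: one inner iteration per bit of input.
inline std::uint32_t crc32_bitwise(const unsigned char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for(std::size_t i = 0; i < len; ++i)
    {
        crc ^= data[i];
        for(int bit = 0; bit < 8; ++bit)
        {
            // If the low bit is set, shift and fold in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The standard check value for this variant is `crc32("123456789") == 0xCBF43926`, which is a quick way to validate any hand-rolled implementation.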
Thanks for the feedback @Snektron.
As I understand it, the requirement is that the deterministic algorithms must produce the same result for every unique combination of (ROCm version, rocThrust version, GPU architecture). So suppose that we went with the approach of storing the hashes in a simple file that's committed to the repository, and updated that file on every rocThrust version change. If I'm understanding the problem and your suggestion correctly, then I think that the file would need to contain result hashes for all possible versions of ROCm that the current version of rocThrust can be run on, for all possible GPU architectures that are supported. This is basically the same as what the SQLite database file is currently. Storing the info in a database saves the trouble of writing code to locate the hash for a given call signature in the file and compare it with the current one (though I guess we could use separate simple files for each combination, name them in a way that makes lookup easy, and store the hashes in a consistent order). Feel free to let me know if I'm misinterpreting your suggestion - I'm not particularly attached to the database approach, and I'm totally fine with switching to something else if it's simpler.
That's fair. I actually tried to write a version myself initially, but it was too slow without the precomputed-table optimization that's used in the link you posted. I'll have to check whether we have license permission to use the Rosetta Code version directly; if not, I'll educate myself on that optimization and see if I can roll our own implementation.
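For what it's worth, the precomputed-table optimization is also small: build a 256-entry table once (entry i is the CRC state after feeding the single byte i through the bit loop), then process one byte per iteration instead of one bit. A hedged sketch, with illustrative names rather than anything from this PR:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Precompute the 8-bit step for every possible byte value.
inline std::array<std::uint32_t, 256> make_crc32_table()
{
    std::array<std::uint32_t, 256> table{};
    for(std::uint32_t i = 0; i < 256; ++i)
    {
        std::uint32_t c = i;
        for(int bit = 0; bit < 8; ++bit)
        {
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        }
        table[i] = c;
    }
    return table;
}

// Table-driven CRC-32: one table lookup per input byte.
inline std::uint32_t crc32_table_driven(const unsigned char* data, std::size_t len)
{
    static const std::array<std::uint32_t, 256> table = make_crc32_table();
    std::uint32_t crc = 0xFFFFFFFFu;
    for(std::size_t i = 0; i < len; ++i)
    {
        crc = (crc >> 8) ^ table[(crc ^ data[i]) & 0xFFu];
    }
    return ~crc;
}
```

Both versions compute the same reflected CRC-32, so the table-driven one can be validated against the same `"123456789"` → 0xCBF43926 check value.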
The idea was that because the file with the hashes is committed into the repository, the hashes are automatically tied to a particular version of rocThrust. By extension, a particular version of ROCm contains a particular version of rocThrust. Similarly, a particular version of rocThrust is tied to a particular version of ROCm and, by extension, to the particular version of clang that it contains. In theory it's all one big package: when, for example, clang is updated and reorders operations (it's allowed to do this when -ffast-math is enabled), that change only reaches the customer when a new version of ROCm is released, at which point we have already updated the hashes: they would always be those for the latest version of the compiler, the latest version of ROCm, and the latest versions of rocThrust and rocPRIM. I guess that does depend on a relatively tight integration loop between compiler releases and updates to rocPRIM and rocThrust. Are you expecting that people mix and match ROCm/ROCm clang and rocThrust versions?
I guess I had assumed that people would be able to mix and match ROCm and rocThrust versions. For example, if they install rocThrust or ROCm via the GitHub repos, they are free to check out whatever release tags they want, provided they can build and run together. I'm not sure how likely this is for the customers who've requested the bitwise reproducibility testing, but I thought it might be nice to cover the case. Aside from the mix-and-match issue, there are a couple of other less-significant factors I wanted to mention as well; feel free to let me know what you think about them. Both rocFFT and rocSPARSE have implemented similar bitwise reproducibility testing in the past using databases, so I checked in with some of the devs on those projects about the reasons they chose a database.
If there is prior art on this topic I'm not as much against it.
This is true, and you also don't need to run a bunch of git commands. I guess the downside is that one needs additional CI tooling for this to be visible. Out of curiosity, do rocFFT and rocSPARSE have tooling available for this?
I don't believe there are currently any existing CI tools that have been built specifically for looking at the history of bitwise reproducibility. I got the sense that the devs' idea was that, if such a history needed to be produced, it could be done pretty easily with some SQL queries (or a Python script that runs them).
I've removed the CRCpp library and replaced it with a custom implementation.
Added documentation to describe how to invoke the new tests.
Currently, we have tests that verify that the deterministic algorithms for scan and reduce (available via the det and par__det_nosync execution policies) produce the same results when run twice in a row within the same test executable. This change adds new bitwise reproducibility testing that is performed across multiple test executable runs.

To do this, the test executable must be run twice. On the first run, we hash all the inputs and outputs to/from the deterministic function calls and store those hashes in an SQLite database. We also insert information about the ROCm version, rocThrust version, and GPU architecture into the database, since the results the deterministic algorithms produce are allowed to vary if any of those factors change. On the second run, we hash the inputs/outputs again and check whether a matching row (including the ROCm version, rocThrust version, etc.) exists in the database.

This change:
- Adds SQLite as a dependency in the CMake files. SQLite code is public domain.
- Adds the database-based testing to the existing reproducibility tests in test/test_reproducibility.cpp
- Adds classes for database creation/manipulation and for building hashes in tests/bitwise_repro/
- Adds documentation that describes how to invoke the new tests

Because the run-to-run testing accesses a disk-based database file, it's pretty slow, so I've disabled it by default. To turn it on, you must define an environment variable called ROCTHRUST_BWR_PATH and set it to the path of the SQLite database file to use for the testing. If the database does not exist, you must also define an environment variable called ROCTHRUST_BWR_GENERATE and set it to 1 to generate it. This creates the database file given by ROCTHRUST_BWR_PATH and inserts data into it as the run-to-run reproducibility tests are executed (the tests will all pass in this case).

If you want to take an existing database and update it for a new ROCm version/rocThrust version/GPU architecture, you can set ROCTHRUST_BWR_GENERATE=1 and point ROCTHRUST_BWR_PATH to the existing database file.
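To make the generate/verify flow concrete, here is a simplified in-memory model of the lookup semantics. The real implementation uses SQLite; the key fields below mirror the columns described in this PR, but the type and function names are illustrative, not the actual schema:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Illustrative key: the factors that permit results to differ, plus a
// call signature identifying the specific deterministic invocation.
// (ROCm version, rocThrust version, GPU architecture, call signature)
using ReproKey = std::tuple<std::string, std::string, std::string, std::string>;

struct HashPair
{
    std::uint32_t input_hash;
    std::uint32_t output_hash;
};

using ReproDb = std::map<ReproKey, HashPair>;

// First run (generate mode, ROCTHRUST_BWR_GENERATE=1): record the hashes.
inline void record(ReproDb& db, const ReproKey& key, HashPair hashes)
{
    db[key] = hashes;
}

// Second run (verify mode): the test passes only if a matching row exists.
inline bool matches(const ReproDb& db, const ReproKey& key, HashPair hashes)
{
    auto it = db.find(key);
    return it != db.end()
        && it->second.input_hash == hashes.input_hash
        && it->second.output_hash == hashes.output_hash;
}
```

A changed ROCm version, rocThrust version, or GPU architecture produces a different key, so hashes recorded under one combination never cause false failures under another.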
Rebased to resolve conflicts and refactored a bit to remove compiler warnings when building other tests.
CI failures seem unrelated to the PR.
Hi @Snektron, apologies for not requesting a re-review earlier. I need to merge this change today, and I'll need to dismiss this review in order to do that. If you have any further concerns about this change, please don't hesitate to reach out and we can chat.
This change requires rocPRIM PR #575
I've rebased it on top of rocThrust PR #431.
You can check out this commit db784c4 for a more specific diff.