Additional unit testing for run-to-run bitwise reproducibility #435
This seems way too over-engineered to me.
- Why do we need to store the hashes for all ROCm versions if the hash is allowed to change between versions? Can this not be a simple file that's committed to the repository and regenerated when the implementation changes? That way any changes can be reviewed, and we don't need this over-engineered database.
- Why do we need an entire dependency for computing a simple CRC? A custom CRC-32 implementation is about 20 lines: https://rosettacode.org/wiki/CRC-32
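For reference, the bit-at-a-time form does fit comfortably in those ~20 lines. A minimal sketch of the common reflected CRC-32 variant (polynomial 0xEDB88320, the same variant shown on the linked Rosetta Code page; the function name here is illustrative, not from this PR):

```cpp
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC-32 (reflected variant, polynomial 0xEDB88320).
// Tiny but slow: one inner iteration per bit of input.
inline std::uint32_t crc32_bitwise(const unsigned char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for(std::size_t i = 0; i < len; ++i)
    {
        crc ^= data[i];
        for(int bit = 0; bit < 8; ++bit)
        {
            // If the low bit is set, shift and fold in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The standard check value for this variant is `crc32("123456789") == 0xCBF43926`, which is a quick way to validate any hand-rolled implementation.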
Thanks for the feedback @Snektron.
As I understand it, the requirement is that the deterministic algorithms must produce the same result for every unique combination of (ROCm version, rocThrust version, GPU architecture). So suppose that we went with the approach of storing the hashes in a simple file that's committed to the repository, and updated that file on every rocThrust version change. If I'm understanding the problem and your suggestion correctly, then I think that the file would need to contain result hashes for all possible versions of ROCm that the current version of rocThrust can be run on, for all possible GPU architectures that are supported. This is basically the same as what the SQLite database file is currently. Storing the info in a database saves the trouble of writing code to locate the hash for a given call signature in the file and compare it with the current one (though I guess we could use separate simple files for each combination, name them in a way that makes lookup easy, and store the hashes in a consistent order). Feel free to let me know if I'm misinterpreting your suggestion - I'm not particularly attached to the database approach, and I'm totally fine with switching to something else if it's simpler.
That's fair. I actually tried to write a version myself initially, but it was too slow without the precomputed-table optimization that's used in the link you posted. I'll have to check whether we have license permission to use the Rosetta Code version directly; if not, I'll educate myself on that optimization and see if I can roll our own implementation.
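For what it's worth, the precomputed-table optimization is also small: build a 256-entry table once (entry i is the CRC state after feeding the single byte i through the bit loop), then process one byte per iteration instead of one bit. A hedged sketch, with illustrative names rather than anything from this PR:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Precompute the 8-bit step for every possible byte value.
inline std::array<std::uint32_t, 256> make_crc32_table()
{
    std::array<std::uint32_t, 256> table{};
    for(std::uint32_t i = 0; i < 256; ++i)
    {
        std::uint32_t c = i;
        for(int bit = 0; bit < 8; ++bit)
        {
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        }
        table[i] = c;
    }
    return table;
}

// Table-driven CRC-32: one table lookup per input byte.
inline std::uint32_t crc32_table_driven(const unsigned char* data, std::size_t len)
{
    static const std::array<std::uint32_t, 256> table = make_crc32_table();
    std::uint32_t crc = 0xFFFFFFFFu;
    for(std::size_t i = 0; i < len; ++i)
    {
        crc = (crc >> 8) ^ table[(crc ^ data[i]) & 0xFFu];
    }
    return ~crc;
}
```

Both versions compute the same reflected CRC-32, so the table-driven one can be validated against the same `"123456789"` → 0xCBF43926 check value.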
The idea was that because the file with the hashes is committed into the repository, the hashes are automatically tied to a particular version of rocThrust. By extension, a particular version of ROCm contains a particular version of rocThrust. Similarly, a particular version of rocThrust is tied to a particular version of ROCm and, by extension, to the particular version of clang that it contains. In theory it's all one big package: when, for example, clang is updated and reorders operations (it's allowed to do this when -ffast-math is enabled), that change only reaches the customer when a new version of ROCm is released, at which point we have already updated the hashes: they would always be those for the latest version of the compiler, the latest version of ROCm, and the latest versions of rocThrust and rocPRIM. I guess that does depend on a relatively tight integration loop between compiler releases and updates to rocPRIM and rocThrust. Are you expecting that people mix and match ROCm/ROCm clang and rocThrust versions?
I guess I had assumed that people would be able to mix and match ROCm and rocThrust versions. For example, if they install rocThrust or ROCm via the GitHub repos, they are free to check out whatever release tags they want, provided they can build and run together. I'm not sure how likely this is for the customers who've requested the bitwise reproducibility testing, but I thought it might be nice to cover the case. Aside from the mix-and-match issue, there are a couple of other less-significant factors I wanted to mention as well; feel free to let me know what you think about them. Both rocFFT and rocSPARSE have implemented similar bitwise reproducibility testing in the past using databases, so I checked in with some of the devs on those projects about the reasons they chose a database.
If there is prior art on this topic I'm not as much against it.
This is true, and you also don't need to run a bunch of git commands. I guess the downside is that one needs additional CI tooling for this to be visible. Out of curiosity, do rocFFT and rocSPARSE have tooling available for this?
I don't believe there are currently any existing CI tools that have been built specifically for looking at the history of bitwise reproducibility. I got the sense that the devs' idea was that, if such a history needed to be produced, it could be done pretty easily with some SQL queries (or a Python script that runs them).
I've removed the CRCpp library and replaced it with a custom implementation.
Added documentation to describe how to invoke the new tests.
Currently, we have tests that verify that the deterministic algorithms for scan and reduce (available via the det and par__det_nosync execution policies) produce the same results when run twice in a row within the same test executable. This change adds new bitwise reproducibility testing that is performed across multiple test executable runs.

To do this, the test executable must be run twice. On the first run, we hash all the inputs and outputs to/from the deterministic function calls and store those hashes in an SQLite database. We also insert information about the ROCm version, rocThrust version, and GPU architecture into the database, since the results the deterministic algorithms produce are allowed to vary if any of those factors change. On the second run, we hash the inputs/outputs again and check whether a matching row (including the ROCm version, rocThrust version, etc.) exists in the database.

This change:
- Adds SQLite as a dependency in the CMake files. SQLite code is public domain.
- Adds the database-based testing to the existing reproducibility tests in test/test_reproducibility.cpp
- Adds classes for database creation/manipulation and for building hashes in tests/bitwise_repro/
- Adds documentation that describes how to invoke the new tests

Because the run-to-run testing accesses a disk-based database file, it's pretty slow, so I've disabled it by default. To turn it on, you must define an environment variable called ROCTHRUST_BWR_PATH and set it to the path of the SQLite database file to use for the testing. If the database does not exist, you must also define an environment variable called ROCTHRUST_BWR_GENERATE and set it to 1 to generate it. This creates the database file given by ROCTHRUST_BWR_PATH and inserts data into it as the run-to-run reproducibility tests are executed (the tests will all pass in this case).

If you want to take an existing database and update it for a new ROCm version/rocThrust version/GPU architecture, you can set ROCTHRUST_BWR_GENERATE=1 and point ROCTHRUST_BWR_PATH to the existing database file.
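To make the generate/verify flow concrete, here is a simplified in-memory model of the lookup semantics. The real implementation uses SQLite; the key fields below mirror the columns described in this PR, but the type and function names are illustrative, not the actual schema:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Illustrative key: the factors that permit results to differ, plus a
// call signature identifying the specific deterministic invocation.
// (ROCm version, rocThrust version, GPU architecture, call signature)
using ReproKey = std::tuple<std::string, std::string, std::string, std::string>;

struct HashPair
{
    std::uint32_t input_hash;
    std::uint32_t output_hash;
};

using ReproDb = std::map<ReproKey, HashPair>;

// First run (generate mode, ROCTHRUST_BWR_GENERATE=1): record the hashes.
inline void record(ReproDb& db, const ReproKey& key, HashPair hashes)
{
    db[key] = hashes;
}

// Second run (verify mode): the test passes only if a matching row exists.
inline bool matches(const ReproDb& db, const ReproKey& key, HashPair hashes)
{
    auto it = db.find(key);
    return it != db.end()
        && it->second.input_hash == hashes.input_hash
        && it->second.output_hash == hashes.output_hash;
}
```

A changed ROCm version, rocThrust version, or GPU architecture produces a different key, so hashes recorded under one combination never cause false failures under another.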
Rebased to resolve conflicts and refactored a bit to remove compiler warnings when building other tests.
CI failures seem unrelated to the PR.
Hi @Snektron, apologies for not requesting a re-review earlier. I need to merge this change today, and I'll need to dismiss this review in order to do that. If you have any further concerns about this change, please don't hesitate to reach out and we can chat.
This change requires rocPRIM PR #575
I've rebased it on top of rocThrust PR #431.
You can check out this commit db784c4 for a more specific diff.