Update RMM adaptors, containers and tests to use get/set_current_device_resource_ref() #1661
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
…ism/rmm into fea-get_current_device_resource_ref
…source_ref state as well.
On the Python side, we need someone to own concrete objects (we can't just provide classes that wrap Adapters also need a way to keep their upstream alive. We are removing the Here is a very slide-ware sketch approach for discussion:
This is effectively working around not (yet) having |
// Note: even though set_per_device_resource() and set_per_device_resource_ref() are not
// interchangeable, we call the latter from the former to maintain resource_ref
// state consistent with the resource pointer state. This is necessary because the
// Python API still uses the raw pointer API. Once the Python API is updated to use
cudf JNI also uses `set_per_device_resource` and `get_current_device_resource` today, so we'll also need to change our code to use `_ref`.
🙏
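For readers following the thread, the forwarding described in the code comment above can be sketched roughly as follows. This is a toy model under stated assumptions, not the RMM implementation: `toy_resource`, `toy_resource_ref` and everything in namespace `toy` are hypothetical names, and there is a single global slot instead of per-device state.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the RMM types.
struct toy_resource {
  int id;
};

// Non-owning reference wrapper, loosely analogous to a resource_ref.
class toy_resource_ref {
 public:
  toy_resource_ref(toy_resource* res) : res_{res} {}  // implicit on purpose
  toy_resource* get() const { return res_; }

 private:
  toy_resource* res_;
};

namespace toy {
// One global slot per API flavour (the real library keeps state per device).
inline toy_resource& default_resource()
{
  static toy_resource res{0};
  return res;
}
inline toy_resource*& pointer_state()
{
  static toy_resource* ptr = &default_resource();
  return ptr;
}
inline toy_resource_ref& ref_state()
{
  static toy_resource_ref ref{&default_resource()};
  return ref;
}

// Ref setter: only touches the ref state.
inline toy_resource_ref set_current_resource_ref(toy_resource_ref new_ref)
{
  toy_resource_ref old = ref_state();
  ref_state()          = new_ref;
  return old;
}

// Pointer setter: updates the pointer state, then forwards to the ref setter
// so that code reading the ref state sees the same resource.
inline toy_resource* set_current_resource(toy_resource* new_res)
{
  toy_resource* old = pointer_state();
  pointer_state()   = new_res;
  set_current_resource_ref(toy_resource_ref{new_res});
  return old;
}

inline toy_resource* get_current_resource() { return pointer_state(); }
inline toy_resource_ref get_current_resource_ref() { return ref_state(); }
}  // namespace toy
```

In this sketch the forwarding only goes one way: calling the ref setter alone leaves the pointer state untouched, which is one way to read "not interchangeable" in the comment.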
Looks good to me, with a question about the (now-redundant) class template parameter on all the adaptor MRs.
* @param alignment_threshold Only allocations with a size larger than or equal to this threshold
* are aligned.
*/
explicit aligned_resource_adaptor(Upstream* upstream,
I'm not sure how to handle this migration without making a breaking change, but this class no longer needs to be templated; rather, this constructor should be templated.
And hence, question: does the constructor even need to exist, if there is transparent conversion from `Upstream*` to `device_async_resource_ref`?
That is, what doesn't work if the only constructor is:
aligned_resource_adaptor(device_async_resource_ref upstream, ...);
Applies mutatis mutandis to the other adaptor MR changes as well, I think.
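To make the question concrete, here is a minimal sketch of the "only one constructor" shape, assuming the ref type converts implicitly from any upstream pointer. All names (`toy_resource_ref`, `toy_aligned_adaptor`, `toy_upstream_a`, `toy_upstream_b`, `note`) are hypothetical, and the type erasure is a hand-rolled toy rather than the real `device_async_resource_ref`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy upstreams for illustration; not RMM classes.
struct toy_upstream_a {
  std::size_t allocated{0};
  void note(std::size_t bytes) { allocated += bytes; }
};
struct toy_upstream_b {
  std::size_t calls{0};
  void note(std::size_t) { ++calls; }
};

// Minimal type-erased, non-owning reference: an object pointer plus a
// function pointer that knows the concrete upstream type.
class toy_resource_ref {
 public:
  template <typename Upstream>
  toy_resource_ref(Upstream* upstream)  // implicit: Upstream* converts transparently
    : object_{upstream},
      note_{[](void* obj, std::size_t bytes) { static_cast<Upstream*>(obj)->note(bytes); }}
  {
  }

  void note(std::size_t bytes) const { note_(object_, bytes); }

 private:
  void* object_;
  void (*note_)(void*, std::size_t);
};

// Non-template adaptor with a single constructor taking the ref type.
class toy_aligned_adaptor {
 public:
  explicit toy_aligned_adaptor(toy_resource_ref upstream, std::size_t alignment = 256)
    : upstream_{upstream}, alignment_{alignment}
  {
  }

  // Rounds the request up to the alignment and reports it upstream.
  std::size_t allocate(std::size_t bytes)
  {
    std::size_t padded = (bytes + alignment_ - 1) / alignment_ * alignment_;
    upstream_.note(padded);
    return padded;
  }

 private:
  toy_resource_ref upstream_;
  std::size_t alignment_;
};
```

With this shape, `toy_aligned_adaptor adaptor{&some_upstream, 64};` compiles for either upstream type without a class template parameter. Whether the same holds for existing callers of the real adaptors (for example ones that spell out the template argument) is exactly the migration question raised above.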
Right, the plan is to remove the template parameter and the `Upstream*` constructors, once we add the `resource_ref` constructors and convert all of RAPIDS to use them. But we can't do it yet.
See #1457
Let's not merge until I can do some more downstream testing, including JNI.
I got about 2/3 through the changes. I’ll finish tomorrow.
I have a couple more comments but overall I think this is a good direction and this PR is of sufficient quality that I feel comfortable approving it. Thanks for answering my questions.
@@ -40,6 +40,7 @@ function(ConfigureTestInternal TEST_NAME)
PUBLIC "SPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_${RMM_LOGGING_LEVEL}")
target_compile_options(${TEST_NAME} PUBLIC $<$<COMPILE_LANG_AND_ID:CXX,GNU,Clang>:-Wall -Werror
-Wno-error=deprecated-declarations>)
target_compile_options(${TEST_NAME} PUBLIC "$<$<CONFIG:Debug>:-O0>")
Is this temporary / for testing?
target_compile_options(${TEST_NAME} PUBLIC "$<$<CONFIG:Debug>:-O0>")
I would like it to stay, at least until this fixes the problem: rapidsai/rapids-cmake#634 (comment). It only applies to Debug builds.
@@ -233,14 +233,17 @@ TYPED_TEST(MRRefTest, UnsupportedAlignmentTest)
for (std::size_t num_trials = 0; num_trials < NUM_TRIALS; ++num_trials) {
for (std::size_t alignment = MinTestedAlignment; alignment <= MaxTestedAlignment;
alignment *= TestedAlignmentMultiplier) {
#ifdef NDEBUG
Maybe this is related to the CMakeLists.txt change I noted above? What's the rationale for this change?
In debug mode, the deallocation will assert, crashing the test. I was debugging when getting this PR working, and discovered this.
CI does not test our debug builds, so these things creep in.
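A rough sketch of the pattern being discussed, for anyone unfamiliar with it: a check that would trip an `assert` in debug builds is compiled only when `NDEBUG` is defined. The helpers below (`is_supported_alignment`, `toy_deallocate`, `run_alignment_checks`) are hypothetical and only mimic the shape of the test change; they are not taken from the RMM test suite.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical predicate: treat only non-zero powers of two as supported.
inline bool is_supported_alignment(std::size_t alignment)
{
  return alignment != 0 && (alignment & (alignment - 1)) == 0;
}

// Hypothetical stand-in for a deallocation path that asserts on bad input.
// When NDEBUG is not defined, the assert aborts the process.
inline bool toy_deallocate(std::size_t alignment)
{
  assert(is_supported_alignment(alignment));
  return is_supported_alignment(alignment);
}

// Returns the number of checks that ran and passed.
inline int run_alignment_checks()
{
  int passed = 0;
  // Supported alignment: safe in both debug and release builds.
  if (toy_deallocate(256)) { ++passed; }
#ifdef NDEBUG
  // Unsupported alignment: only exercised when asserts are compiled out,
  // because in a debug build the assert above would crash the test.
  if (!toy_deallocate(3)) { ++passed; }
#endif
  return passed;
}
```

The trade-off is that the guarded branch gets no coverage in debug builds, which fits the observation above that such issues creep in when CI only runs release builds.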
/merge
Merge after rapidsai/rmm#1661

Creates and uses CUDF internal wrappers around RMM `current_device_resource` functions. I've marked this PR as breaking because it breaks the ABI, however the API is compatible.

For reviewers, the most substantial additions are in the new file `<cudf/utilities/memory_resource.hpp>`, and in the `DEVELOPER_GUIDE.md` and `*.rst` docs. The rest are all replacements of an include and all calls to `rmm::get_current_device_resource()` with `cudf::get_current_device_resource_ref()`.

Closes #16676

Authors:
- Mark Harris (https://github.com/harrism)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- GALI PREM SAGAR (https://github.com/galipremsagar)
- https://github.com/nvdbaranec
- David Wendt (https://github.com/davidwendt)

URL: #16679
Description
Closes #1660.
This adds a constructor to each MR adaptor to take a `resource_ref` rather than an `Upstream*`. It also updates RMM to use `get_current_device_resource_ref()` everywhere: in containers, in tests, in adaptors, Thrust allocator, polymorphic allocator, execution_policy, etc.

Importantly, this PR also modifies `set_current_device_resource()` to basically call `set_current_device_resource_ref()`. This is necessary because, while RMM C++ uses `get_current_device_resource_ref()` everywhere, the Python API still uses the raw pointer API `set_current_device_resource()`. So we need the latter to update the state for the former. This is a temporary bootstrap to help with the refactoring.

Checklist