[SYCL][CUDA] Cuda adapter multi device context #10737
Conversation
Some basic perf results on Geforce 1050Ti:
I don't think we can reasonably say that this patch is actually giving a speedup, but maybe we can be assured that it is not causing a terrible perf degradation.
This patch also changes the
I think folding in of the rename of
Yeah I agree, but semantic changes were happening in this PR since the 1-to-1 mapping of
We should be able to reuse a lot of the logic. Will start working on the HIP version as soon as this merges.
    UR_ASSERT(phNativeContext, UR_RESULT_ERROR_INVALID_NULL_POINTER);

    *phNativeContext =
        reinterpret_cast<ur_native_handle_t>(hContext->getDevices()[0]);
Could we make the ur_native_handle_t a list of contexts here and do the [0] in the SYCL runtime? (I think we still have two interop interfaces for CUDA, and one of them returns a list of contexts.)
OK sure
        ur_cast<ur_buffer_ *>(MemArg.Mem)->LastEventWritingToMemObj;
        MemDepEvent && std::find(DepEvents.begin(), DepEvents.end(),
                                 MemDepEvent) == DepEvents.end()) {
      DepEvents.push_back(MemDepEvent);
Since CUDA streams are in-order, would it be simpler to just enqueue the copies on the same stream as the commands? That way we might be able to avoid dealing with extra events and synchronization, and just piggyback on the regular command's events and synchronization.
We can enqueue the copies on the same stream, but in the case that the depEvents are kernels on a different device we can't rely on streams. We need to use CUevents instead.
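A hedged sketch of the mechanism this comment refers to: stream ordering alone cannot express a dependency on work submitted to another stream or another device, but recording a CUevent on the producer stream and making the consumer stream wait on it can. The function name is illustrative; only the two driver-API calls are the point.

```cuda
#include <cuda.h>

// Make all work submitted to 'consumer' after this call wait for the
// work already submitted to 'producer', even when the two streams
// belong to different devices. 'evt' is assumed to be a valid CUevent
// created beforehand with cuEventCreate.
void orderAcrossStreams(CUstream producer, CUstream consumer, CUevent evt) {
  // Mark the point in 'producer' that later work depends on.
  cuEventRecord(evt, producer);
  // 'consumer' will not run subsequently enqueued work until 'evt'
  // has completed; the CPU is not blocked.
  cuStreamWaitEvent(consumer, evt, 0);
}
```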
Wouldn't that be covered by the regular command's events?
No, the regular command's events are only the ones that are passed to the adapter by the RT, which may be from a queue in a different SYCL context. Since we are unifying the context, the SYCL RT knows that it doesn't need to use dep events within the same context, because the PI will handle these dependencies, as OpenCL does.
Oooh of course that makes sense, let me go through this again in a bit
@hdelan: were we expecting improvements or regressions in performance, or neither?
We were hopefully expecting neither, although some enqueue operations need to use mutexes now and also
Fixes #7808
The `sycl::context` does not map clearly to a native context for the CUDA and HIP backends. This is especially true now that we are adding support for multi-device contexts in #10737. It would be good to start this deprecation process. PRs to oneMKL and oneDNN to follow.
@sergey-semenov Could you please review the PR? Thanks.
Hi @jinz2014, this PR is outdated; I will make a new PR for the CUDA adapter in UR in about a month or so.
This patch enables a multi-device context in the CUDA adapter. This also enables a multi-device platform as a side effect. Management of memory across buffers within the same context is now handled in the CUDA adapter.
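A hedged sketch of what this enables at the SYCL level, under the assumption that the platform exposes one or more CUDA devices: a single `sycl::context` constructed over several devices, so queues on different devices can share buffers within one context.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  // Pick a GPU platform and gather all of its devices.
  sycl::platform Plt{sycl::gpu_selector_v};
  std::vector<sycl::device> Devs = Plt.get_devices();

  // One context spanning every device on the platform; with this patch
  // the CUDA adapter accepts more than one device here.
  sycl::context Ctx{Devs};

  // Queues on different devices can now share the same context, and
  // buffers created against Ctx are usable from any of them.
  sycl::queue Q0{Ctx, Devs[0]};
  // sycl::queue Q1{Ctx, Devs[1]}; // if a second device is present

  sycl::buffer<int> Buf{sycl::range<1>{64}};
  Q0.submit([&](sycl::handler &CGH) {
    sycl::accessor Acc{Buf, CGH, sycl::write_only};
    CGH.parallel_for(sycl::range<1>{64}, [=](sycl::id<1> I) { Acc[I] = 0; });
  });
  Q0.wait();
  return 0;
}
```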