-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CUDA][XPTI] Fix XPTI-based CUDA tracing capabilities #1173
Conversation
This patch introduces four major changes: - Enable XPTI tracing for out-of-tree adapters by actually linking them with XPTI static library and define XPTI_ENABLE_INSTRUMENTATION macro (which is globally defined in SYCL Runtime builds) - Replace stream name prefix "sycl.experimental" with "ur.adapter" - Eliminate debug stream. The reasons for its existance are historical. In SYCL Runtime there was no mechanism to capture PI layer arguments. The first version of XPTI would only provide function name, which was enough for profiling, but not enough for debugging purposes. Once argument capturing was introduced in XPTI, SYCL Runtime started exporting a new stream. The reason for that is preformance: packaging arguments introduces unwanted overhead for profilers. CUDA plugin (as well as Level Zero) simply followed the same pattern without reflecting on the reasons behind it. However, having two streams doesn't make much sense since CUPTI always returns both function arguments and return value in its callbacks. Hence, the debug stream is now call stream, and the original call stream is gone for good. - Add documentation on the actual contents of the call stream for CUDA Adapter.
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #1173 +/- ##
=======================================
Coverage 15.70% 15.70%
=======================================
Files 223 223
Lines 31518 31518
Branches 3556 3556
=======================================
Hits 4951 4951
Misses 26516 26516
Partials 51 51 ☔ View full report in Codecov by Sentry. |
@@ -22,14 +22,11 @@ | |||
#include <iostream> | |||
|
|||
#ifdef XPTI_ENABLE_INSTRUMENTATION | |||
constexpr auto CUDA_CALL_STREAM_NAME = "sycl.experimental.cuda.call"; | |||
constexpr auto CUDA_DEBUG_STREAM_NAME = "sycl.experimental.cuda.debug"; | |||
constexpr auto CUDA_CALL_STREAM_NAME = "ur.adapter.cuda.call"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case of UR, the stream name is just ur
. For consistency, should we call this one ur.adapter.cuda
or rename the ur one to ur.call
? I have no preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, ur.call
is a better choice, because there's so much more that I'd like UR to export to XPTI other than calls. Device-side profiling info, diagnostic messages, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing existing stream name is also better done in a separate PR, and I guess, I'll have to prepare PRs for SYCL as well.
@pbalcer I see the configuration step failing due to a missing cupti library. I don't see that locally, but I was able to track down a differently configured machine and reproduce that issue. Apparently, UR uses CMake 3.14 and find_package(CUDA), which has been deprecated for a very long time. The reason behind that in the original SYCL repository was compatibility with upstream LLVM, which in turn did require compatibility with Ubuntu 20.04. Now that both UR being a separate project and LLVM uplifting CMake requirements to 3.20, we can switch to find_package(CUDAToolkit), which apparently resolves the issue. If there's no objection, I can make a separate change for that, and then rebase this patch. |
Sounds good. I think the reason we used 3.14 is because that's what sycl cmake file uses - so that will need updating as well. There's another patch, #1070, that uses dlopen for cupti. It should solve this problem, but I think your suggested fix is reasonable regardless. |
I think bumping the required CMake version should be reported as an issue so we can address it in our own time. No need to open an PR. There are a number of things to consider when doing a bump like this like making sure all our CI will continuing working with the change.
This is necessary for other reasons so we will merge that in any case, if it also removes the need to bump the required CMake version that's a bonus. |
Speaking of #1070, doesn't that fix this same issue? |
I don't see it setting |
As I said earlier, SYCL Runtime did have it aligned to LLVM version, and that was the very reason it didn't use |
@oneapi-src/unified-runtime-cuda-write and particularly @pasaulais should be involved in this discussion. |
We've had a lot of problem with just linking And this is a big issue because when it can't find the This is why we've decided to switch it to |
Closing due to going with the approach in #1070. |
This patch introduces four major changes: