Add No Support Label for ROCm GPU in pytorch profiler tutorial #2674
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2674
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 3aa5d06 with merge base dc448c2. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "docathon-h2-2023"
Force-pushed from 532b321 to 4c3dd3e
@@ -7,7 +7,7 @@
 Introduction
 ------------
 PyTorch 1.8 includes an updated profiler API capable of
-recording the CPU side operations as well as the CUDA kernel launches on the GPU side.
+recording the CPU side operations as well as the CUDA kernel launches on the GPU side (``AMD ROCm™`` GPUs are not supported).
@jeffdaily is this indeed the case?
Also, why are you adding backticks here?
ROCm has full upstream support for both modern PyTorch profiling via Kineto + ROCm's libroctracer, as well as the older autograd profiler via ROCm's roctx.
Run any application, PyTorch or otherwise, under ROCm's rocprof. The PyTorch GPU traces will be collected as part of your rocprof output.
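Editor's note: if ROCm profiling flows through the same Kineto path, the standard `torch.profiler` usage from the tutorial should apply unchanged. A minimal sketch (assuming a recent PyTorch build; the matmul workload and table sorting are purely illustrative):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# ProfilerActivity.CUDA covers GPU kernel traces on both CUDA and, per the
# discussion above, ROCm builds; on a CPU-only build we profile CPU ops only.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():  # also reports True on ROCm builds of PyTorch
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    # Illustrative workload; any model or op sequence works here.
    torch.matmul(torch.randn(256, 256), torch.randn(256, 256))

# Print a per-operator summary of the recorded trace.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

On a ROCm machine this would be the quickest way to confirm whether GPU-side events actually appear in the table, which is the claim under dispute in this PR.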
@malfet I quoted the function supported_activities
from the official PyTorch API, which does not mention ROCm GPUs at all. I suppose it should say something about ROCm profiling if that's the case.
Also, the backticks are due to an error in the PySpelling test: ROCm
is not in the dictionary, so the test flagged it as a misspelled word. I suppose I can either do this or add it to a custom dictionary list.
We should update that comment. The ROCm profiler uses the roctracer library to trace on-device HIP kernels. Passing ProfilerActivity.CUDA
is the correct activity for both CUDA and ROCm, due to ROCm PyTorch's strategy of "masquerading" as CUDA so that users do not have to make any changes to their PyTorch models when running on ROCm. PyTorch chose to expose "CUDA" in its public APIs, and we chose to reuse them to make the transition from CUDA to ROCm easier on users.
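Editor's note: a quick way to inspect what a given build reports. This is a hedged sketch; per the comment above, a ROCm build is expected to include `ProfilerActivity.CUDA` in this set, since ROCm reuses the CUDA-named public APIs:

```python
import torch.profiler
from torch.profiler import ProfilerActivity

# supported_activities() reflects the current build: CPU is always present,
# and ProfilerActivity.CUDA appears on CUDA builds -- and, per the comment
# above, on ROCm builds as well, since ROCm masquerades as CUDA.
activities = torch.profiler.supported_activities()
print(activities)
print(ProfilerActivity.CUDA in activities)  # expected True on CUDA/ROCm builds
```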
To further substantiate the claim, can you please specify in the PR description why you are making it (i.e., you tried to run the tutorial on such-and-such GPU using PyTorch X.Y.Z and, instead of a certain output, you got something else)? Right now you are just adding a blanket statement, which would be very hard to verify.
@malfet I was testing issue #2014, but since I did not have the specific hardware (i.e., an AMD GPU) to test the trace, I assumed that the person who raised the issue provided a correct trace image, which seems to show wrong output. Since the trace is fine for CPU and CUDA (on testing), and there was no mention in the profiler API of ROCm being supported, I raised this PR. Can you verify this tutorial and https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html on ROCm and check whether the output trace is correct? It seems to be off in time by about 250 ms, which is not the case for CPU and CUDA devices. Thanks @jeffdaily
I'm assigning this to @hongxiayang to verify.
OK, seems like this issue is resolved, so I am closing this PR.
Fixes #2014
Description
This PR relates to the issue of ROCm support in the PyTorch Profiler, which I have seen can produce some bugs when testing with the PyTorch API. The official files in
PyTorch/Profiler
do not have an option for its support yet, as shown here: https://github.com/pytorch/pytorch/blob/7f1cbc8b5a7de8794c36833baa47bb1343833589/torch/profiler/profiler.py#L35. ROCm usage can give some unexpected results, so right now it's best to mark the usage as unsupported. I plan to raise this issue with the PyTorch team, as it will be important to support ROCm GPUs as their usage grows.
Checklist
cc @aaronenyeshi @chaekit @jcarreiro @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen