profiler ROCm tracing result was wrong #2014
Comments
/assigntome |
This issue has been unassigned due to inactivity. If you are still planning to work on this, you can still send a PR referencing this issue. |
/assigntome |
I am not able to get any output in TensorBoard after running the pt_profiler. It gives the following error and displays a blank page in the PyTorch Profiler tab. Is this because the TensorBoard profiler is being deprecated? I have tested this on Colab and on my local setup (MacBook Air M1).
Also I was thinking about rewriting this tutorial using |
Try opening the .trace file with chrome://tracing in the Chrome web browser; it might work. It's true that pt_profiler displays a blank page in the trace tab, but I did find some kernel info on other pages. |
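For anyone who cannot get the trace viewer to render, the trace file can also be inspected directly, since it is plain JSON in the Chrome trace event format. The snippet below is only a minimal sketch: the function name `summarize_trace` is made up for illustration, the sample events are synthetic (not taken from any trace in this thread), and the category names `cpu_op` and `kernel` are assumptions about what a given trace contains.

```python
import json
from collections import Counter


def summarize_trace(trace):
    """Count complete ("X") events per category and report the covered time span.

    `trace` is a dict in Chrome trace event format; `ts` and `dur` are microseconds.
    """
    events = [e for e in trace.get("traceEvents", []) if e.get("ph") == "X"]
    counts = Counter(e.get("cat", "unknown") for e in events)
    start = min((e["ts"] for e in events), default=None)
    end = max((e["ts"] + e.get("dur", 0) for e in events), default=None)
    return {"counts": dict(counts), "start_us": start, "end_us": end}


# Synthetic sample, for illustration only. For a real file, something like
# `trace = json.load(open(path))` with your own (hypothetical) path would replace it.
sample = {
    "traceEvents": [
        {"ph": "X", "cat": "cpu_op", "name": "aten::conv2d", "ts": 250000, "dur": 120},
        {"ph": "X", "cat": "kernel", "name": "hypothetical_kernel", "ts": 250050, "dur": 40},
        {"ph": "M", "name": "process_name"},  # metadata event, ignored by the summary
    ]
}

summary = summarize_trace(sample)
print(json.dumps(summary, indent=2))
```

A summary with no GPU-side category at all, or with a first timestamp far from where the recorded step is expected to begin, would at least narrow down whether the problem is in the trace data or in the viewer.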
This issue has been unassigned due to inactivity. If you are working on this issue, assign it to yourself and send a PR ASAP. |
/assigntome |
Thanks @demonsan for the reply. You are right, I could get the trace through |
@svekars @carljparker Do you have any thoughts on changing this tutorial to use something newer for tracing, or on dropping the example altogether, as it seems like |
You're right. There are some overhead operations. That makes sense. And torch.compile seems to be supported from PyTorch 2.0 onward. However, I didn't have such an environment for testing and couldn't dig further into this. Sorry about that. |
No worries @demonsan, I checked on Seems like the eager backend has a lower overhead with torch.compile, which is expected, so I think this could be very interesting to experiment with. Here is my Google Colab link for you to experiment with: https://colab.research.google.com/drive/189ax076si63ekmb1kZhqZaQ6ljUww56t?usp=sharing |
@demonsan I tested the PyTorch_profiler in different browsers and in the VS Code extension, and it does seem to work, just not in Safari for me. I'm not sure why. I am now unable to reproduce what you posted in the issue, i.e. if we keep |
I think there is nothing else. Perhaps it is because I use the ROCm version of PyTorch, not CUDA. Maybe rocprofiler or something else works differently? I think this problem is not a big deal, and the profiler still works for my purpose. I'm just curious about this behavior. Thanks for your help. :) |
@demonsan Thanks for the reply. Since I don't have access to ROCm, can you please share the trace file for that case so that I can analyse the problems/bottlenecks in the process? Also, @svekars @carljparker, can you please tell us which PyTorch team works on AMD ROCm support so that we can report these issues to them, or should we just raise this issue in PyTorch itself? |
@demonsan @svekars @carljparker It seems that, for now, it would be best to mark this tutorial as not supported for AMD ROCm GPUs, as per the PyTorch API guide. I will try to raise this issue in the PyTorch dev discussions and the main issue list as well. |
I created a pytorch issue pytorch/pytorch#113698. Will update there. |
I will put up a pull request on this topic covering what I learned. (#2684) |
I tried to profile a ResNet-50 model based on https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html.
But the trace result in chrome://tracing looks wrong.
The profiler should record from 0 ms, but here it records from nearly 250 ms. Also, the active step didn't launch any kernel on the GPU.
The schedule was set to wait=1, warmup=1, active=1.
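To make the schedule concrete, here is a small pure-Python sketch of how wait/warmup/active map step numbers to profiler phases, following my understanding of the documented `torch.profiler.schedule` behaviour. It does not call PyTorch; `profiler_phase` is a hypothetical helper written only for illustration, and it simplifies the real API (for example, it ignores `skip_first`).

```python
def profiler_phase(step, wait=1, warmup=1, active=1, repeat=0):
    """Return the phase ("none", "warmup" or "record") for a zero-based step.

    Simplified model of a wait/warmup/active cycle; repeat=0 means the
    cycle keeps repeating for as long as steps are taken.
    """
    cycle = wait + warmup + active
    if repeat > 0 and step // cycle >= repeat:
        return "none"  # all requested cycles are finished
    pos = step % cycle
    if pos < wait:
        return "none"  # profiler idle
    if pos < wait + warmup:
        return "warmup"  # tracing started, results discarded
    return "record"  # events from this step end up in the trace


phases = [profiler_phase(s) for s in range(6)]
print(phases)
# ['none', 'warmup', 'record', 'none', 'warmup', 'record']
```

Under these assumptions, with wait=1, warmup=1, active=1 only every third step is recorded, starting with the third step of the loop. Whether this has anything to do with the offset seen in the trace is not established here.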
cc @aaronenyeshi @chaekit @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen @robieta