profiler ROCm tracing result was wrong #2014

Closed
demonsan opened this issue Aug 22, 2022 · 17 comments · Fixed by #2684

Comments

@demonsan

demonsan commented Aug 22, 2022

I tried to profile a ResNet-50 model following https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html,
but the trace in chrome://tracing looks wrong.
image
I expected the profiler to start recording at 0 ms, but here it starts at nearly 250 ms. Also, the active step didn't launch any kernels on the GPU.
The schedule was set to wait=1, warmup=1, active=1.
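
For readers following along, here is a minimal pure-Python sketch of how a wait=1, warmup=1, active=1 schedule maps steps to phases, based on the documented behavior of torch.profiler.schedule. `Phase` and `phase_for_step` are hypothetical names used only for illustration; this is not PyTorch's implementation. It shows that only the third step is recorded, which may be one reason a trace does not begin at the very first step.

```python
from enum import Enum


class Phase(Enum):
    NONE = "none"                        # profiler idle, nothing traced
    WARMUP = "warmup"                    # tracing on, results discarded
    RECORD = "record"                    # tracing on, results kept
    RECORD_AND_SAVE = "record_and_save"  # last active step of the cycle


def phase_for_step(step: int, wait: int, warmup: int, active: int) -> Phase:
    """Return the phase of a 0-based step in a repeating wait/warmup/active cycle.

    Assumes active >= 1 and no skipped leading steps.
    """
    cycle = wait + warmup + active
    pos = step % cycle
    if pos < wait:
        return Phase.NONE
    if pos < wait + warmup:
        return Phase.WARMUP
    if pos < cycle - 1:
        return Phase.RECORD
    return Phase.RECORD_AND_SAVE


if __name__ == "__main__":
    # Schedule from this issue: wait=1, warmup=1, active=1.
    for step in range(3):
        print(step, phase_for_step(step, wait=1, warmup=1, active=1).name)
```

With this schedule, steps 0 and 1 are not recorded and only step 2 ends up in the saved trace.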

cc @aaronenyeshi @chaekit @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen @robieta

@ver2king

ver2king commented Jun 7, 2023

/assigntome

@svekars
Contributor

svekars commented Oct 24, 2023

This issue has been unassigned due to inactivity. If you are still planning to work on this, you can still send a PR referencing this issue.

@svekars added the docathon-h2-2023 label and removed the docathon-h1-2023 label on Oct 30, 2023
@guptaaryan16

/assigntome

@guptaaryan16

guptaaryan16 commented Nov 6, 2023

I am not able to get any output in TensorBoard after running pt_profiler. It gives the error below and displays a blank page in the PyTorch Profiler tab. Is this because the TensorBoard profiler is being deprecated? I have tested this on Colab and on my local setup (MacBook Air M1).
Can someone confirm my findings by reproducing this on their end?
That said, it seems we can visualise the trace at https://ui.perfetto.dev/, as mentioned in pytorch/kineto#805.

> tensorboard --logdir=log --load_fast=false
TensorFlow installation not found - running with reduced feature set.
W1106 21:01:02.988501 8174490368 profile_plugin_loader.py:71] Unable to load profiler plugin. Import error: cannot import name 'builder' from 'google.protobuf.internal' (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/google/protobuf/internal/__init__.py)
I1106 21:01:04.105515 6173667328 plugin.py:429] Monitor runs begin
I1106 21:01:04.106191 6173667328 plugin.py:444] Find run directory /Users/guptaaryan16/Desktop/OSS/test/result
I1106 21:01:04.107095 6190493696 plugin.py:493] Load run result
I1106 21:01:04.112741 6190493696 loader.py:57] started all processing
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.10.0 at http://localhost:6006/ (Press CTRL+C to quit)
I1106 21:01:05.205070 6190493696 plugin.py:497] Run result loaded
I1106 21:01:05.205360 6207320064 plugin.py:467] Add run result
W1106 21:01:12.815429 6190493696 application.py:558] path /data/index.js not found, sending 404
W1106 21:01:16.148443 6190493696 application.py:558] path /data/index.js not found, sending 404

Also, I was thinking about rewriting this tutorial using torch/kineto. Any thoughts on that?
cc @svekars @carljparker

@demonsan
Author

demonsan commented Nov 6, 2023

Try opening the .trace file with chrome://tracing in the Chrome web browser; it might work. And it's true that pt_profiler displays a blank page in the trace tab, but I found some kernel info on other pages.
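
If a viewer is not available, a trace can also be checked directly. The sketch below assumes the file is JSON in the Chrome trace event format (a top-level "traceEvents" list, or a bare list of events, with timestamps in microseconds). `summarize_trace` is a hypothetical helper, and the category names in the sample ("cpu_op", "kernel") are assumptions that may differ between profiler versions and between CUDA and ROCm builds. It reports the earliest timestamp and how many complete events each category has, which is enough to see whether any GPU kernel events were recorded.

```python
import json
from collections import Counter


def summarize_trace(trace) -> dict:
    """Summarize complete ("X") events of a Chrome-format trace.

    Accepts either a dict with a "traceEvents" list or a bare list of events.
    """
    raw = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
    # Keep only complete events that carry a timestamp; skip metadata events.
    events = [e for e in raw if e.get("ph") == "X" and "ts" in e]
    per_category = Counter(e.get("cat", "unknown") for e in events)
    first_ts = min((e["ts"] for e in events), default=None)
    return {"first_ts_us": first_ts, "events_per_category": dict(per_category)}


if __name__ == "__main__":
    # Synthetic sample for illustration only; not data from this issue.
    sample = json.loads("""
    {"traceEvents": [
      {"ph": "M", "name": "process_name"},
      {"ph": "X", "cat": "cpu_op", "name": "aten::conv2d", "ts": 1000, "dur": 40},
      {"ph": "X", "cat": "kernel", "name": "example_kernel", "ts": 1020, "dur": 15}
    ]}
    """)
    print(summarize_trace(sample))
```

To use it on a real file, load the file with json.load and pass the result to summarize_trace. A missing GPU kernel category in the output would suggest that no kernels were captured during the active step.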

@svekars
Contributor

svekars commented Nov 7, 2023

This issue has been unassigned due to inactivity. If you are working on this issue, assign it to yourself and send a PR ASAP.

@guptaaryan16

/assigntome

@guptaaryan16

Thanks @demonsan for the reply. You are right: I could get the trace through chrome://tracing, but it seems pt_profiler doesn't work. Also, the 250 ms delay you asked about seems to come from overhead operations that add to the run time of the training step. I will try experimenting with torch.compile to see if this still happens. Any thoughts on this?

cc @svekars @carljparker

@guptaaryan16

@svekars @carljparker Do you have any thoughts on changing this tutorial to use something newer for tracing, or on dropping the example altogether? It seems the torch/kineto plugin for TensorBoard is no longer supported in new versions of TensorBoard and PyTorch (reference issue: pytorch/kineto#805).

@demonsan
Author

demonsan commented Nov 8, 2023

You're right. There are some overhead operations; that makes sense. And torch.compile seems to be supported from PyTorch 2.0 onward. However, I don't have such an environment for testing and couldn't dig further into this. Sorry about that.

@guptaaryan16

No worries @demonsan. I tried torch.compile(backend='eager') and got the following results on a Google Colab T4 GPU, although the trace file was about 400 MB, so I can't show you all the results:
Screenshot 2023-11-09 at 3 09 29 AM

It seems the eager backend has lower overhead with torch.compile, which is expected, so I think this could be very interesting to experiment with. Here is my Google Colab link for you to experiment with: https://colab.research.google.com/drive/189ax076si63ekmb1kZhqZaQ6ljUww56t?usp=sharing

@guptaaryan16

@demonsan I tested the PyTorch_profiler in different browsers and in the VS Code extension, and it does seem to work, just not in Safari for me. Not sure why that happens.

However, I am not able to reproduce what you posted in the issue. If we keep wait=1, warmup=1, active=1, I should get the output you posted, right? Can you mention what other code changes you made to the tutorial to get this output?
cc @svekars @carljparker

@demonsan
Author

demonsan commented Nov 9, 2023

I think there is nothing else. Perhaps it's because I use the ROCm build of PyTorch rather than CUDA. Maybe rocprofiler or something else works differently? I think this problem is not a big deal, and the profiler still works for my purpose. I'm just curious about this behavior. Thanks for your help. :)

@guptaaryan16

@demonsan Thanks for the reply. Since I don't have access to ROCm, can you please share the trace file for that case so that I can analyse the problems/bottlenecks in the process? Also, it seems the torch_profile function is supported only for CUDA profiler activities, so you may have to report this issue to PyTorch to get this supported in the future.

@svekars @carljparker Can you please point us to a PyTorch team that works on AMD ROCm support so that we can report these issues to them, or should we just raise this issue in PyTorch itself?

@guptaaryan16

@demonsan @svekars @carljparker It seems that, for now, it would be best to mark this tutorial as not supported on AMD ROCm GPUs, as per the PyTorch API guide. I will try to raise this issue in the PyTorch dev discussions and the main issue list as well.

@hongxiayang
Contributor

I created a PyTorch issue, pytorch/pytorch#113698. Will update there.

@hongxiayang
Contributor

hongxiayang commented Nov 15, 2023

I will put up a pull request on this topic covering what I learned (#2684).
