profiler ROCm tracing result was wrong #2014

demonsan · 2022-08-22T06:15:33Z

I tried to profile a ResNet-50 model following https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html,
but the trace shown in chrome://tracing looks wrong.

The profiler should start recording at 0 ms, but here recording starts at nearly 250 ms. Also, no kernels were launched on the GPU during the active step.
The schedule was set to wait=1, warmup=1, active=1.
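To make the wait/warmup/active settings concrete, here is a plain-Python sketch of how `torch.profiler.schedule` is documented to assign an action to each `prof.step()` call. `Action` and `schedule_action` are hypothetical names that only mirror that behavior; they are not the torch API. Under this model, wait=1, warmup=1, active=1 means step 0 is skipped, step 1 is a warm-up, and only step 2 of each cycle is recorded and handed to `on_trace_ready`.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical stand-in for torch.profiler.ProfilerAction
    NONE = "none"
    WARMUP = "warmup"
    RECORD = "record"
    RECORD_AND_SAVE = "record_and_save"


def schedule_action(step, wait, warmup, active, repeat=0, skip_first=0):
    """Return the action a wait/warmup/active schedule takes at `step` (sketch)."""
    if step < skip_first:
        return Action.NONE
    step -= skip_first
    cycle = wait + warmup + active
    # With repeat > 0, stop profiling after `repeat` full cycles.
    if repeat > 0 and step // cycle >= repeat:
        return Action.NONE
    pos = step % cycle
    if pos < wait:
        return Action.NONE
    if pos < wait + warmup:
        return Action.WARMUP
    # The last active step of a cycle also triggers on_trace_ready.
    return Action.RECORD if pos < cycle - 1 else Action.RECORD_AND_SAVE


if __name__ == "__main__":
    for s in range(6):
        print(s, schedule_action(s, wait=1, warmup=1, active=1).name)
```

Because `repeat` defaults to 0 in this model, the cycle keeps restarting; passing `repeat=1` stops profiling after the first cycle.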

cc @aaronenyeshi @chaekit @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen @robieta

ver2king · 2023-06-07T19:47:46Z

/assigntome

svekars · 2023-10-24T18:40:36Z

This issue has been unassigned due to inactivity. If you are still planning to work on this, you can still send a PR referencing this issue.

guptaaryan16 · 2023-11-01T21:57:47Z

/assigntome

guptaaryan16 · 2023-11-06T15:39:22Z

I am not able to get any output in TensorBoard after running the pt_profiler. It gives the following error and displays a blank page in the PyTorch Profiler tab. Is this because the TensorBoard profiler is being deprecated? I have tested this on Colab and on my local setup (MacBook Air M1).
Can someone confirm my findings by reproducing this on their end?
However, it seems we can still visualize the trace at https://ui.perfetto.dev/, as mentioned in pytorch/kineto#805.

> tensorboard --logdir=log --load_fast=false
TensorFlow installation not found - running with reduced feature set.
W1106 21:01:02.988501 8174490368 profile_plugin_loader.py:71] Unable to load profiler plugin. Import error: cannot import name 'builder' from 'google.protobuf.internal' (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/google/protobuf/internal/__init__.py)
I1106 21:01:04.105515 6173667328 plugin.py:429] Monitor runs begin
I1106 21:01:04.106191 6173667328 plugin.py:444] Find run directory /Users/guptaaryan16/Desktop/OSS/test/result
I1106 21:01:04.107095 6190493696 plugin.py:493] Load run result
I1106 21:01:04.112741 6190493696 loader.py:57] started all processing
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.10.0 at http://localhost:6006/ (Press CTRL+C to quit)
I1106 21:01:05.205070 6190493696 plugin.py:497] Run result loaded
I1106 21:01:05.205360 6207320064 plugin.py:467] Add run result
W1106 21:01:12.815429 6190493696 application.py:558] path /data/index.js not found, sending 404
W1106 21:01:16.148443 6190493696 application.py:558] path /data/index.js not found, sending 404

Also, I was thinking about rewriting this tutorial using torch/kineto. Any thoughts on that?
cc @svekars @carljparker

demonsan · 2023-11-06T23:31:11Z

Try opening the .trace file with chrome://tracing in the Chrome web browser; it might work. It's true that pt_profiler displays a blank page in the Trace tab, but I found some kernel info on the other pages.
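If the trace opens in chrome://tracing or https://ui.perfetto.dev/ but the timeline still looks odd, the JSON can also be inspected directly. The sketch below assumes the file follows the Chrome Trace Event format (a `traceEvents` list whose complete events have `"ph": "X"` and timestamps `ts` in microseconds) and that GPU kernels carry the category `kernel`, which is what Kineto traces typically use; check the category names in your own file. `summarize_trace` and `load_trace` are hypothetical helpers. A missing `kernel` entry would suggest no GPU kernels were captured, and the offset shows how long after the first event each category starts.

```python
import json
from collections import Counter


def load_trace(path):
    """Read a Chrome-trace-format JSON file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def summarize_trace(trace):
    """Map each event category to (event count, first start in ms).

    Only complete ("X") events with a timestamp are considered; offsets are
    relative to the earliest such event. Trace timestamps are in microseconds.
    """
    events = [e for e in trace.get("traceEvents", [])
              if e.get("ph") == "X" and "ts" in e]
    if not events:
        return {}
    t0 = min(e["ts"] for e in events)
    counts = Counter(e.get("cat", "?") for e in events)
    first = {}
    for e in events:
        cat = e.get("cat", "?")
        offset_ms = (e["ts"] - t0) / 1000.0
        first[cat] = min(first.get(cat, offset_ms), offset_ms)
    return {cat: (counts[cat], first[cat]) for cat in counts}
```

For example, `for cat, (n, first_ms) in sorted(summarize_trace(load_trace(path)).items()): print(cat, n, f"{first_ms:.1f} ms")`, where `path` points to the trace JSON written by the profiler.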

svekars · 2023-11-07T01:05:11Z

This issue has been unassigned due to inactivity. If you are working on this issue, assign it to yourself and send a PR ASAP.

guptaaryan16 · 2023-11-07T07:10:25Z

/assigntome

guptaaryan16 · 2023-11-07T19:59:17Z

Thanks @demonsan for the reply. You are right, I could get the trace through chrome://tracing, but it seems pt_profiler doesn't work. As for your question about the 250 ms delay, it seems to come from the run time of some overhead operations in the training step. I will try experimenting with torch.compile to see if this still happens. Any thoughts on this?
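One way to check whether the 250 ms gap is start-up overhead is to time each training step separately and compare the first step against the rest. This is a generic timing sketch, not part of the profiler; `time_steps`, `first_step_overhead`, and the `step_fn` you pass in are hypothetical. Because GPU kernels run asynchronously, a `step_fn` that launches GPU work should call `torch.cuda.synchronize()` before returning, otherwise the measured time only covers the kernel launches.

```python
import statistics
import time


def time_steps(step_fn, num_steps, clock=time.perf_counter):
    """Run step_fn(i) for i in range(num_steps); return per-step times in ms."""
    durations = []
    for i in range(num_steps):
        start = clock()
        step_fn(i)
        durations.append((clock() - start) * 1000.0)
    return durations


def first_step_overhead(durations):
    """Return (first step in ms, median of the remaining steps in ms)."""
    if len(durations) < 2:
        raise ValueError("need at least two steps to compare")
    return durations[0], statistics.median(durations[1:])
```

For example, `first_ms, steady_ms = first_step_overhead(time_steps(lambda i: train_step(), 5))`, with `train_step` standing in for one iteration of the tutorial's training loop. A first step much slower than the median points to one-time start-up cost.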

cc @svekars @carljparker

guptaaryan16 · 2023-11-07T20:04:27Z

@svekars @carljparker Do you have any thoughts on changing this tutorial to use something newer for tracing, or dropping the example altogether? It seems the torch/kineto plugin for TensorBoard is no longer supported in new versions of TensorBoard and PyTorch (reference issue pytorch/kineto#805).

demonsan · 2023-11-08T00:17:22Z

You're right, there are some overhead operations; that makes sense. And torch.compile is only available from PyTorch 2.0 onward. However, I don't have such an environment for testing and couldn't dig further into this. Sorry about that.

guptaaryan16 · 2023-11-08T21:46:30Z

No worries @demonsan. I tried torch.compile(backend='eager') on a Google Colab T4 GPU and got the following results, although the trace file was about 400 MB, so I can't show you all of them.

The eager backend seems to have lower overhead when used with torch.compile, which is expected, so I think this could be very interesting to experiment with. Here is my Google Colab notebook if you want to try it yourself: https://colab.research.google.com/drive/189ax076si63ekmb1kZhqZaQ6ljUww56t?usp=sharing

guptaaryan16 · 2023-11-09T16:19:56Z

@demonsan I tested the PyTorch profiler in different browsers and in the VS Code extension, and it does work, just not in Safari for me. I'm not sure why.

However, I am not able to reproduce what you posted in the issue. If we keep wait=1, warmup=1, active=1, I should get the output you posted, right? Can you mention what other code changes you made to the tutorial to get this output?
cc @svekars @carljparker

demonsan · 2023-11-09T23:27:15Z

I don't think I changed anything else. Perhaps it's because I use the ROCm version of PyTorch rather than the CUDA one. Maybe rocprofiler or something else works differently? I don't think this problem is a big deal, and it works well enough for my purposes; I'm just curious about this behavior. Thanks for your help. :)
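Since the ROCm build is the suspected difference, it helps to record which build a trace came from when comparing results. PyTorch exposes `torch.version.hip` (set on ROCm builds) and `torch.version.cuda` (set on CUDA builds); the hypothetical helper below only turns those two values into a label and avoids importing torch so it can be tested on its own.

```python
def build_flavor(hip_version, cuda_version):
    """Label a PyTorch build from torch.version.hip and torch.version.cuda.

    Checks HIP first, so a ROCm build is reported as ROCm.
    """
    if hip_version:
        return f"ROCm/HIP {hip_version}"
    if cuda_version:
        return f"CUDA {cuda_version}"
    return "CPU-only"
```

After `import torch`, call `build_flavor(torch.version.hip, torch.version.cuda)` and include the result alongside the trace.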

guptaaryan16 · 2023-11-10T07:43:20Z

@demonsan Thanks for the reply. Since I don't have access to ROCm, could you please share the trace file for that case so that I can analyze the problems and bottlenecks? Also, it seems the torch_profile function only supports CUDA profiling activities, so you may want to report this issue to PyTorch so that support can be added in the future.

@svekars @carljparker can you point us to a PyTorch team that works on AMD ROCm support so that we can report these issues to them, or should we just raise this issue in PyTorch itself?

guptaaryan16 · 2023-11-12T19:02:55Z

@demonsan @svekars @carljparker it seems that, for now, it would be best to mark this tutorial as not supported on AMD ROCm GPUs, per the PyTorch API guide. I will also try to raise this issue in the PyTorch dev discussions and the main issue list.

hongxiayang · 2023-11-14T22:27:39Z

I created a pytorch issue pytorch/pytorch#113698. Will update there.

hongxiayang · 2023-11-15T01:57:07Z

I will put up a pull request on this topic with what I have learned (#2684).

pytorch-bot bot added the module: rocm label Aug 22, 2022

svekars added tensorboard module: profiler labels Nov 14, 2022

svekars added medium docathon-h1-2023 labels May 31, 2023

github-actions bot assigned ver2king Jun 7, 2023

svekars unassigned ver2king Oct 24, 2023

svekars added docathon-h2-2023 and removed docathon-h1-2023 labels Oct 30, 2023

pytorch-bot bot added the ciflow/rocm label Nov 1, 2023

github-actions bot assigned guptaaryan16 Nov 1, 2023

svekars unassigned guptaaryan16 Nov 7, 2023

github-actions bot assigned guptaaryan16 Nov 7, 2023

guptaaryan16 mentioned this issue Nov 12, 2023

Add No Support Label for ROCm GPU in pytorch profiler tutorial #2674 (Closed)

hongxiayang mentioned this issue Nov 14, 2023

verify ROCM profiler behavior listed in https://github.com/pytorch/tutorials/issues/2014 pytorch/pytorch#113698 (Closed)

hongxiayang mentioned this issue Nov 15, 2023

adding ROCm specifics for the tensorboard profiler tutorials #2684 (Merged)

svekars closed this as completed in #2684 Dec 21, 2023
