Collect CUDA/CPU profiling info into result sheets. #5921

Merged 2 commits on Nov 28, 2023

@golechwierowicz golechwierowicz commented Nov 23, 2023

This PR:

  1. Adds CUDA/CPU collection capabilities to the script.
  2. Modifies result_analyzer.py to analyze newly collected results.
  3. Moves CUDA synchronize/XLA device synchronize into the profiler.
  4. Fixes list typing for Python 3.8+.
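
On item 4: before Python 3.9, built-in containers cannot be subscripted in annotations that are evaluated at definition time, so `list[str]` raises `TypeError` on 3.8, while `typing.List[str]` works on 3.8 and later. The sketch below assumes that this is the kind of fix the item refers to; the function `collect_names` and the metric names are hypothetical and not taken from the PR.

```python
from typing import Dict, List


def collect_names(rows: List[Dict[str, float]]) -> List[str]:
    """Return the sorted metric names seen across all rows."""
    names = set()
    for row in rows:
        # Updating a set from a dict adds the dict's keys.
        names.update(row)
    return sorted(names)


# Hypothetical per-iteration metric rows, for illustration only.
rows = [{"total_time_s": 1.5}, {"total_time_s": 1.4, "cuda_time_s": 0.9}]
names = collect_names(rows)
```

Adding `from __future__ import annotations` at the top of a module is another way to keep `list[str]` annotations importable on 3.8, since it defers annotation evaluation.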

Tested with command:

python3 xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=train --filter=basic_gnn_gcn$ --suite-name=torchbench --accelerator=cuda --progress-bar --output-dirname=/tmp/output --repeat=2 --print-subprocess --no-resume --profile-cuda-cpu-collect --profile-cuda
python3 xla/benchmarks/result_analyzer.py --output-dir=/tmp/output
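
Item 3 of the description matters because device work is launched asynchronously: a host-side timer can stop before the queued kernels have finished unless the device is synchronized first. The sketch below illustrates that general idea only and is not the PR's profiler code. `timed_region` is a hypothetical helper, and the `sync` argument stands in for a real call such as `torch.cuda.synchronize`.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


@contextmanager
def timed_region(sync: Callable[[], None], out: Dict[str, float],
                 key: str) -> Iterator[None]:
    """Time a block, synchronizing the device on entry and on exit."""
    sync()  # drain earlier queued work so it is not billed to this region
    start = time.perf_counter()
    try:
        yield
    finally:
        sync()  # wait for work launched inside the region to finish
        out[key] = time.perf_counter() - start


# Stand-in for a device: record the order of calls instead of running kernels.
calls: List[str] = []
metrics: Dict[str, float] = {}
with timed_region(lambda: calls.append("sync"), metrics, "total_time_s"):
    calls.append("step")
```

Keeping both synchronization points inside the helper means callers cannot forget one of them, which is one plausible reason to move the calls into the profiling code.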

@frgossen frgossen left a comment

Nice, ty!

benchmarks/experiment_runner.py (review thread: outdated, resolved)
@frgossen frgossen left a comment

LGTM, one comment.

)
return

kernel_dump = prof.profiler.total_average()

Curious: where is total_average() defined?

@golechwierowicz golechwierowicz merged commit c03afb1 into master Nov 28, 2023
18 checks passed
@golechwierowicz golechwierowicz deleted the olechwierowicz/add_profiling_data branch November 28, 2023 08:05
@zpcore zpcore mentioned this pull request Nov 28, 2023
ManfeiBai pushed a commit to ManfeiBai/PyTorchXLA that referenced this pull request Dec 1, 2023
* Collect CUDA/CPU profiling info into result sheets.

* Lint, and add _s suffix to metrics

---------

Co-authored-by: root <root@olechwierowicz9.zrh.corp.google.com>
ManfeiBai pushed a commit to ManfeiBai/PyTorchXLA that referenced this pull request Dec 1, 2023

miladm commented Dec 4, 2023

Thanks.

cc @zpcore to take advantage of this feature in future benchmarking automation work.

@miladm miladm requested a review from zpcore December 4, 2023 16:47
@miladm miladm added the xla:gpu label Dec 4, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz added a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024