Failing Torchbench Models: tracking issue #5932
Can we please add a pass rate table in the weekly report that includes: Inference
Training
|
Weekly update (Jan 8 ~ Jan 12): Pass rate (out of 99 benchmarks):
Models fixed:
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Jan 15 ~ Jan 19): Pass rate (out of 99 benchmarks):
Models that started failing:
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Can we track separate passrate tables for L4 and A100 GPUs going forward @ysiraichi? |
Weekly update (Jan 29 ~ Feb 2): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (for A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Feb 5 ~ Feb 9): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Feb 12 ~ Feb 16): Pass rate (out of 99 benchmarks):
Could not run the benchmarks this time, due to a compilation issue: #6564
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]
|
Weekly update (Feb 19 ~ Feb 23): Pass rate (out of 99 benchmarks):
An error in the benchmarking scripts prevented us from running with XLA: #6612
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Pass rate (out of 99 benchmarks):
A100
L4
Models Summary
|
Weekly update (Feb 26 ~ Mar 01): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Mar 04 ~ Mar 08): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Mar 11 ~ Mar 15): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
No summary this week because:
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
@ysiraichi The regression you saw might be due to #6677 (open xla pin update). Our team is looking into this issue. |
Weekly update (Mar 18 ~ Mar 21): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Last week, the results were unchanged. |
Weekly update (Apr 1 ~ Apr 5): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Apr 8 ~ Apr 12): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Apr 15 ~ Apr 19): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Apr 22 ~ Apr 26): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Apr 29 ~ May 3): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (May 6 ~ May 10): Pass rate (out of 99 benchmarks):
A100
L4
Notes
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (May 13 ~ May 17): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) there are some inference benchmarks that use AMP instead of
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (May 20 ~ May 24): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (May 27 ~ May 29):
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (June 3 ~ June 6): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (June 10 ~ June 14): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (June 17 ~ June 21): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (June 24 ~ June 28): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (July 1 ~ July 5): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (July 8 ~ July 12): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (July 15 ~ July 19): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (July 22 ~ July 26): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (July 29 ~ Aug 9): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Aug 12 ~ Aug 16): Pass rate (out of 99 benchmarks):
A100
L4
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Aug 19 ~ Aug 23): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Aug 26 ~ Aug 30): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Sep 2 ~ Sep 6): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Sep 9 ~ Sep 13): Pass rate (out of 99 benchmarks):
A100
L4
Models Summary (A100)
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Weekly update (Sep 16 ~ Sep 20): Pass rate (out of 99 benchmarks):
A100
L4
PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]
PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]
Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch] |
Summary of Contributions (9th Feb)
Increase the number of TorchBench models that work with Dynamo as a tracer: these pass rates are now comparable to those from torch.compile using Inductor. Some of the fixes also improved the tracer that PyTorch/XLA used previously.
Improve the benchmarking tools used by Google: The initial Google runs benchmarking these models showed a discrepancy of about 15 models with the results reported. We identified and fixed 10+ issues that helped reconcile Google's benchmarks with those reported and, in turn, with the PyTorch HUD.
Current State
This post has two lists:
Each of them shows the failing models:
openxla)
These lists were created using the benchmarking scripts that currently live upstream. The following command was executed:
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench \
    --accelerator cuda \
    --xla PJRT --xla None \
    --dynamo openxla --dynamo inductor --dynamo None \
    --test eval --test train \
    --repeat 30 --iterations-per-run 5 \
    --print-subprocess \
    --no-resume
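Several flags in the command above are repeated (--xla, --dynamo, --test). A minimal sketch of the run matrix this implies, assuming the runner expands repeated flags into a cross product and then skips the pairings it does not support. The source does not state this, and the filtering rules are not shown here; the variable names are illustrative only.

```python
from itertools import product

# Values copied from the command; None stands for the literal "None" option.
xla_options = ["PJRT", None]
dynamo_options = ["openxla", "inductor", None]
test_options = ["eval", "train"]

# Assumption: one candidate configuration per combination of repeated flags.
candidate_configs = [
    {"xla": xla, "dynamo": dynamo, "test": test}
    for xla, dynamo, test in product(xla_options, dynamo_options, test_options)
]

for config in candidate_configs:
    print(config)
```

Under this assumption there are 2 x 3 x 2 = 12 candidate configurations per model, before any unsupported pairings are filtered out.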
Environment
Inference
Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)
[x] DALLE2_pytorch
as_strided_copy materialize a new tensor with index. #6624
moco fails to run. #6083
moco inference fails to run on dynamo. #7636
moco fails to run with CUDA OpenXLA fallback. #7647
nvidia_deeprecommender fails to run. #6006
pytorch_CycleGAN_and_pix2pix fails to run. #6007
[ ] simple_gpt
[ ] simple_gpt_tp_manual
[ ] tacotron2
tacotron2 fails to run in eager-mode. #6112
XlaDeviceToAtenDevice. #5743
lift_fresh. pytorch#112202
vision_maskrcnn failing on inference with dynamo after bfloat16 conversion. #6557
index: fix index of 0-element tensor by 0-element tensor. #7113
Dynamo+openxla. 78/81 - 96% (against inductor)
[x] DALLE2_pytorch
_unsafe_index. #5707
XlaDeviceToAtenDevice. #5743
lift_fresh. pytorch#112202
as_strided_copy materialize a new tensor with index. #6624
openxla) fails when returning tensor.expand. #5837
FunctionalTensor metas. pytorch#121007
moco fails to run. #6083
moco inference fails to run on dynamo. #7636
moco fails to run with CUDA OpenXLA fallback. #7647
nvidia_deeprecommender fails to run. #6006
XlaDeviceToAtenDevice. #5743
lift_fresh. pytorch#112202
XlaDeviceToAtenDevice. #5743
lift_fresh. pytorch#112202
pytorch_CycleGAN_and_pix2pix fails to run. #6007
xla_args before computation. #5823
Models also Failing on Inductor
Inference Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
Inference Failing on Inductor CUDA with Different Errors
Training
Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)
[ ] DALLE2_pytorch
dlrm fails to run on training. #6008
_embedding_bag_backward and force sparse=false. #7584
as_strided_copy materialize a new tensor with index. #6624
[ ] llama_v2_7b_16h
moco fails to run. #6083
moco inference fails to run on dynamo. #7636
moco fails to run with CUDA OpenXLA fallback. #7647
nvidia_deeprecommender fails to run. #6006
pytorch_CycleGAN_and_pix2pix fails to run. #6007
[ ] tacotron2
tacotron2 fails to run in eager-mode. #6112
Dynamo+openxla. Pass rate: 55/66 - 83% (against inductor)
dlrm fails to run on training. #6008
_embedding_bag_backward and force sparse=false. #7584
as_strided_copy materialize a new tensor with index. #6624
hf_Reformer fails to run on dynamo+openxla training. #6009
FunctionalTensor metas. pytorch#121007
moco fails to run. #6083
moco inference fails to run on dynamo. #7636
moco fails to run with CUDA OpenXLA fallback. #7647
nvidia_deeprecommender fails to run. #6006
pytorch_CycleGAN_and_pix2pix fails to run. #6007
Models also Failing on Inductor
No Training Support on Inductor CUDA
Benchmarks that raise the error: Model's DEFAULT_TRAIN_BSIZE is not implemented.
Training Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
Training Failing on Inductor CUDA with Different Errors
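The pass rates quoted in the Inference and Training sections are given "against inductor", which we read as the number of passing models over the number that pass with inductor. A minimal sketch of that arithmetic, assuming the percentage is rounded down; this is an assumption, though it is consistent with all three figures quoted in this post. pass_rate is a hypothetical helper, not part of the benchmarking scripts.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a pass rate as 'passed/total - N%', rounding the percentage down."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    percent = passed * 100 // total  # integer floor, e.g. 6400 // 66 == 96
    return f"{passed}/{total} - {percent}%"


# Figures taken from the lists above.
print(pass_rate(78, 81))
print(pass_rate(64, 66))
print(pass_rate(55, 66))
```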
cc @JackCaoG @miladm