[torchbench] Inductor failing on training #6988
Comments
I haven't tried to run it with the PyTorch benchmarking script yet; I will do so next week.
@zpcore, have you run into this issue on the torchbench auto stack?
Yes, I checked our dashboard and most of the tests are failing today (v5p successes dropped from 56 to 11). For the GPU (e.g., H100) run, though, I haven't seen the failures yet, although that run has not completed.
Yes. Interestingly, I did not encounter this issue with L4, only with A100.
@zpcore It looks like it had something to do with: networkx/networkx#7028
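The comment above suspects a change merged via networkx/networkx#7028. If that turns out to be the culprit, a common triage step is to pin networkx to a release that predates the change and re-run the benchmark. The version bound below is an assumption used to illustrate the idea, not a verified boundary:

```shell
# Hypothetical triage step: pin networkx to an older release and re-test.
# '3.1' is an assumed pre-regression version; check the networkx changelog
# to find the release that actually contains PR 7028.
pip install 'networkx==3.1'
python -c "import networkx; print(networkx.__version__)"
```

If the failures disappear with the pinned version, that confirms the regression came in through the dependency rather than the benchmark itself.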
🐛 Bug
Using the upstream benchmarking script, inductor training (all models) has been failing for me for a while. I tried creating a fresh Docker environment, but the error didn't go away. Is anyone else seeing this?
To Reproduce
I'm running the following command line:
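(The actual command line did not survive this capture. For context, upstream inductor-training benchmark runs are typically launched through PyTorch's dynamo benchmark driver; the invocation below is a hypothetical sketch, with the script path and flags taken from the upstream repo layout, so verify them against your checkout. It is not the reporter's exact command.)

```shell
# Hypothetical example, not the reporter's exact command.
python benchmarks/dynamo/torchbench.py --training --inductor --accuracy
```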
Environment
cc @miladm @JackCaoG @vanbasten23 @frgossen @cota @golechwierowicz