[torchbench] speech_transformer fails to run on dynamo #6831
Comments
We noticed that these model floating-point precision mismatch failures are related to #6669, where the …
Thanks for reporting the issue. We should do a type cast to match the precision in the op lowering itself. @bhavya01, we probably need to do an explicit floating-type check and promotion in the op. Could you take a look when you get the time?
@ysiraichi @zpcore Is it possible for you to pinpoint where this is happening in the code? AFAIK, we promote types in the div op. A simple script shows that this should work.
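The script from this comment was not preserved in the scrape. As a rough stand-in, here is a plain-Python sketch of the promotion behavior being asserted (this models standard PyTorch-style type-promotion rules in ordinary Python; it is my illustration, not the actual script or the torch promotion table):

```python
# Hypothetical model of dtype promotion for a true division (torch.div-style).
# Assumption: int/int promotes to the default float dtype (float32); mixed
# floats promote to the wider float; float16 + bfloat16 promote to float32.

_FLOAT_ORDER = ["bfloat16", "float16", "float32", "float64"]

def promote_div_dtype(lhs: str, rhs: str) -> str:
    """Return the result dtype of a true division of lhs by rhs."""
    floats = [d for d in (lhs, rhs) if d in _FLOAT_ORDER]
    if not floats:
        return "float32"              # int / int -> default float dtype
    if len(floats) == 1:
        return floats[0]              # float / int keeps the float dtype
    a, b = floats
    if a == b:
        return a
    if {a, b} == {"bfloat16", "float16"}:
        return "float32"              # neither can represent the other
    return max(a, b, key=_FLOAT_ORDER.index)
```

Under these rules, a bfloat16 / float32 divide should come out as float32, which is what the proposed "explicit floating type check and promotion" in the lowering would enforce.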
Thanks @bhavya01 for the update. I retested with the following command, and compared the HLO difference between 03/12 (last version with no issue) and 03/28. Some findings here:

HLO for 03/12 (Pass):

HLO for 03/28 (ERROR):
We can see that, checking further, we are missing the op.
However, in 03/28 (ERROR), we only have:
Basically, every time we call divide with data type …, I don't know why we originally also cast into BF16 with the native operator.
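To see why a missing cast (or an extra cast down to BF16) matters numerically, here is a small stdlib-only sketch, my own illustration rather than the benchmark's code. It truncates a value to bfloat16 precision by keeping only the top 16 bits of its float32 bit pattern; dividing at bfloat16 precision then diverges from the full-precision result by roughly the margin an accuracy tolerance check would flag:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Truncate x to bfloat16 precision: convert to float32, then zero the
    low 16 bits of the bit pattern (round-toward-zero for positive values)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def div_full(a: float, b: float) -> float:
    return a / b  # full-precision division

def div_bf16(a: float, b: float) -> float:
    # Emulate casting both inputs down to bfloat16 before the divide,
    # then truncating the result as well.
    return to_bfloat16(to_bfloat16(a) / to_bfloat16(b))

a, b = 3.14159, 2.71828
print(div_full(a, b), div_bf16(a, b))
# bfloat16 keeps only an 8-bit mantissa, so the two results agree to only
# about 2-3 significant decimal digits.
```

This is the shape of mismatch a tolerance-based correctness comparison between eager and dynamo/openxla runs would report.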
I think that #6873 should fix the issue. This PR fixes the shapes in the XLA node for the div op.
Thanks @bhavya01 for the heads up. I notice that the PR calls promoteType, which forces the two operands to return the same type. That probably explains where the …
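As a sketch of the promote-then-lower pattern described here (the names below are illustrative, not the actual torch_xla C++ API): the lowering computes the common dtype first, inserts a convert on whichever operand differs, and only then emits the divide on matching types.

```python
# Illustrative model of a div lowering that promotes operand types first.
# `promote` stands in for a promoteType-style helper supplied by the caller.

def lower_div(lhs_dtype: str, rhs_dtype: str, promote) -> tuple[list, str]:
    """Return the sequence of ops this lowering would emit, plus the
    result dtype. Converts are only inserted for mismatched operands."""
    out = promote(lhs_dtype, rhs_dtype)
    ops = []
    if lhs_dtype != out:
        ops.append(("convert", "lhs", lhs_dtype, out))
    if rhs_dtype != out:
        ops.append(("convert", "rhs", rhs_dtype, out))
    ops.append(("divide", out))
    return ops, out
```

With a bf16 operand and an f32 operand, this emits exactly one extra convert on the bf16 side, which would account for the convert op present in the passing HLO.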
Manually tested with #6873 and the test passed. Will close the issue for now. Thanks!
🐛 Bug
Running the upstreamed benchmarking scripts with the following command results in an unexpected error.
```shell
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench \
    --accelerator cuda \
    --xla PJRT \
    --dynamo openxla \
    --test train --test eval \
    --repeat 8 --iterations-per-run 1 \
    --print-subprocess \
    --no-resume -k speech_transformer
```
Environment
cc @miladm @JackCaoG @vanbasten23 @zpcore @frgossen @golechwierowicz @cota