
Fixed bug in precision logic #232

Merged
merged 1 commit into fcakyon:main on Oct 12, 2023

Conversation

SIR-unit (Contributor)

My change here aligns val() with the implementation upstream. Without my change, the data samples are ALWAYS cast to half precision when run on CUDA. However, the model may not be half precision in this case, per the logic above this section (the "if training: ... else: ..." block, specifically lines 145 and 149). This means the code ALWAYS crashes on CUDA when --half is not passed: the model weights stay full precision while all input is half precision.

My change ensures that weights and input have the same precision. Additionally, half precision is used if and only if --half is passed, the device is not cpu, and the specific backend supports half precision.

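For clarity, here is a minimal sketch of the intended behavior after the change (the names `use_half`, `prepare_batch`, and `backend_supports_fp16` are illustrative placeholders, not the actual identifiers in val.py):

```python
import torch

def use_half(half_flag: bool, device: torch.device, backend_supports_fp16: bool) -> bool:
    # Half precision if and only if --half was passed, the device is not CPU,
    # and the backend supports FP16.
    return half_flag and device.type != "cpu" and backend_supports_fp16

def prepare_batch(im: torch.Tensor, device: torch.device, half: bool) -> torch.Tensor:
    # Cast the input to match the model's dtype instead of unconditionally
    # calling .half() whenever the device is CUDA.
    im = im.to(device, non_blocking=True)
    return im.half() if half else im.float()
```
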
SIR-unit (Contributor, Author)

It may be worth mentioning that val.py will not crash if --device cpu is given, since that keeps everything in full precision. That is the case in the existing val test case, which is why the code has not been failing checks so far.
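
A hypothetical standalone reproduction of the crash described above (a toy model, not the actual val.py code path):

```python
import torch

model = torch.nn.Linear(4, 2)  # full-precision weights, i.e. no --half
x = torch.randn(1, 4)

if torch.cuda.is_available():
    # Weights stay float32 but the input is unconditionally cast to half,
    # so the forward pass raises a dtype-mismatch RuntimeError.
    model_cuda = model.cuda()
    x_half = x.cuda().half()
    try:
        model_cuda(x_half)
    except RuntimeError as e:
        print("half input vs. float weights on CUDA:", e)

# On CPU everything stays float32, so the same call succeeds.
print(model(x))
```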

fcakyon merged commit ad9052a into fcakyon:main on Oct 12, 2023