Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix triangular solvers on Windows CUDA #1665

Merged
merged 9 commits into from
Aug 14, 2024
Merged

Conversation

upsj
Copy link
Member

@upsj upsj commented Aug 11, 2024

The alignment of uninitialized_array was currently only 1 based on unsigned char, and the whole setup screams undefined behavior to me, so this makes it much safer for our current use cases (since casting between complex and real types is well-defined).

Additionally, MSVC doesn't properly find isnan as a device function, so we need to add our own implementation.

@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Aug 11, 2024
@upsj upsj requested a review from a team August 11, 2024 12:53
@upsj upsj self-assigned this Aug 11, 2024
@ginkgo-bot ginkgo-bot added mod:cuda This is related to the CUDA module. type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. labels Aug 11, 2024
Copy link
Member

@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether jacboi using any type whose size is not divisible by 4 for shared memory may lead an issue here. But Jacobi seems not to use it.

@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. 1:ST:no-changelog-entry Skip the wiki check for changelog update and removed 1:ST:ready-for-review This PR is ready for review labels Aug 12, 2024
@upsj upsj changed the title Fix uninitialized_array alignment Fix triangular solvers on Windows CUDA Aug 12, 2024
@upsj upsj force-pushed the uninitialized_array_alignment branch from acfe081 to 5fb42a1 Compare August 12, 2024 14:57
include/ginkgo/core/base/math.hpp Outdated Show resolved Hide resolved
Copy link
Member

@thoasm thoasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have the is_nan_exact function moved to the TRS kernel file to avoid future misuse.

include/ginkgo/core/base/math.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/math.hpp Outdated Show resolved Hide resolved
include/ginkgo/core/base/math.hpp Outdated Show resolved Hide resolved
@upsj upsj requested a review from thoasm August 12, 2024 20:19
@upsj upsj removed the 1:ST:no-changelog-entry Skip the wiki check for changelog update label Aug 12, 2024
include/ginkgo/core/base/math.hpp Show resolved Hide resolved
Comment on lines -1226 to +1230
return std::isnan(value);
using std::isnan;
return isnan(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is std::isnan(value) different from the updated one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but this has a better chance of using the CUDA isnan function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately it still doesn't work with MSVC, which means I'll have to file a bug report some time ;)

Copy link
Member

@thoasm thoasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for applying my suggestions.
I only have a minor nit regarding the documentation; the rest looks good!

cuda/solver/common_trs_kernels.cuh Outdated Show resolved Hide resolved
Comment on lines -1226 to +1230
return std::isnan(value);
using std::isnan;
return isnan(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but this has a better chance of using the CUDA isnan function.

cuda/solver/common_trs_kernels.cuh Outdated Show resolved Hide resolved
MSVC doesn't treat the is_nan properly, so we do a byte comparison instead
- move is_nan_exact to internal code
- rename to float_to_uint_impl
- increase tolerance for additional test
For things to work in Windows, the shared libraries need to be in the working directory or PATH
Co-authored-by: Thomas Grützmacher <thomas.gruetzmacher@tum.de>
@upsj upsj force-pushed the uninitialized_array_alignment branch from 4734f20 to 95fe3f4 Compare August 14, 2024 07:00
@upsj upsj merged commit 2c06c8a into develop Aug 14, 2024
10 of 14 checks passed
@upsj upsj deleted the uninitialized_array_alignment branch August 14, 2024 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. type:matrix-format This is related to the Matrix formats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants