Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Types gt::bfloat16_t and gt::complex_bfloat16_t #283

Merged
merged 5 commits into from
Nov 29, 2023

Conversation

cmpfeil
Copy link
Contributor

@cmpfeil cmpfeil commented Nov 22, 2023

The types gt::bfloat16_t and gt::complex_bfloat16_t are provided as 1:1 copies of gt::float16_t and gt::complex_float16_t, respectively.

The 1:1 adaptions from [complex_]float16_t.h --> [complex_]bfloat16_t.h are:

  • cuda_fp16.h --> cuda_bf16.h
  • __half --> __nv_bfloat16
  • CUDA_ARCH >= 530 --> CUDA_ARCH >= 800
  • GTENSOR_ENABLE_FP16 --> GTENSOR_ENABLE_BF16

Similarly, test_[complex_]bfloat16_t.cxx is a copy of test_[complex]float16_t.cxx with any instance of float16_t replaced by bfloat16_t.

Tested on A100, built with modules gcc/11 and cuda/11.4 loaded as well as -DGTENSOR_DEVICE=cuda -DGTENSOR_ENABLE_BF16=ON -DCMAKE_CUDA_ARCHITECTURES=80 added to the cmake call.

Note: Obviously, it would be cleaner to have a class templated on storage_type and compute_type covering both, gt::float16_t and gt::bfloat16_t, at the same time (and possibly also a future gt::float8_t). However, as this is only a temporary solution until NVIDIA supports this within thrust, it might be good enough. Any opinions on this?

Any feedback is highly appreciated!

  + Is a 1:1 clone of gt::[complex_]float16_t
  + Only adaptions are
    * cuda_fp16.h         --> cuda_bf16.h
    * __half              --> __nv_bfloat16
    * CUDA_ARCH >= 530    --> CUDA_ARCH >= 800
    * GTENSOR_ENABLE_FP16 --> GTENSOR_ENABLE_BF16
@bd4
Copy link
Contributor

bd4 commented Nov 27, 2023

This looks fine to me. Some day we might have official clean support for all these types in CUDA/HIP/SYCL, but in the meantime just having something that works is very useful and I am not too concerned about making it pretty.

@cmpfeil cmpfeil marked this pull request as ready for review November 28, 2023 09:02
Copy link
Contributor

@germasch germasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the _ for private member variables, looks good to me. That is probably the case for float16_t as well (haven't checked), so could be updated in a subsequent PR.

The other comment I have is that it may be worth checking whether the duplication of the tests could be avoided (googletest does supported templated tests, though we're generally not taking advantage of that, but we probably should...)

Comment on lines +90 to +91
private:
storage_type x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally use an underscore suffix for private member variables, so x_ here would be appreciated, as it serves as a hint that something is a member variable when it's used within class member functions, and it arguably makes x(x) in the constructor a bit less confusing.

@germasch germasch merged commit 99f3fa4 into wdmapp:main Nov 29, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants