Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Fix sycl::vec::convert<> to allow conversion to and from sycl::vec of bfloat16 type to that of other data types #14105

Merged
merged 25 commits into from
Jun 21, 2024

Conversation

uditagarwal97
Copy link
Contributor

Follow-up of and blocked by: #14085

After this change:
On host, conversion between vec<bfloat16> and vec<float> will happen element-by-element. While on device, we'll use Spirv intrinsic OpConvertFToBF16INTEL and OpConvertBF16ToFINTEL (https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_bfloat16_conversion.asciidoc) for vector conversion.

@uditagarwal97 uditagarwal97 changed the title [SYCL] Fix sycl::vec::convert<> to allow conversion between sycl::vec of float and bfloat16 type [SYCL] Fix sycl::vec::convert<> to allow conversion between sycl::vec of float and bfloat16 types Jun 9, 2024
@uditagarwal97 uditagarwal97 self-assigned this Jun 9, 2024
@uditagarwal97 uditagarwal97 marked this pull request as ready for review June 12, 2024 20:07
@uditagarwal97 uditagarwal97 requested review from a team as code owners June 12, 2024 20:07
@uditagarwal97 uditagarwal97 marked this pull request as draft June 12, 2024 20:46
@uditagarwal97
Copy link
Contributor Author

Converted this PR back to draft to:
(1) Accommodate changes in vec::convert after #14130 gets merged.
(2) Further simplify the changes in this PR after adding BF16 to uint16 conversion to detail::convertToOpenCLType

detail::convertImpl<T, R, roundingMode, 1, OpenCLT, OpenCLR>(
vec_data<DataT>::get(getValue(I)))));
// For float -> bf16.
if constexpr (isFloatToBF16Conv) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detail::convertImpl<> expects OpenCL type as input and returns the OpenCL type corresponding to convertT. In the case of BF16, the OpenCL type will be uint16 for device and bfloat16 on host.
However, currently, vec_data<bfloat16>::get() returns bfloat16 value on both device and host.

As a workaround to this, I've added explicit if constexpr for BF16 <--> float conversion. A proper fix would require more if conditions/if defs, which IMO, is not worth it since we will anyway be replacing vector.hpp with vector_poreview.hpp soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can just refactor the entire convertImpl, if you have a good plan/picture for that.

std::is_same_v<DataT, bfloat16> && std::is_same_v<convertT, float>;
if constexpr (isFloatToBF16Conv || isBF16ToFloatConv) {
static_assert(roundingMode == rounding_mode::automatic ||
roundingMode == rounding_mode::rte);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a message to this static assert to explicitly say that not all rounding modes are supported for bfloat16?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Fixed in 8a6caf1

Comment on lines 211 to 212
template <typename NativeBFT, typename NativeFloatT, int VecSize>
inline NativeFloatT ConvertBF16ToF(NativeBFT val) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can NativeFloatT be anything other than float?

Copy link
Contributor Author

@uditagarwal97 uditagarwal97 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On host, no. NativeFloatT is always float.

// On host, ensure that we don't convert BF16 to uint16 for conversion.
static_assert(std::is_same_v<NativeBFT, sycl::ext::oneapi::bfloat16>);

return (NativeFloatT)val;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use C-style casts.

@@ -498,6 +528,51 @@ __SYCL_FLOAT_FLOAT_CONVERT_FOR_TYPE(double)
#undef __SYCL_FLOAT_FLOAT_CONVERT
#undef __SYCL_FLOAT_FLOAT_CONVERT_FOR_TYPE

template <typename NativeBFT, typename NativeFloatT, int VecSize>
inline NativeFloatT ConvertBF16ToF(NativeBFT vec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the scalar case, are we going vec<bf16,1> -> operator[] -> cast_to_ushort->cast back to bf16 -> convert to float here + in the caller? Do you think it still makes sense after we changed storage type in vec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that convertImpl accepts native OpenCL type for device, whether it is uint16 (For vec<bfloat, 1>) or uint16 ext_vector_type() (For vec<bfloat, N>).
I had to do the casts to provide a unified interface for vec::convert (to use convertImpl), plus I expect compiler to get rid of these extra casts.

A long term solution, would be to refactor convertImpl entirely but that is tangential to this PR.

Comment on lines 797 to 798
// Currently, for BF16 <--> float conversion, we only support
// Round-to-even rounding mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect that bfloat maps precisely onto floats, so that direction should "support" all the rounding modes. Am I wrong here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, there can not be a 1:1 mapping between float and bfloat as bfloat has only 8-bit mantissa while float as 24-bit mantissa. The default rounding mode is RTE(https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_rounding_modes_for_conversions) for floating point to floating point conversion.

detail::convertImpl<T, R, roundingMode, 1, OpenCLT, OpenCLR>(
vec_data<DataT>::get(getValue(I)))));
// For float -> bf16.
if constexpr (isFloatToBF16Conv) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can just refactor the entire convertImpl, if you have a good plan/picture for that.

@uditagarwal97
Copy link
Contributor Author

@cperkinsintel Since @aelovikov-intel is OOO, could you help review this PR?
I have made the following changes since @aelovikov-intel's last review:

  1. Except double, now we can convert sycl::vec of all the types to sycl::vec<bfloat16> and vice versa, with all rounding modes supported. For double, we only support RTE rounding mode.
  2. On Intel HW's, I used __imf_ builtins for element-by-element conversion between bfloat16 and other data types. For non-Intel HWs and host, I've added a helper class ConvertToBfloat16 to ext/oneapi/bfloat16.hpp to facilitate conversion to/from bfloat16 with different rounding modes.
  3. For conversion between sycl::vec<float> and sycl::vec<bfloat16> when RTE rounding mode is used, I used OpConvertFToBF16INTEL and OpConvertBF16ToFINTEL for optimized vectorized conversion. For other rounding modes, we default to imf_ builtins.

@uditagarwal97 uditagarwal97 changed the title [SYCL] Fix sycl::vec::convert<> to allow conversion between sycl::vec of float and bfloat16 types [SYCL] Fix sycl::vec::convert<> to allow conversion to and from sycl::vec of bfloat16 type to that of other data types Jun 20, 2024
roundingMode == SYCLRoundingMode::rte,
"Only automatic/RTE rounding mode is supported for double type.");
return getBFloat16FromDoubleWithRoundingMode(a, roundingMode);
}
Copy link
Contributor

@cperkinsintel cperkinsintel Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a possibility of other floating types besides float and double? Half? Should there be a std::is_floating_point<T> clause for the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I've added the clause for half as well.

Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, had one question.

@uditagarwal97
Copy link
Contributor Author

@intel/llvm-gatekeepers the PR is ready to be merged!

@againull againull merged commit 02c6bba into intel:sycl Jun 21, 2024
14 checks passed
againull pushed a commit that referenced this pull request Jun 25, 2024
Followup and blocked by: #14105

Currently, `vec<bfloat>` math builtins do element-by-element operations.
This PR optimize `vec<bfloat>` math builtins by:
(1) Converting `vec<bfloat>` to `vec<float>`.
(2) Do the operation on `vec<float>` (which uses Spirv built-ins
underneath for optimized vector operations).
(3) Convert back the return value to `vec<bfloat>`.

Look at the beautiful diff in
`check_device_code/vector/vector_bf16_builtins.cpp` to visualize the
device code generated before and after this optimization.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants