Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Use pair of native::sin/cos for sincos under __FAST_MATH__ #10481

Merged
merged 1 commit into from
Aug 8, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 19 additions & 2 deletions sycl/include/sycl/builtins.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -734,8 +734,8 @@ std::enable_if_t<__FAST_MATH_GENFLOAT(T), T> sin(T x) __NOEXC {

// svgenfloat sincos (svgenfloat x, genfloatptr cosval)
template <typename T, typename T2>
std::enable_if_t<
detail::is_svgenfloat<T>::value && detail::is_genfloatptr<T2>::value, T>
std::enable_if_t<__FAST_MATH_GENFLOAT(T) && detail::is_genfloatptr<T2>::value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where __FAST_MATH_WITH_GENFLOAT comes from or what it is, but is there any chance that these two enable_if might both be true at the same time and then result in an ambiguous call? Do you need to ensure they are mutuallly exclusive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to do it exactly the same as done for sin/cos/...

T>
sincos(T x, T2 cosval) __NOEXC {
detail::check_vector_size<T, T2>();
return __sycl_std::__invoke_sincos<T>(x, cosval);
Expand Down Expand Up @@ -2500,6 +2500,23 @@ std::enable_if_t<detail::is_svgenfloatf<T>::value, T> cos(T x) __NOEXC {
return native::cos(x);
}

// svgenfloat sincos (svgenfloat x, genfloatptr cosval)
// This is a performance optimization to ensure that sincos isn't slower than a
// pair of sin/cos executed separately. Theoretically, calling non-native sincos
// might be faster than calling native::sin plus native::cos separately and we'd
// need some kind of cost model to make the right decision (and move this
// entirely to the JIT/AOT compilers). However, in practice, this simpler
// solution seems to work just fine and matches how sin/cos above are optimized
// for the fast math path.
template <typename T, typename T2>
std::enable_if_t<
detail::is_svgenfloatf<T>::value && detail::is_genfloatptr<T2>::value, T>
sincos(T x, T2 cosval) __NOEXC {
detail::check_vector_size<T, T2>();
*cosval = native::cos(x);
return native::sin(x);
}

// svgenfloatf exp (svgenfloatf x)
template <typename T>
std::enable_if_t<detail::is_svgenfloatf<T>::value, T> exp(T x) __NOEXC {
Expand Down