[SYCL][libdevice] Add fast_* in imf libdevice #10004

Merged
merged 4 commits into from
Jun 28, 2023

Conversation

@jinge90 jinge90 commented Jun 21, 2023

No description provided.

Signed-off-by: jinge90 <ge.jin@intel.com>

jinge90 commented Jun 21, 2023

Hi, @xtian-github @akolesov-intel @zettai-reido
This PR adds "fast_" math functions to imf libdevice that correspond to the NV libdevice "__nv_fast_" functions. The current mapping is as follows:
__nv_fast_exp10f(x)      -> sycl::native::exp(log10 * x)
__nv_fast_expf(x)        -> sycl::native::exp(x)
__nv_fast_fdividef(x, y) -> sycl::native::divide(x, y)
__nv_fast_log10f(x)      -> sycl::native::log10(x)
__nv_fast_log2f(x)       -> sycl::native::log2(x)
__nv_fast_logf(x)        -> sycl::native::log(x)
__nv_fast_powf(x, y)     -> sycl::native::powr(x, y)

Could you help review?
Thanks very much.
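
For readers following the exp10 entry in the mapping: it appears to rely on the identity 10^x = e^(x * ln 10). The host-side sketch below illustrates that identity only. It assumes that "log10" in the mapping denotes the constant ln(10); the helper name `fast_exp10f_sketch` is hypothetical, and `std::exp` stands in for `sycl::native::exp` so the example can run without a SYCL device.

```cpp
#include <cmath>

// Hypothetical helper, not the code from this PR.
// Assumption: "log10" in the mapping means the constant ln(10).
static inline float fast_exp10f_sketch(float x) {
  constexpr float kLn10 = 2.302585093f;  // natural logarithm of 10
  // std::exp is a host stand-in for sycl::native::exp.
  return std::exp(kLn10 * x);
}
```

On a device, the native function trades accuracy for speed, so results may differ from this host version in the last few bits.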

}

static inline float __fast_fdividef(float x, float y) {
unsigned ybits = __builtin_bit_cast(unsigned, y);
Contributor Author

Hi, @akolesov-intel and @zettai-reido
For __nv_fast_fdividef, see https://docs.nvidia.com/cuda/libdevice-users-guide/__nv_fast_fdividef.html#__nv_fast_fdividef
NV specifies special behavior for 2^126 < y < 2^128, which SYCL native math does not. The code below handles this case.
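
A minimal host-side sketch of the kind of range handling described above, not the code from this PR. It assumes, based on a reading of the NVIDIA description, that for 2^126 < |y| < 2^128 the result is zero for finite x and NaN for infinite x. The names `float_bits` and `fast_fdividef_sketch` are hypothetical, plain `x / y` stands in for `sycl::native::divide`, and the sign of the zero result is not modeled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: reinterpret a float as its 32-bit pattern.
static inline std::uint32_t float_bits(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// Hypothetical sketch; x / y is a host stand-in for sycl::native::divide.
static inline float fast_fdividef_sketch(float x, float y) {
  std::uint32_t ybits = float_bits(y);
  std::uint32_t yexp_bits = (ybits >> 23) & 0xFFu;  // biased exponent
  std::uint32_t yman_bits = ybits & 0x7FFFFFu;      // mantissa field
  // Biased exponent 0xFD is 2^126 and 0xFE is 2^127, so this tests
  // 2^126 < |y| < 2^128 for finite y of either sign.
  if ((yexp_bits == 0xFDu && yman_bits != 0) || yexp_bits == 0xFEu) {
    if (std::isnan(x)) return x;  // propagate NaN
    if (std::isinf(x))            // assumed: infinite x yields NaN
      return std::numeric_limits<float>::quiet_NaN();
    return 0.0f;                  // assumed: finite x yields zero
  }
  return x / y;
}
```

With this sketch, dividing 1.0f by 2^127 gives 0.0f, while dividing by exactly 2^126 falls outside the special range and goes through the ordinary division.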

Looks good to me. We might optimize it in future updates: simplify the range check and use a fast approximation, since fdividef allows a 2 ulp error rather than requiring a correctly rounded x/y.

@xtian-github xtian-github left a comment

LGTM

@zettai-reido zettai-reido left a comment

I approve, but I see two possible changes.

Constants are usually spelled out in hexadecimal, so that there is no variation in how the compiler treats them.

Also, floating-point bit patterns are ordered by magnitude, especially with the sign removed, so the > 2^126 check can be replaced with a single comparison against a hex constant.

libdevice/device_imf.hpp
unsigned xexp_bits = (xbits >> 23) & 0xFF;
unsigned yman_bits = ybits & 0x7F'FFFF;
unsigned xman_bits = xbits & 0x7F'FFFF;
if ((yexp_bits == 0xFD && yman_bits != 0) || (yexp_bits == 0xFE)) {

(ybits > 0x7e80'0000) should do.
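
To illustrate why a comparison on the raw bits can replace the exponent and mantissa test, here is a sketch under two assumptions that the comment does not state: the sign bit has already been cleared, and an upper bound is added so that infinity and NaN are excluded. Both function names are hypothetical and are not taken from the PR.

```cpp
#include <cstdint>

// Field-based test in the style of the original diff.
// abs_bits is the float's bit pattern with the sign bit cleared.
static inline bool in_range_fields(std::uint32_t abs_bits) {
  std::uint32_t exp_bits = (abs_bits >> 23) & 0xFFu;
  std::uint32_t man_bits = abs_bits & 0x7FFFFFu;
  return (exp_bits == 0xFDu && man_bits != 0) || exp_bits == 0xFEu;
}

// Comparison-based test: 0x7E800000 is the bit pattern of 2^126 and
// 0x7F800000 is the bit pattern of +infinity. The upper bound is an
// assumption added here to keep infinity and NaN out of the range.
static inline bool in_range_compare(std::uint32_t abs_bits) {
  return abs_bits > 0x7E800000u && abs_bits < 0x7F800000u;
}
```

For non-negative floats, a larger magnitude always has a larger bit pattern, which is why the two predicates agree on every sign-cleared input.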

Contributor Author

Done.

@akolesov-intel akolesov-intel left a comment

Looks good, thank you!

@AlexeySachkov AlexeySachkov left a comment

sycl-post-link part LGTM

Signed-off-by: jinge90 <ge.jin@intel.com>

jinge90 commented Jun 27, 2023

Hi, @intel/llvm-reviewers-runtime
Could you help review this patch?
Thanks very much.

@steffenlarsen steffenlarsen left a comment

Runtime changes LGTM!

@jinge90 jinge90 requested a review from a team June 28, 2023 02:39

jinge90 commented Jun 28, 2023

Hi, @intel/llvm-gatekeepers
Could you help review and merge this patch?
Thanks very much.

@steffenlarsen
Taking @AlexeySachkov's review as an approval.

@steffenlarsen steffenlarsen merged commit d96e507 into intel:sycl Jun 28, 2023