-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][libdevice] Add fast_* in imf libdevice #10004
Conversation
Signed-off-by: jinge90 <ge.jin@intel.com>
Hi, @xtian-github @akolesov-intel @zettai-reido Could you help reivew? |
} | ||
|
||
static inline float __fast_fdividef(float x, float y) { | ||
unsigned ybits = __builtin_bit_cast(unsigned, y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @akolesov-intel and @zettai-reido
For __nv_fast_fdividef: https://docs.nvidia.com/cuda/libdevice-users-guide/__nv_fast_fdividef.html#__nv_fast_fdividef
NV has requirements for 2^126 < y < 2^128 which sycl native math doesn't have, the code below is to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. We might optimize it in future updates: simplify the range check and use fast approximation while fdividef allows 2 ulp rather than correctly rounded x/y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approve, but see two possible changes.
Usually the constants are spelled (in hexadecimal),
so there is no variety on how compiler treats it.
And floating point is continuous, especially without sign.
So >2^126 can be replaced with one comparison with hex constant too.
libdevice/device_imf.hpp
Outdated
unsigned xexp_bits = (xbits >> 23) & 0xFF; | ||
unsigned yman_bits = ybits & 0x7F'FFFF; | ||
unsigned xman_bits = xbits & 0x7F'FFFF; | ||
if ((yexp_bits == 0xFD && yman_bits != 0) || (yexp_bits == 0xFE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(ybits > 0x7e80'0000) should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl-post-link
part LGTM
Signed-off-by: jinge90 <ge.jin@intel.com>
Hi, @intel/llvm-reviewers-runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime changes LGTM!
Hi, @intel/llvm-gatekeepers |
Taking @AlexeySachkov's review as an approval. |
No description provided.