[SYCL][CUDA] Add no-fast-math to tests that rely on it. #9889
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk, can you please merge latest
Thanks, I've done this now.
@cperkinsintel Would you be able to review this please?
Modified E2E tests pass in the pre-commit.
// REQUIRES: aspect-ext_oneapi_bfloat16_math_functions
// RUN: %clangxx -fsycl -fsycl-targets=%{sycl_triple} %if any-device-is-cuda %{ -Xsycl-target-backend --cuda-gpu-arch=sm_80 %} %s -o %t.out
// RUN: %clangxx -fsycl -fsycl-targets=%{sycl_triple} %if any-device-is-cuda %{ -Xsycl-target-backend --cuda-gpu-arch=sm_80 %} %s -o %t.out %{mathflags}
@JackAKirk - I'm working with this test and on our slightly older shared CUDA dev machines sm_80 gets rejected. But with sm_75 the test both compiles and behaves as expected.
Can I just switch this to sm_75? Or, better yet, given that we have a REQUIRES: aspect-ext_oneapi_bfloat16_math_functions in this test, can the whole %if any-device-is-cuda ... %} block be removed?
The test behaves differently depending on whether it is compiled for sm_xx>=sm_80 or not:
- sm_80 and above uses some native bfloat16 math instructions
- below sm_80 always uses generic impls
So I set it to compile with the sm_80 flag because this is what the CI has, and it allows testing of the native impls.
It is possible to remove the arch flag, and it is probably the best thing to do now that bfloat16 is generically supported, to avoid confusion. Unfortunately this means that the native impls won't be tested automatically via the CI. But I suppose we could test this in release testing.
ext_oneapi_bfloat16_math_functions is really an artifact of earlier times when bfloat16 was not generically implemented for all devices: I think it should be removed.
Thanks
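To illustrate what a "generic impl" can look like, here is a minimal host-side sketch that emulates a bfloat16 fma by widening to float and narrowing back with round-to-nearest-even. It is not the DPC++ implementation: the helper names (float_to_bf16_bits, bf16_bits_to_float, generic_bf16_fma, bf16_self_check) are hypothetical, and a native sm_80 path would use hardware instructions instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Narrow an IEEE-754 binary32 value to bfloat16 storage bits using
// round-to-nearest-even. Hypothetical helper, not a DPC++ API.
std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    // NaN: keep sign and exponent, force a quiet-NaN mantissa bit.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add 0x7FFF plus the lowest kept bit so exact ties round to even.
  bits += 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bfloat16 storage bits back to float (exact, no rounding needed).
float bf16_bits_to_float(std::uint16_t h) {
  std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Assumed shape of a generic fallback: compute in float, round once at the end.
std::uint16_t generic_bf16_fma(std::uint16_t a, std::uint16_t b,
                               std::uint16_t c) {
  float r = std::fma(bf16_bits_to_float(a), bf16_bits_to_float(b),
                     bf16_bits_to_float(c));
  return float_to_bf16_bits(r);
}

// Small self-check covering exact values and a tie case.
void bf16_self_check() {
  assert(float_to_bf16_bits(1.0f) == 0x3F80);
  assert(float_to_bf16_bits(1.00390625f) == 0x3F80); // tie rounds to even
  assert(generic_bf16_fma(0x4000, 0x4040, 0x3F80) == 0x40E0); // 2*3+1 = 7
}
```

Calling bf16_self_check() from a main function exercises the sketch. A native and a generic path can legitimately differ in the last bit, which is one reason a test may want to cover both.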
Given that, I think I'll have it test both and add a comment summarizing what you just said. Or, at minimum, the comment.
Same as #9419. This updates a few tests that were missed where the fast-math flag affects at least the cuda backend. These tests assume no-fast-math precision.
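As a rough host-side illustration of why a test might depend on no-fast-math precision, the sketch below uses compensated (Kahan) summation. Its correction term is algebraically zero, so a compiler that is allowed to reassociate floating-point operations, as fast-math modes generally permit, may simplify it away. This is not taken from the tests in this PR: all function names are hypothetical, and the asserted values assume a build without fast-math on a target that evaluates float arithmetic in single precision.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Plain left-to-right summation in float.
float naive_sum(const std::vector<float> &v) {
  float sum = 0.0f;
  for (float x : v)
    sum += x;
  return sum;
}

// Kahan summation: c tracks the low-order bits lost by each addition.
float kahan_sum(const std::vector<float> &v) {
  float sum = 0.0f;
  float c = 0.0f;
  for (float x : v) {
    float y = x - c;
    float t = sum + y;
    c = (t - sum) - y; // algebraically zero; only rounding makes it nonzero
    sum = t;
  }
  return sum;
}

// Hypothetical input: 1.0 followed by a million values that are each too
// small to change a float sum near 1.0 on their own.
std::vector<float> make_demo_input() {
  std::vector<float> v(1000001, 1e-8f);
  v[0] = 1.0f;
  return v;
}

// Checks that hold under strict IEEE semantics (no fast-math).
void precision_self_check() {
  const std::vector<float> v = make_demo_input();
  assert(std::fabs(kahan_sum(v) - 1.01f) < 1e-5f); // compensation works
  assert(std::fabs(naive_sum(v) - 1.01f) > 1e-3f); // small terms are lost
}
```

Calling precision_self_check() from a main function runs the comparison. If the compensation were optimized out, kahan_sum would behave like naive_sum and the first assertion would no longer be expected to hold, which is the kind of precision assumption a test needs to state in its compile flags.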