[cuBLAS] Gemm tests using half can fail #599

Open
Rbiessy opened this issue Oct 21, 2024 · 0 comments
Labels
BLAS domain (BLAS domain issue/request), bug (A request to fix an issue)

Summary

cuBLAS tests running Gemm with half precision can fail with wrong results on A100.

Version

Using the tip of develop as of Oct 21, 2024 (6923d40).

Environment

Using an A100 with DPC++ release 2024.2.0 and the associated Codeplay Nvidia plugin. The CUDA version is 12.6.2 and the OS is Ubuntu 22.04.

Steps to reproduce

cmake -Bbuild-a100 -GNinja -DCMAKE_CXX_COMPILER=`which icpx` -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=ON -DENABLE_CURAND_BACKEND=ON -DENABLE_CUSOLVER_BACKEND=ON -DENABLE_CUFFT_BACKEND=ON -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-a100
ninja
ctest -R ".*GemmUsmTests.*Half.*" --output-on-failure

Observed behavior

Full log: log_a100.txt
Short extract:

[ RUN      ] GemmUsmTestSuite/GemmUsmTests.HalfHalfFloatPrecision/Column_Major_NVIDIA_A100_PCIE_40GB
relative error = 0.496206 absolute error = 0.382722 limit = 0.00010848
Difference in entry (0,0): DPC++ 0.388574 vs. Reference 0.771296
relative error = 1.36303 absolute error = 1.67412 limit = 0.00010848
Difference in entry (1,0): DPC++ 0.445891 vs. Reference -1.22823
relative error = 1.05343 absolute error = 0.664006 limit = 0.00010848
Difference in entry (2,0): DPC++ 0.0336805 vs. Reference -0.630325
relative error = 1.0674 absolute error = 0.514821 limit = 0.00010848
Difference in entry (3,0): DPC++ 0.0325077 vs. Reference -0.482313
relative error = 0.789876 absolute error = 0.992507 limit = 0.00010848
Difference in entry (4,0): DPC++ -0.264029 vs. Reference -1.25654
relative error = 0.925093 absolute error = 1.07784 limit = 0.00010848

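The differences between the output and the reference seem too large to be due to a precision issue: the relative errors in the extract range from about 0.5 to 1.4 against a limit of about 1e-4, and several entries have the opposite sign to the reference.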
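
As a quick sanity check on the extract above, the reported errors can be recomputed from the printed values. This is only a sketch: it assumes the relative error is |DPC++ - reference| / |reference|, which agrees with the printed numbers but has not been confirmed against the test source, and the helper names (`errors`, `summarize`) are made up for illustration.

```python
# Values copied from the log extract: (row, col, DPC++ value, reference value).
ENTRIES = [
    (0, 0, 0.388574, 0.771296),
    (1, 0, 0.445891, -1.22823),
    (2, 0, 0.0336805, -0.630325),
    (3, 0, 0.0325077, -0.482313),
    (4, 0, -0.264029, -1.25654),
]
LIMIT = 0.00010848  # "limit" value printed in the log


def errors(dpcpp, ref):
    """Return (absolute error, relative error) for one entry.

    Assumption: relative error is the absolute error divided by |reference|.
    """
    abs_err = abs(dpcpp - ref)
    rel_err = abs_err / abs(ref)
    return abs_err, rel_err


def summarize(entries, limit):
    """Recompute the errors and flag entries whose sign differs from the reference."""
    rows = []
    for row, col, dpcpp, ref in entries:
        abs_err, rel_err = errors(dpcpp, ref)
        rows.append({
            "entry": (row, col),
            "abs_err": abs_err,
            "rel_err": rel_err,
            "times_over_limit": rel_err / limit,
            "sign_flip": (dpcpp > 0) != (ref > 0),
        })
    return rows


if __name__ == "__main__":
    for r in summarize(ENTRIES, LIMIT):
        print(f"{r['entry']}: abs={r['abs_err']:.6f} rel={r['rel_err']:.6f} "
              f"x{r['times_over_limit']:.0f} over limit, sign_flip={r['sign_flip']}")
```

Under that assumption, every listed entry exceeds the limit by a factor in the thousands and three of the five have a flipped sign, which points towards wrong results rather than rounding differences.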

Expected behavior

The tests should pass.
