[cuBLAS] Gemm tests using half can fail #599

Rbiessy · 2024-10-21T14:53:07Z

Summary

The cuBLAS Gemm tests that use half precision can fail on an A100, producing wrong results.

Version

Tip of the develop branch as of 2024-10-21 (commit 6923d40).

Environment

NVIDIA A100 with the DPC++ 2024.2.0 release and the associated Codeplay NVIDIA plugin. The CUDA version is 12.6.2 and the OS is Ubuntu 22.04.

Steps to reproduce

cmake -Bbuild-a100 -GNinja -DCMAKE_CXX_COMPILER=`which icpx` -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=ON -DENABLE_CURAND_BACKEND=ON -DENABLE_CUSOLVER_BACKEND=ON -DENABLE_CUFFT_BACKEND=ON -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
cd build-a100
ninja
ctest -R ".*GemmUsmTests.*Half.*" --output-on-failure

Observed behavior

Full log: log_a100.txt
Short extract:

[ RUN      ] GemmUsmTestSuite/GemmUsmTests.HalfHalfFloatPrecision/Column_Major_NVIDIA_A100_PCIE_40GB
relative error = 0.496206 absolute error = 0.382722 limit = 0.00010848
Difference in entry (0,0): DPC++ 0.388574 vs. Reference 0.771296
relative error = 1.36303 absolute error = 1.67412 limit = 0.00010848
Difference in entry (1,0): DPC++ 0.445891 vs. Reference -1.22823
relative error = 1.05343 absolute error = 0.664006 limit = 0.00010848
Difference in entry (2,0): DPC++ 0.0336805 vs. Reference -0.630325
relative error = 1.0674 absolute error = 0.514821 limit = 0.00010848
Difference in entry (3,0): DPC++ 0.0325077 vs. Reference -0.482313
relative error = 0.789876 absolute error = 0.992507 limit = 0.00010848
Difference in entry (4,0): DPC++ -0.264029 vs. Reference -1.25654
relative error = 0.925093 absolute error = 1.07784 limit = 0.00010848

The differences between the output and reference seem too large to be due to a precision issue.
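To put numbers on that claim, here is a minimal sketch that recomputes the errors from the values in the extract. It assumes each "relative error" line belongs to the "Difference in entry" line printed right after it, and that the relative error is the absolute difference divided by the magnitude of the reference; both assumptions match the logged figures but are not taken from the test source. The `errors` helper and the comparison against the binary16 machine epsilon are illustrative only and say nothing about how the test suite derives its limit.

```python
# Values copied from the log extract: (entry, DPC++ result, reference result).
entries = [
    ((0, 0), 0.388574, 0.771296),
    ((1, 0), 0.445891, -1.22823),
    ((2, 0), 0.0336805, -0.630325),
    ((3, 0), 0.0325077, -0.482313),
    ((4, 0), -0.264029, -1.25654),
]

LIMIT = 0.00010848      # "limit" value printed in the log
HALF_EPS = 2.0 ** -10   # machine epsilon of IEEE 754 binary16 (half)


def errors(result, reference):
    """Return (absolute error, relative error) of result against reference.

    Hypothetical helper: the relative error formula is an assumption that
    happens to reproduce the numbers in the log.
    """
    abs_err = abs(result - reference)
    rel_err = abs_err / abs(reference)
    return abs_err, rel_err


for entry, result, reference in entries:
    abs_err, rel_err = errors(result, reference)
    print(
        f"entry {entry}: abs={abs_err:.6f} rel={rel_err:.6f} "
        f"rel/limit={rel_err / LIMIT:.0f} rel/half_eps={rel_err / HALF_EPS:.0f}"
    )
```

Every listed entry has a relative error that is hundreds of times the half-precision epsilon, and three of the five results have the opposite sign from the reference, which is hard to attribute to rounding alone.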

Expected behavior

The tests should pass.


Rbiessy added the bug and BLAS domain labels on Oct 21, 2024

Rbiessy mentioned this issue on Oct 21, 2024 in "Rename oneMKL Interface to oneMath" #602 (open)
