Summary
cuBLAS tests running Gemm with half precision can fail with wrong results on A100.
Version
Using the tip of develop as of today (6923d40).
Environment
Using A100 with the DPC++ release 2024.2.0 and the associated Codeplay Nvidia plugin. The CUDA version is 12.6.2, OS is Ubuntu 22.04.
Steps to reproduce
    cmake -Bbuild-a100 -GNinja -DCMAKE_CXX_COMPILER=`which icpx` -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=ON -DENABLE_CURAND_BACKEND=ON -DENABLE_CUSOLVER_BACKEND=ON -DENABLE_CUFFT_BACKEND=ON -DREF_BLAS_ROOT=/path/to/lapack/install -DREF_LAPACK_ROOT=/path/to/lapack/install .
    cd build-a100
    ninja
    ctest -R ".*GemmUsmTests.*Half.*" --output-on-failure
Observed behavior
Full log: log_a100.txt
Short extract:
    [ RUN ] GemmUsmTestSuite/GemmUsmTests.HalfHalfFloatPrecision/Column_Major_NVIDIA_A100_PCIE_40GB
    relative error = 0.496206 absolute error = 0.382722 limit = 0.00010848
    Difference in entry (0,0): DPC++ 0.388574 vs. Reference 0.771296
    relative error = 1.36303 absolute error = 1.67412 limit = 0.00010848
    Difference in entry (1,0): DPC++ 0.445891 vs. Reference -1.22823
    relative error = 1.05343 absolute error = 0.664006 limit = 0.00010848
    Difference in entry (2,0): DPC++ 0.0336805 vs. Reference -0.630325
    relative error = 1.0674 absolute error = 0.514821 limit = 0.00010848
    Difference in entry (3,0): DPC++ 0.0325077 vs. Reference -0.482313
    relative error = 0.789876 absolute error = 0.992507 limit = 0.00010848
    Difference in entry (4,0): DPC++ -0.264029 vs. Reference -1.25654
    relative error = 0.925093 absolute error = 1.07784 limit = 0.00010848
The differences between the output and reference seem too large to be due to a precision issue.
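To make the size of the mismatch concrete, here is a small Python sketch that recomputes the errors from the five entries printed in the log extract. The error definitions used (`|x - ref|` and `|x - ref| / |ref|`) are an assumption; they reproduce the logged numbers, but the test suite's actual comparison code was not checked. The names `entries`, `errors` and `smallest_margin` are made up for this illustration.

```python
# Values copied from the log extract: (DPC++ result, reference result).
entries = [
    (0.388574, 0.771296),
    (0.445891, -1.22823),
    (0.0336805, -0.630325),
    (0.0325077, -0.482313),
    (-0.264029, -1.25654),
]
LIMIT = 0.00010848  # "limit" value printed by the test


def errors(dpcpp, reference):
    """Assumed definitions: absolute = |x - ref|, relative = |x - ref| / |ref|."""
    absolute = abs(dpcpp - reference)
    return absolute, absolute / abs(reference)


results = [errors(d, r) for d, r in entries]
for row, (absolute, relative) in enumerate(results):
    print(f"entry ({row},0): abs = {absolute:.6f}, rel = {relative:.6f}, "
          f"abs/limit = {absolute / LIMIT:.0f}")

# Smallest error of either kind across the entries, expressed in units of the limit.
smallest_margin = min(min(a, r) for a, r in results) / LIMIT
print(f"smallest error / limit = {smallest_margin:.0f}")
```

Even the smallest of these errors is thousands of times the printed limit, and two entries have the wrong sign, which is hard to explain by half-precision rounding alone.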
Expected behavior
The tests should pass.