This is a limitation of the current implementation: only arrays whose size is a multiple of the threadblock size (e.g. (M = 128, N = 128, K = 64) for WMMA mixed-precision) are supported at the moment.
One way to support arbitrary matrix dimensions would be to predicate the loads from global memory to only access elements inside the bounds of the global matrix.
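To illustrate the idea of predicated loads, here is a minimal CPU-only sketch in plain Python. It is not the library's kernel: the names `load_predicated` and `tiled_matmul` and the `tile` parameter are hypothetical and only show the principle. Out-of-bounds reads return zero and out-of-bounds writes are skipped, so a fixed tile size can cover matrices whose dimensions are not multiples of it.

```python
def load_predicated(mat, rows, cols, i, j):
    """Return mat[i][j] when (i, j) is inside the matrix, else 0.0.

    The zero acts as a neutral element, so tile elements that fall
    outside the matrix do not change the accumulated product.
    """
    if i < rows and j < cols:
        return mat[i][j]
    return 0.0


def tiled_matmul(a, b, m, n, k, tile=4):
    """Compute C = A * B (A is m x k, B is k x n) tile by tile.

    Hypothetical illustration: every tile is processed at full size,
    and bounds checks replace the requirement that m, n and k be
    multiples of the tile size.
    """
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Accumulator for one full-size output tile.
            acc = [[0.0] * tile for _ in range(tile)]
            for k0 in range(0, k, tile):
                for ti in range(tile):
                    for tj in range(tile):
                        for tk in range(tile):
                            x = load_predicated(a, m, k, i0 + ti, k0 + tk)
                            y = load_predicated(b, k, n, k0 + tk, j0 + tj)
                            acc[ti][tj] += x * y
            # Predicated store: only write elements that exist in C.
            for ti in range(tile):
                for tj in range(tile):
                    if i0 + ti < m and j0 + tj < n:
                        c[i0 + ti][j0 + tj] = acc[ti][tj]
    return c
```

On a GPU the same checks would guard the global-memory loads and stores of each threadblock. A simpler alternative, at the cost of extra memory, is to zero-pad the inputs up to the next multiple of the tile size before calling the kernel.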
The `BLAS.gemmEx!` function errors on array sizes smaller than 128. I want to experiment with the functionality on small arrays, such as 16x16.

Error message
Also, the computed value is not correct if the array size is not an exponent of