Add safety check so that TransposeBigMLFloat16 test passes #77

Merged


sstamenk

Added a check that calls hipGetDeviceProperties and returns false if the requested gridDim.y exceeds the maximum gridDim.y the device supports.

This check already existed in the CUDA EP but was missing from the ROCm EP, so the TransposeBigMLFloat16 test was failing.
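For illustration, the shape of such a guard can be sketched in plain C++. This is not the code from the PR: the tile size `kTileDim`, the `GridLimits` struct, and the function name `CanUseTiledTranspose` are hypothetical, and the assumption that gridDim.y is derived from `m` by a ceiling division is mine. In the real ROCm code the limit would come from `hipGetDeviceProperties` (the `maxGridSize[1]` field of `hipDeviceProp_t`) rather than being passed in.

```cpp
#include <cassert>

// Hypothetical tile edge length; the actual kernel's tile size is not shown in this PR.
constexpr int kTileDim = 32;

// Stand-in for the device limit. In HIP code this value would be read from
// hipDeviceProp_t::maxGridSize[1] after calling hipGetDeviceProperties.
struct GridLimits {
  int max_grid_y;
};

// Integer ceiling division for non-negative a and positive b.
constexpr int CeilDiv(int a, int b) { return (a + b - 1) / b; }

// Returns false when the grid's y dimension needed to cover m rows
// (assumed here to be ceil(m / kTileDim)) exceeds the device maximum.
bool CanUseTiledTranspose(int m, int n, const GridLimits& limits) {
  assert(m >= 0 && n >= 0);
  (void)n;  // only the y dimension is checked in this sketch
  const int grid_y = CeilDiv(m, kTileDim);
  return grid_y <= limits.max_grid_y;
}
```

A caller would fall back to a different transpose path whenever the function returns false, instead of launching a kernel with an invalid grid.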

…ap kernel so that TransposeBigMLFloat16 test passes
@TedThemistokleous

Fix the formatting using their lintrunner -a tool.

@sstamenk sstamenk changed the title Added maximum gridDim.y overflow check before calling transposeNoOverlap kernel so that TransposeBigMLFloat16 test passes Add safety check to that TransposeBigMLFloat16 test passes Nov 18, 2024
@@ -470,7 +470,7 @@ inline rocblas_status rocblasTransposeHelper(hipStream_t /*stream*/, rocblas_han
return rocblas_dgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc);
}

inline bool CanUse_rocblasTransposeHelper_MLFloat16(int /*m*/, int /*n*/) { return true; } // CUDA has a limited grid size of 65536, ROCm has higher limits.


Why did you remove the inline here?

Author


I moved the implementation of the function inside fpgeneric.cu to mirror the way it was done inside CUDA EP.

@TedThemistokleous TedThemistokleous merged commit 061c493 into rocm6.3_internal_testing Nov 19, 2024
10 of 15 checks passed
@TedThemistokleous

Upstreaming this.

@sstamenk sstamenk deleted the fix_transpose_big_mlfloat16_test branch November 19, 2024 16:02
@sstamenk sstamenk changed the title Add safety check to that TransposeBigMLFloat16 test passes Add safety check so that TransposeBigMLFloat16 test passes Nov 20, 2024
TedThemistokleous pushed a commit that referenced this pull request Jan 3, 2025
* Added maximum gridDim.y overflow check before calling transposeNoOverlap kernel so that TransposeBigMLFloat16 test passes

* Fix formatting
2 participants