-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hotfix: Fix offset calculation to prevent overflow if offset is really large #1298
Conversation
amcamd
commented
Mar 17, 2023
- Contains:
- Torre's fix to add offset to array pointers in call to RocblasContractionProblem
- Andrew's trsm changes to avoid int32_t overflow in trsm offset calculations
- Alex's tensile_tag.txt for changes in Tensile
* works around tensile side offset size limitations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me, and same changes as passing in develop. Only thing to watch is if how the Tensile hotfix get's merged first as may break the required tensile_tag.
@amcamd not sure how you have clang-format failure if you cherry-picked |
I had to resolve a merge conflict, from ROCm 5.5 to now trtri_gemm_block has changed to rocblas_trtri_gemm_block. I needed to reverse the change. I guess that clang-format did not run on the commit after resolving the merge conflict. I will fix it. |
clients/gtest/trsm_gtest.yaml
Outdated
- { M: 46345, N: 4, lda: 46345, ldb: 46345 } | ||
# - { M: 47000, N: 4, lda: 47000, ldb: 47000 } # calls rocblas_internal_gemm_template with batch_count=367, stride_a=6016128 | ||
|
||
- &size_t_right_matrix_size_range | ||
- { M: 4, N: 46345, lda: 46345, ldb: 4 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests timed out for me on gfx90a, but passed when I increased the timeout threshold. I'm testing all other configs as well just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daineAMD Naveen's speedup for the initialization of the triangle matrix in https://github.com/ROCmSoftwarePlatform/rocBLAS-internal/pull/1551 . This PR is in the develop branch but it is not in release/rocm-rel-5.5. Tests will run faster on the develop branch than on release/rocm-rel-5.5.
* split out blas1_gtest to build faster