Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hotfix: Fix offset calculation to prevent overflow if offset is really large #1298

Merged
merged 11 commits into from
Mar 28, 2023

Conversation

amcamd
Copy link
Contributor

@amcamd amcamd commented Mar 17, 2023

  • Contains:
    • Torre's fix to add offset to array pointers in call to RocblasContractionProblem
    • Andrew's trsm changes to avoid int32_t overflow in trsm offset calculations
    • Alex's tensile_tag.txt for changes in Tensile

Copy link
Contributor

@TorreZuk TorreZuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me, and same changes as passing in develop. Only thing to watch is if how the Tensile hotfix get's merged first as may break the required tensile_tag.

@TorreZuk
Copy link
Contributor

@amcamd not sure how you have clang-format failure if you cherry-picked

@amcamd
Copy link
Contributor Author

amcamd commented Mar 17, 2023

I had to resolve a merge conflict, from ROCm 5.5 to now trtri_gemm_block has changed to rocblas_trtri_gemm_block. I needed to reverse the change. I guess that clang-format did not run on the commit after resolving the merge conflict. I will fix it.

Comment on lines 65 to 69
- { M: 46345, N: 4, lda: 46345, ldb: 46345 }
# - { M: 47000, N: 4, lda: 47000, ldb: 47000 } # calls rocblas_internal_gemm_template with batch_count=367, stride_a=6016128

- &size_t_right_matrix_size_range
- { M: 4, N: 46345, lda: 46345, ldb: 4 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests timed out for me on gfx90a, but passed when I increased the timeout threshold. I'm testing all other configs as well just to be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daineAMD Naveen's speedup for the initialization of the triangle matrix in https://github.com/ROCmSoftwarePlatform/rocBLAS-internal/pull/1551 . This PR is in the develop branch but it is not in release/rocm-rel-5.5. Tests will run faster on the develop branch than on release/rocm-rel-5.5.

@amcamd amcamd merged commit cdd561f into release/rocm-rel-5.5 Mar 28, 2023
yoichiyoshida pushed a commit to yoichiyoshida/rocBLAS that referenced this pull request May 10, 2023
* split out blas1_gtest to build faster
@amcamd amcamd deleted the hotfix-5.5-swdev-381033-trsm branch July 5, 2023 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants