
[MetaSchedule] Tile and pack intermediate output for CUDA TensorCore #14108

Merged (15 commits) into apache:main on Mar 6, 2023

Conversation

vinx13
Member

@vinx13 vinx13 commented Feb 23, 2023

This PR changes the MetaSchedule rule for CUDA Tensor Core. It adds additional tiling and a layout transformation of the output in shared memory (packing the buffer by the 16x16 accumulator shape). Transforming the buffer to a higher rank with an innermost 16x16 shape avoids the strided accesses that CompactBufferRegion and compute_at cannot handle, and reduces shared-memory usage for the output.
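The packing idea can be sketched in NumPy (an illustration of the layout transformation only, not the actual TensorIR rule; the names `M`, `N`, and `out` are hypothetical):

```python
import numpy as np

# Illustrative sketch: pack a 2-D accumulator buffer into a
# higher-rank layout whose innermost shape is 16x16, mirroring
# what the PR does for the Tensor Core output in shared memory.
M, N = 64, 48
out = np.arange(M * N, dtype=np.float32).reshape(M, N)

# Pack: (M, N) -> (M // 16, N // 16, 16, 16).
# Each innermost (16, 16) block is now contiguous in memory, so
# storing one 16x16 accumulator fragment touches a dense region
# instead of 16 rows separated by a stride of N.
packed = out.reshape(M // 16, 16, N // 16, 16).transpose(0, 2, 1, 3).copy()

# Element (i, j) of the original buffer maps to
# (i // 16, j // 16, i % 16, j % 16) in the packed layout.
i, j = 21, 37
assert packed[i // 16, j // 16, i % 16, j % 16] == out[i, j]
```

Because every 16x16 fragment is contiguous in the packed layout, the copy out of shared memory becomes a dense, unit-stride region per fragment rather than a strided slice of the full output.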

@tvm-bot
Collaborator

tvm-bot commented Feb 23, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@vinx13 vinx13 force-pushed the feat/ms-pack-shared-mem-output-1 branch 2 times, most recently from 112c99b to 92e3a8c Compare February 24, 2023 03:59
@vinx13 vinx13 force-pushed the feat/ms-pack-shared-mem-output-1 branch from 92e3a8c to 6767475 Compare March 3, 2023 23:47
@vinx13 vinx13 marked this pull request as ready for review March 4, 2023 00:45
@vinx13 vinx13 requested a review from Hzfengsy March 4, 2023 00:49
Review comment on src/tir/schedule/ir_comparator.h (outdated, resolved)
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
@Hzfengsy Hzfengsy merged commit 424c749 into apache:main Mar 6, 2023
@Hzfengsy
Member

Hzfengsy commented Mar 6, 2023

Thanks @vinx13

4 participants