[MetaSchedule] Tile and pack intermediate output for CUDA TensorCore #14108

vinx13 · 2023-02-23T22:52:11Z

This PR changes the MetaSchedule schedule rule for CUDA Tensor Core. It adds extra tiling and a layout transformation of the output in shared memory, packing the buffer by the 16x16 accumulator shape. Transforming the buffer to a higher rank with an innermost shape of 16x16 avoids the strided access that CompactBufferRegion and compute_at cannot handle, and it reduces shared memory usage for the output.
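To illustrate why packing by the accumulator shape removes strided access, here is a minimal sketch in plain Python. It is not the code from this PR and does not use TVM; the helper names (`pack_index`, `flat_offset`, `tile_offsets_rowmajor`, `tile_offsets_packed`, `is_contiguous`) and the 64x128 buffer shape are hypothetical and chosen only for illustration. It assumes a row-major buffer whose dimensions are multiples of 16.

```python
# Hypothetical illustration only; names and shapes are assumptions, not TVM APIs.
TILE = 16  # accumulator shape is TILE x TILE


def pack_index(i, j, tile=TILE):
    """Map a 2-D index (i, j) to the packed 4-D index
    (tile row, tile column, row inside tile, column inside tile)."""
    return (i // tile, j // tile, i % tile, j % tile)


def flat_offset(index, shape):
    """Row-major linear offset of `index` in a buffer of the given `shape`."""
    offset = 0
    for idx, extent in zip(index, shape):
        offset = offset * extent + idx
    return offset


def tile_offsets_rowmajor(ti, tj, rows, cols, tile=TILE):
    """Offsets touched by tile (ti, tj) in the original 2-D layout."""
    return [
        flat_offset((ti * tile + r, tj * tile + c), (rows, cols))
        for r in range(tile)
        for c in range(tile)
    ]


def tile_offsets_packed(ti, tj, rows, cols, tile=TILE):
    """Offsets touched by tile (ti, tj) after packing to a 4-D layout."""
    shape = (rows // tile, cols // tile, tile, tile)
    return [
        flat_offset(pack_index(ti * tile + r, tj * tile + c, tile), shape)
        for r in range(tile)
        for c in range(tile)
    ]


def is_contiguous(offsets):
    """True if every offset is exactly one past the previous offset."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


rows, cols = 64, 128  # assumed buffer shape, both multiples of TILE
strided = tile_offsets_rowmajor(1, 2, rows, cols)
packed = tile_offsets_packed(1, 2, rows, cols)
print(is_contiguous(strided), is_contiguous(packed))
```

In the 2-D layout, moving to the next row of a tile jumps by the full buffer width, so one tile is spread over 16 separate runs of 16 elements. In the packed layout the same tile occupies a single run of 256 elements, so the region accessed per tile is a dense box in the higher-rank buffer.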

tvm-bot · 2023-02-23T22:52:14Z

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

cc @ibsidorenko (See #10317 for details)

Generated by tvm-bot

Hzfengsy · 2023-03-06T11:24:47Z

Thanks @vinx13

vinx13 force-pushed the feat/ms-pack-shared-mem-output-1 branch 2 times, most recently from 112c99b to 92e3a8c February 24, 2023 03:59

vinx13 added 13 commits March 3, 2023 15:00

[MetaSchedule] Tile and pack intermediate output for CUDA TensorCore

f18009f

clean up schedule rule mltc

a661835

add lhs analyzer

596b83f

prevent simplifying single point

d85b771

clean up

270dd5e

lint

86d4002

fix rewrite_tensorize test

8f4f32b

fix software pipeline test

f642ff2

fix compile on mac

088e514

fix test cases

ad27f82

remove unused

27e2db9

rebase

abac67c

only use json format for roundtrip

6767475

vinx13 force-pushed the feat/ms-pack-shared-mem-output-1 branch from 92e3a8c to 6767475 March 3, 2023 23:47

vinx13 marked this pull request as ready for review March 4, 2023 00:45

lint

aefb25c

vinx13 requested a review from Hzfengsy March 4, 2023 00:49

Hzfengsy approved these changes Mar 4, 2023

src/tir/schedule/ir_comparator.h (outdated, resolved)

Update src/tir/schedule/ir_comparator.h

cbad45b

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>

Hzfengsy merged commit 424c749 into apache:main Mar 6, 2023

vinx13 mentioned this pull request Mar 6, 2023

[Bug][MetaSchedule] Failed to tune fp16 dense_add workload of some shapes on cuda #14137

Closed

ysh329 mentioned this pull request Apr 17, 2023

[Release] v0.12.0 Release Candidate Notes #14645

Closed

LeiWang1999 mentioned this pull request Feb 13, 2024

[TIR] Enhance and fix tensorize schedule for some case #16560

Merged
