[TIR] Fix of inter thread reduction with shared memory prefetch #16406
This is a fix for `LowerCrossThreadReduction`. The pass removes every thread-bound loop under the inter-thread reduction block, which causes problems when there are other, non-reduction blocks under the reduction thread, such as a shared memory prefetch.

Before removing a thread-bound loop, check whether the block(s) under this loop have a reduction block var. If they have none, the block is not a reduction, and therefore this thread-bound loop should be kept. Otherwise, we remove the thread-bound loop as usual.
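For readers who want the rule in concrete form, here is a minimal sketch of the check in plain Python. It does not use the TVM API: `Block`, `Loop`, `has_reduce_iter`, and `should_remove_thread_loop` are hypothetical names made up for illustration, and treating a loop as a reduction loop when any nested block has a reduce iter var is an assumption about how "block(s) under this loop" is meant.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Block:
    """Toy stand-in for a TIR block (hypothetical, not tvm.tir.Block)."""
    name: str
    iter_types: List[str]  # each entry is "spatial" or "reduce"


@dataclass
class Loop:
    """Toy stand-in for a TIR loop (hypothetical, not tvm.tir.For)."""
    var: str
    thread_tag: Optional[str] = None  # e.g. "threadIdx.x" when thread-bound
    body: List[Union["Loop", Block]] = field(default_factory=list)


def has_reduce_iter(node: Union[Loop, Block]) -> bool:
    """Return True if any block at or under `node` has a reduce iter var."""
    if isinstance(node, Block):
        return "reduce" in node.iter_types
    return any(has_reduce_iter(child) for child in node.body)


def should_remove_thread_loop(loop: Loop) -> bool:
    """Remove a thread-bound loop only if a reduction block sits under it.

    Assumption: one reduce iter var anywhere below the loop is enough.
    """
    return loop.thread_tag is not None and has_reduce_iter(loop)


# A prefetch-style loop: thread-bound, but its block is purely spatial.
prefetch = Loop("tx", "threadIdx.x", [Block("A_shared", ["spatial"])])
# A reduction loop: thread-bound, and its block has a reduce iter var.
reduction = Loop("tx", "threadIdx.x", [Block("sum", ["spatial", "reduce"])])

keep_prefetch = not should_remove_thread_loop(prefetch)
remove_reduction = should_remove_thread_loop(reduction)
```

In this toy model the prefetch loop is kept and the reduction loop is removed, which mirrors the behavior described above.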
Related discussion: https://discuss.tvm.apache.org/t/missing-thread-bind-loops-under-block-reduction-when-transformed-with-tir/16232/6
Please CC @MasterJH5574