Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MaxPool Block / Width Sharding with Large Kernels / Wide Reductions #14531

Merged
merged 11 commits into from
Nov 8, 2024

Conversation

wransom-TT
Copy link
Contributor

@wransom-TT wransom-TT commented Oct 31, 2024

For Block and Width Sharding with large kernels, Max Pool produced incorrect results. This has been fixed, plus support has been added for large kernels with wide reductions which was previously untested.

Ticket

Link to Github Issue

Problem description

When block and width sharding functionality was introduced the large reader and compute kernels were not updated and these cases were not covered by test cases. Additionally, the large kernels did not support wide reductions.

What's changed

  • Test cases have been expanded to include large kernels, wide reductions, and more strides for the standard height/width/block sharding tests
  • The large compute and reader kernels have been updated to support width and block sharding
  • The large compute and reader kernels have been updated to support wide reductions
  • All the max pool kernels have been refactored slightly to achieve more alignments between the large / wide / standard kernel implementations

Checklist

  • Post commit CI passes
  • [N/A] Blackhole Post commit (if applicable)
  • [N/A] Model regression CI testing passes (if applicable)
  • [N/A] Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@wransom-TT wransom-TT marked this pull request as ready for review November 2, 2024 05:36
if (is_large_kernel) {
uint32_t max_pool_partials_cb_id = tt::CB::c_intermed1; // max_pool partials
uint32_t max_pool_partials_cb_pagesize = in_cb_sz;
uint32_t max_pool_partials_cb_pagesize = out_cb_pagesize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, This variable should be min(out_cb_pagesize, TILE_SIZE8nbytes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey looks like a small typo, I updated it to min(out_cb_pagesize, TILE_SIZE * 8 * out_nbytes);, let me know if this isn't what you meant!

@wransom-TT wransom-TT requested a review from nkpatel-tt November 6, 2024 23:44
@wransom-TT wransom-TT merged commit 8ca460b into main Nov 8, 2024
118 checks passed
@wransom-TT wransom-TT deleted the wransom/LargeKernel branch November 8, 2024 17:43
ct-clmsn pushed a commit to ct-clmsn/tt-metal that referenced this pull request Nov 12, 2024
…ns (tenstorrent#14531)

* tenstorrent#14249: Fixed bug for width and block sharding with large kernel sizes and wide reductions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants