Remove reduce ops workarounds once Metal issues are fixed #1624

Closed
mrakitaTT opened this issue Dec 18, 2024 · 0 comments · Fixed by #2164
mrakitaTT commented Dec 18, 2024

Revert the Reduce ops workarounds introduced in PR #1625

@mrakitaTT mrakitaTT self-assigned this Dec 18, 2024
mrakitaTT added a commit that referenced this issue Dec 20, 2024
This PR adds TTNN workarounds for these Metal issues:
- tenstorrent/tt-metal#13361 - by decomposing `reduce(keepDim=false)` into `reduce(keepDim=true) + reshape` (see the pattern sketch after this list)
- tenstorrent/tt-metal#16118 - by dropping the dimensions argument when all dims are being reduced
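To illustrate, here is a minimal sketch of how the keepDim decomposition could be written as an MLIR rewrite pattern. The op accessors, builder signatures, and the `getReducedDims` helper are assumptions for illustration, not the exact tt-mlir API:

```
// Sketch only: the ttnn op accessor and builder names below are assumed,
// not verified against tt-mlir.
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

namespace {
// Rewrites reduce(keepDim=false) into reduce(keepDim=true) + reshape,
// working around tenstorrent/tt-metal#13361.
struct ReduceKeepDimWorkaround : OpRewritePattern<ttnn::SumOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(ttnn::SumOp op,
                                PatternRewriter &rewriter) const override {
    if (op.getKeepDim()) // Assumed accessor for the keep_dim attribute.
      return failure();  // Already keeps dims; nothing to rewrite.

    auto inputType = cast<RankedTensorType>(op.getInput().getType());
    auto outputType = cast<RankedTensorType>(op.getResult().getType());

    // Intermediate keepDim=true type: input rank, reduced dims set to 1.
    // getReducedDims is a hypothetical helper that reads dim_arg.
    SmallVector<int64_t> keepDimShape(inputType.getShape());
    for (int32_t dim : getReducedDims(op))
      keepDimShape[dim] = 1;
    auto keepDimType = RankedTensorType::get(
        keepDimShape, outputType.getElementType(), inputType.getEncoding());

    Value reduced = rewriter.create<ttnn::SumOp>(
        op.getLoc(), keepDimType, op.getInput(), /*keepDim=*/true,
        op.getDimArgAttr());

    // Reshape back to the original keepDim=false shape.
    SmallVector<int32_t> outShape;
    for (int64_t d : outputType.getShape())
      outShape.push_back(static_cast<int32_t>(d));
    rewriter.replaceOpWithNewOp<ttnn::ReshapeOp>(
        op, outputType, reduced, rewriter.getI32ArrayAttr(outShape));
    return success();
  }
};
} // namespace
```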

As part of this work I've also:
- Enabled conversion of the `stablehlo.reduce` op with multiple reduce dimensions
- Added reduce ops verifiers in TTIR
- Added a separate function in TTNNWorkarounds to run rewrite patterns for decomposition and layout workarounds (see the driver sketch below)
- Added lots of unit tests for reduce ops to cover conversions and verifiers
- Added lots of silicon tests for reduce ops
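The separate driver function mentioned above could look roughly like this; `runWorkaroundDecompositions` is an illustrative name, not the actual TTNNWorkarounds API:

```
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

// Runs the decomposition workaround patterns as a standalone greedy
// rewrite, kept separate from the layout workaround patterns.
static mlir::LogicalResult runWorkaroundDecompositions(mlir::ModuleOp module) {
  mlir::RewritePatternSet patterns(module.getContext());
  patterns.add<ReduceKeepDimWorkaround>(module.getContext());
  // Applies the patterns repeatedly until a fixed point is reached.
  return mlir::applyPatternsAndFoldGreedily(module, std::move(patterns));
}
```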

Opened issue #1624 and assigned it to myself to revert these workarounds once the Metal issues are fixed.

Closes #805, #848

After implementing these workarounds and running tests, I encountered [another Metal issue](tenstorrent/tt-metal#16104), this time in the `reshape` op. I've debugged it and have a local fix; I will send a PR to the Metal repo, as confirmed with the reshape op owners. I've opened issue #1640 for myself to enable the Reduce ops silicon tests once this fix is uplifted.

Another issue I encountered while working on this: after the workaround pass decompositions, if we change the shapes of the ops' tensors, their layouts need to change too, but the layout pass runs before the workaround pass. I've managed to solve it by reusing the layout of the input tensor (a sketch follows below), but I am not sure that is a good solution; maybe we need to repeat some of the layout logic after the workaround decompositions. FYI @sdjordjevicTT
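A sketch of that layout reuse, assuming standard MLIR types (`buildResultTypeReusingLayout` is an illustrative name):

```
#include "mlir/IR/BuiltinTypes.h"

// Builds the result type for a workaround-created op by keeping the input
// tensor's element type and layout encoding (#ttnn_layout) and changing
// only the shape. This avoids re-running the layout pass, at the risk that
// the reused layout is not ideal for the new shape.
static mlir::RankedTensorType
buildResultTypeReusingLayout(mlir::RankedTensorType inputType,
                             llvm::ArrayRef<int64_t> newShape) {
  return mlir::RankedTensorType::get(newShape, inputType.getElementType(),
                                     inputType.getEncoding());
}
```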

Here is the example TTNN IR before the workarounds:
```
%3 = "ttnn.sum"(%2) <{dim_arg = [0: i32, 1 : i32, 2: i32], keep_dim = false}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout2>
```

and after the workarounds:
```
%3 = "ttnn.sum"(%2) <{keep_dim = true}> : (tensor<128x32x4xf32, #ttnn_layout2>) -> tensor<1x1x1xf32, #ttnn_layout2>
%4 = "ttnn.reshape"(%3) <{shape = [1 : i32]}> : (tensor<1x1x1xf32, #ttnn_layout2>) -> tensor<1xf32, #ttnn_layout3>
```