Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[StableHLO] reshape op failure for multi dimensional reshaping #1344

Closed
mmanzoorTT opened this issue Nov 20, 2024 · 9 comments
Closed

[StableHLO] reshape op failure for multi dimensional reshaping #1344

mmanzoorTT opened this issue Nov 20, 2024 · 9 comments
Assignees
Labels
stablehlo conversion bug Bugs in StableHLO conversion

Comments

@mmanzoorTT
Copy link
Contributor

stablehlo.reshape conversion is failing if we reshape multiple dimensions simultaneously.

An example stablehlo graph

module {
  func.func @main(%arg0: tensor<1x3x256x256xbf16>) -> tensor<1x3x16x16x16x16xbf16> {
    %0 = stablehlo.reshape %arg0 : (tensor<1x3x256x256xbf16>) -> tensor<1x3x16x16x16x16xbf16>
    return %0 : tensor<1x3x16x16x16x16xbf16>
  }
}

Error message

results/mlir_tests/stable_hlo/aten::view_0.mlir:3:10: error: 'ttir.reshape' op Shape attribute must have at most 5 elements	"results/mlir_tests/stable_hlo/aten::view_0.mlir:3:10: error: 'ttir.reshape' op Shape attribute must have at most 5 elements
    %0 = stablehlo.reshape %arg0 : (tensor<1x3x256x256xbf16>) -> tensor<1x3x16x16x16x16xbf16>
         ^
results/mlir_tests/stable_hlo/aten::view_0.mlir:3:10: note: see current operation: %1 = ""ttir.reshape""(%arg0, %0) <{operand_constraints = [#tt.operand_constraint<dram|l1|tile|none|interleaved|single_bank|height_sharded|width_sharded|block_sharded|any_layout|any_device_tile>, #tt.operand_constraint<dram|l1|tile|none|interleaved|single_bank|height_sharded|width_sharded|block_sharded|any_layout|any_device_tile>], shape = [1 : i32, 3 : i32, 16 : i32, 16 : i32, 16 : i32, 16 : i32]}> : (tensor<1x3x256x256xbf16>, tensor<1x3x16x16x16x16xbf16>) -> tensor<1x3x16x16x16x16xbf16>
@mmanzoorTT mmanzoorTT added the stablehlo conversion bug Bugs in StableHLO conversion label Nov 20, 2024
@mmanzoorTT mmanzoorTT added this to the [Third Party] HLO + XLA milestone Nov 20, 2024
@ajakovljevicTT ajakovljevicTT self-assigned this Nov 21, 2024
@ajakovljevicTT
Copy link
Contributor

ajakovljevicTT commented Nov 21, 2024

I did some initial digging into this bug, putting notes below:
The reshape op in tt-metal currently does not support tensors with more that 4 dimensions with tile layout (issue tenstorrent/tt-metal#13889). However, that is currently being fixed (tenstorrent/tt-metal@d08a9fc). I cherry-picked this commit manually to our version of metal, disabled the asserts and ran the test above. It passed, with a small caveat. Since reshape puts out the view of a tensor, and does not change the tensor memory place itself, that means that we will need to change how we do deallocation op on our side (as it will lead to deallocating twice in the current setup).

We might be able to circumvent it if we manually translate to row_major and then back to tile, but not sure if that is the best point to go forward if the metal is working on a fix for this.

Should we wait for the PR on metal to see if the issues go away? @mmanzoorTT @mrakitaTT

@mmanzoorTT
Copy link
Contributor Author

Thanks @ajakovljevicTT . We have to wait till next tt-metal uplift to resolve this problem. You may open a tt-metal issue to describe the problem and get their feedback in meanwhile.

@ajakovljevicTT
Copy link
Contributor

Update: A pull request on tt-metal was also opened: tenstorrent/tt-metal#15572

@jvegaTT
Copy link

jvegaTT commented Dec 4, 2024

The above PR as mentioned does the dimensionality reduction directly in the reshape OP. I am not part of the compiler team to test and validate but it is possible that merging the above PR in may resolve this.

@jvegaTT
Copy link

jvegaTT commented Dec 5, 2024

The pr in tt-metal was just merged, can you verify if the issue is solved?

@ajakovljevicTT
Copy link
Contributor

The pr in tt-metal was just merged, can you verify if the issue is solved?

Will do that, thanks!

@ajakovljevicTT
Copy link
Contributor

ajakovljevicTT commented Dec 5, 2024

Update: I have manually uplifted new tt-metal main and disabled the asserts in tt-mlir and the code above works. However, the newest version of tt-metal is not uplifted due to some compilation issues (PR #1512). I have pushed the fix to the uplift branch, and hopefully the uplift will be soon merged into main. After that, I will delete the asserts, test and close the issue.

@ajakovljevicTT
Copy link
Contributor

The tt-metal uplift has been merged and I have tested that the new reshape works with more than 4D tensors. The PR which will allow this on our side is here: #1524

@ajakovljevicTT
Copy link
Contributor

Closed by #1524

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stablehlo conversion bug Bugs in StableHLO conversion
Projects
None yet
Development

No branches or pull requests

3 participants