Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uplift third_party/tt-metal to 2024-11-01 (3a24131822 w/ fixes) #1119

Merged
merged 3 commits into from
Nov 13, 2024

Conversation

nsmithtt
Copy link
Contributor

No description provided.

@kmabeeTT
Copy link
Contributor

Plz cherry-pick a2e64a2 for some needed runtime changes to solve compile error that goes with metal PR tenstorrent/tt-metal#14394

@mtopalovicTT
Copy link
Contributor

@nsmithtt
Not sure if possible to include this also?
tenstorrent/tt-metal@948fafb

@mbezuljTT
Copy link
Contributor

we have a bigger problem with merges, when we solve it - these will be automated again.

@kmabeeTT
Copy link
Contributor

kmabeeTT commented Nov 4, 2024

Comment on CI fails, one error is:

2024-11-01 16:16:18,806 - DEBUG - starting loop=1/1 for binary=test_perf_slice.mlir.ttnn
                 Always | FATAL    | Physical size (32, 16) must be multiple of page size (1, 32)
2024-11-01 16:16:19,059 - ERROR - ERROR: test=test_perf_slice.mlir.ttnn experienced an error with exception=TT_FATAL @ /localdev/kmabee/mlir2/third_party/tt-metal/src/tt-metal/ttnn/cpp/ttnn/tensor/layout/tensor_layout.cpp:97: width_remainder == 0 && height_remainder == 0

Talked to Artem on Friday and he said tt-metal bug and fix is coming ...soon...

Edit: Unconfirmed if we need to fix something here, but metal PR Artem referring to is tenstorrent/tt-metal#14550

@nsmithtt
Copy link
Contributor Author

nsmithtt commented Nov 4, 2024

We'll need this UMD fix: tenstorrent/tt-umd#251 (comment)

@mbezuljTT
Copy link
Contributor

We'll need this UMD fix: tenstorrent/tt-umd#251 (comment)

@broskoTT we're fixing metal in mlir repo. do you know if umd uplifts regularly to metal repo or would be better for us to do it on our own?

@kmabeeTT kmabeeTT force-pushed the nsmith/metal-uplift branch from 2c97fbd to 980bee6 Compare November 6, 2024 18:29
@kmabeeTT kmabeeTT changed the title Uplift third_party/tt-metal to dec1dc2c6da1d547535bedb11d41f6373c66a69f Oct30 uplift third_party/tt-metal effort Nov 6, 2024
@kmabeeTT
Copy link
Contributor

kmabeeTT commented Nov 6, 2024

Small update. Renamed PR, and cherry picked 3 metal PR's (2 from Friday, 1 from today) and the failing tests in last week's CI run we saw are gone. Rebased and force pushed this branch as we are getting close. Going to try to test on Forge-FE too.

I haven't seen need for UMD fix referenced above though, or at least I didn't do anything, I get impression it's needed only for newer tt-metal not the Oct30 base we are still targeting with this uplift. tt-mlir is currently using Oct25 tt-metal.

@broskoTT
Copy link

broskoTT commented Nov 8, 2024

We're making a lot of API changes, so UMD is regularly being uplifted to tt_metal (every one or two days). Of course, subject to UMD regressions, and to tt_metal CI regressions.

@nsmithtt
Copy link
Contributor Author

nsmithtt commented Nov 8, 2024

We're making a lot of API changes, so UMD is regularly being uplifted to tt_metal (every one or two days). Of course, subject to UMD regressions, and to tt_metal CI regressions.

Good to know! Thanks @broskoTT

@kmabeeTT kmabeeTT force-pushed the nsmith/metal-uplift branch from 4d650eb to 9a95ce8 Compare November 12, 2024 18:39
@kmabeeTT
Copy link
Contributor

Good news, this tt-metal uplift that started oct-30 is now passing now tt-mlir CI (and when tried in tt-forge-fe) after some follow up fixes last few days. Ping owners @pilkicTT / @sdjordjevicTT / @jserbedzijaTT to unblock.

tt-mlir link: https://github.com/tenstorrent/tt-mlir/actions/runs/11803629246
tt-forge-fe link: https://github.com/tenstorrent/tt-forge-fe/actions/runs/11806004440

Notes:

  1. Needed a conv2d fix in tt-metal made on 11/8 that I cherry picked. Had to cherry-pick a revert of conv changes from Lewis too from tt-metal to solve conflicts.
  2. Needed to rebase this branch on latest tt-mlir from 11/12 to try in tt-forge-fe to get a conv fix to allow maxpool2d tests to pass in tt-forge-fe
  3. So this is using tt-metal base 3a24131822 (11/1) plus few cherry picks from tt-metal main, won't need them on next full tt-metal uplift.

If we like it I can clean up and merge.

@mbezuljTT
Copy link
Contributor

mbezuljTT commented Nov 13, 2024

Thanks @kmabeeTT for pushing this all the way forward!

Question:

  1. Had to cherry-pick a revert of conv changes from Lewis too from tt-metal to solve conflicts.

Is this revert in tt-metal as well?

@kmabeeTT
Copy link
Contributor

Thanks @kmabeeTT for pushing this all the way forward!

Question:

  1. Had to cherry-pick a revert of conv changes from Lewis too from tt-metal to solve conflicts.

Is this revert in tt-metal as well?

Yep, it sure is. There was a change submitted, and then a revert a few days later, I just cherry picked the revert.

@kmabeeTT kmabeeTT force-pushed the nsmith/metal-uplift branch from 9a95ce8 to ad71c1d Compare November 13, 2024 17:09
nsmithtt and others added 3 commits November 13, 2024 18:33
 - Switch to tt-metal branch w/ 3 PR's cherry picked to solve
   refactoring issues with TensorLayout

 - Cherry pick conv2D fix (and revert of another change to
   resolve conflicts)
…n tt-metal

 - Matches changes from tt-metal PR 14394
… tt-metal

 - Apply small_vector_shape workaround to 2 places in runtime.cpp
@kmabeeTT kmabeeTT force-pushed the nsmith/metal-uplift branch from ad71c1d to b25b031 Compare November 13, 2024 18:34
@kmabeeTT
Copy link
Contributor

Rebased, and merging shortly.

Some documentation: There was some follow up experiments where I tried to remove maxpool2dPreshard workaround yesterday/today (in effort to try and get tt-forge-fe passing), in the end turns out we need to keep that workaround in place (otherwise maxpool2d fails in FFE). Rebased this branch, and dropped that commit just now, have already tried it on FFE, is passing).

@kmabeeTT kmabeeTT changed the title Oct30 uplift third_party/tt-metal effort Uplift third_party/tt-metal to 2024-11-01 (3a24131822 w/ fixes) Nov 13, 2024
@kmabeeTT kmabeeTT merged commit 9d08234 into main Nov 13, 2024
18 checks passed
@kmabeeTT kmabeeTT deleted the nsmith/metal-uplift branch December 8, 2024 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants