ND behavior in `test_matmul.py::test_sd_matmul` #7126

TT-billteng · 2024-04-04T19:11:07Z

pytest tests/ttnn/unit_tests/operations/test_matmul.py::test_sd_matmul

This appears only on N150, but I haven't been able to repro it locally yet.

This is not failing on a specific VM or BM.

Some failing runs:

https://github.com/tenstorrent-metal/tt-metal/actions/runs/8558424886/job/23453740581
https://github.com/tenstorrent-metal/tt-metal/actions/runs/8558853752/job/23454745853
https://github.com/tenstorrent-metal/tt-metal/actions/runs/8545275089/job/23414372279
https://github.com/tenstorrent-metal/tt-metal/actions/runs/8542369244/job/23403999132
https://github.com/tenstorrent-metal/tt-metal/actions/runs/8559093722/job/23455622019

TT-billteng · 2024-04-04T19:19:35Z

not test_sd_matmul specifically, but this is a deterministic failure and may be related:

pip install pytest-repeat
pytest --count=2 tests/ttnn/unit_tests/operations/test_matmul.py -xv

PASSED tests/ttnn/unit_tests/operations/test_matmul.py::test_matmul_with_matched_width_height_from_1D[k_size=4-n_size=4-2-2]
PASSED tests/ttnn/unit_tests/operations/test_matmul.py::test_matmul_with_matched_width_height_4D[n_size=1-c=1-h=2-w=4-1-2]
SKIPPED [4] tests/ttnn/unit_tests/operations/test_matmul.py:70: ttnn.reshape doesn't support reshaping the input tensors used in this test
FAILED tests/ttnn/unit_tests/operations/test_matmul.py::test_matmul_with_matched_width_height_4D[n_size=1-c=1-h=2-w=4-2-2] - AssertionError: 0.9996211303338057

cfjchu · 2024-04-04T19:24:16Z

@TT-billteng that just looks like a PCC thresholding assertion, maybe due to different seeds across runs?

TT-billteng · 2024-04-04T19:29:24Z

Ah yes that's true, running again with lower threshold

I feel like we should use a "standard" globally-agreed upon PCC threshold to use for testing, or is this too much of an ask?

cfjchu · 2024-04-04T19:37:22Z

I don't think that's feasible because there could be variability based on:

- operation type
- distribution of floating point values based on seed
- stacking operations
- data formats relative to torch.float32/torch.bfloat16
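To illustrate the seed dependence mentioned above, here is a minimal sketch that computes a PCC (Pearson correlation coefficient) for a matmul across several seeds. It is not the helper used in `test_matmul.py`: the function names `pcc` and `matmul_pcc_for_seed` are hypothetical, NumPy stands in for torch/ttnn, and rounding the inputs to float16 is only an assumed stand-in for a lower-precision device result.

```python
import numpy as np


def pcc(expected, actual):
    """Pearson correlation coefficient between two arrays, flattened."""
    e = np.asarray(expected, dtype=np.float64).ravel()
    a = np.asarray(actual, dtype=np.float64).ravel()
    return float(np.corrcoef(e, a)[0, 1])


def matmul_pcc_for_seed(seed, m=32, k=64, n=32):
    """PCC between a float32 matmul and a reduced-precision variant (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((m, k)).astype(np.float32)
    b = rng.standard_normal((k, n)).astype(np.float32)
    golden = a @ b
    # Assumption: emulate a lower-precision result by rounding inputs to float16.
    a_low = a.astype(np.float16).astype(np.float32)
    b_low = b.astype(np.float16).astype(np.float32)
    return pcc(golden, a_low @ b_low)


# The PCC shifts slightly with the seed, so a threshold set very close to
# one observed value can pass for one seed and fail for another.
pccs = [matmul_pcc_for_seed(seed) for seed in range(5)]
for seed, value in enumerate(pccs):
    print(f"seed={seed} pcc={value:.10f}")
```

Fixing the seed inside a test removes this source of run-to-run variation, at the cost of exercising only one input distribution.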

TT-billteng · 2024-04-05T20:27:21Z

can we disable? I still see this failing on main

eyonland · 2024-04-08T15:31:02Z

I would rather not disable this test. @TT-BrianLiu, do we need to lower the PCC for this matmul test when running on WH, or is this something bigger?

TT-BrianLiu · 2024-04-11T20:52:28Z

Was it not failing before? Otherwise, 0.999 is pretty reasonable pcc for a matmul

bbradelTT · 2024-07-10T16:10:04Z

@TT-billteng

Re: I feel like we should use a "standard" globally-agreed upon PCC threshold to use for testing, or is this too much of an ask?
Too big of an ask. Model owners use PCC thresholds as close as possible to existing values to notice any changes.

Having said that, the PCCs in test_matmul.py should be updated so that anything above .999 or with too many digits would be changed.
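One possible reading of that rule, as a sketch only: cap thresholds at 0.999 and truncate anything with more than three decimal places. The helper name `normalize_pcc_threshold` and the three-digit cutoff are assumptions for illustration, not what the referenced commits actually do.

```python
from decimal import Decimal, ROUND_DOWN

# Assumed policy values, not taken from test_matmul.py.
MAX_PCC = Decimal("0.999")
STEP = Decimal("0.001")  # keep at most three decimal places


def normalize_pcc_threshold(threshold):
    """Truncate a PCC threshold to three decimals and cap it at 0.999 (hypothetical helper)."""
    # Decimal(str(...)) avoids binary floating-point surprises when truncating.
    truncated = Decimal(str(threshold)).quantize(STEP, rounding=ROUND_DOWN)
    return float(min(truncated, MAX_PCC))


print(normalize_pcc_threshold(0.9996211303338057))
print(normalize_pcc_threshold(0.98765))
```

Truncating rather than rounding keeps the adjusted threshold at or below the original, so a test that passed before cannot start failing because of the cleanup.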

test_matmul.py has many tests with skip_for_wormhole_b0. If we enable the tests again, I'm worried about All commit runtime increasing. When updating the PCCs, would you prefer to

- just enable test_sd_matmul
- enable all the tests
- enable some subset of tests, in which case, what would be the criteria?

prajaramanTT · 2024-11-26T21:51:00Z

@TT-billteng @bbradelTT Can we close this issue?

bbradelTT · 2024-11-26T22:20:57Z

@prajaramanTT We can't.

prajaramanTT · 2025-01-07T16:43:12Z

@bbradelTT Do we have any updates on this?

bbradelTT · 2025-01-14T21:00:23Z

I just looked into this.

On WH N150 and BH the tests pass. They are skipped on N300 since the grid is too small.

I'll create a PR to re-enable the tests.

### Ticket
Link to Github Issue #7126

### Problem description
A test was failing and was skipped

### What's changed
After various issues were fixed over time the test now passes. Therefore enable it again.

### Checklist
- [x] Post commit CI passes https://github.com/tenstorrent/tt-metal/actions/runs/12776412082
- [ ] Blackhole Post commit (if applicable) Too many issues, but test passed locally.
- [ ] Model regression CI testing passes (if applicable) N/A
- [ ] Device performance regression CI testing passes (if applicable) N/A
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes

bbradelTT · 2025-01-15T17:15:39Z

All post commit passed in main after the merge. Checked subsequent runs as well, and there are no failures in all post commit related to GS.

Closing.

TT-billteng added the bug Something isn't working label Apr 4, 2024

github-project-automation bot added this to External Requests and Reports Apr 4, 2024

github-project-automation bot moved this to 🆕 New in External Requests and Reports Apr 4, 2024

TT-billteng removed this from External Requests and Reports Apr 4, 2024

TT-billteng added the ci-bug bugs found in CI label Apr 4, 2024

github-project-automation bot added this to External Requests and Reports Apr 4, 2024

github-project-automation bot moved this to 🆕 New in External Requests and Reports Apr 4, 2024

TT-billteng removed this from External Requests and Reports Apr 4, 2024

TT-billteng added the P1 label Apr 4, 2024

github-project-automation bot added this to External Requests and Reports Apr 4, 2024

github-project-automation bot moved this to 🆕 New in External Requests and Reports Apr 4, 2024

cfjchu assigned cfjchu, AleksKnezevic and mtatsumiTT and unassigned cfjchu Apr 4, 2024

jliangTT added the op_cat: mm label Apr 10, 2024

bbradelTT assigned bbradelTT and unassigned AleksKnezevic and mtatsumiTT Jul 10, 2024

bbradelTT added a commit that referenced this issue Jul 12, 2024

#7126: update matmul test pccs to run and pass on wormhole

09b4450

bbradelTT mentioned this issue Jul 12, 2024

#7126: update matmul test pccs to run and pass on wormhole #10230

Closed

3 tasks

bbradelTT added a commit that referenced this issue Jul 15, 2024

#7126: test_matmul pcc adjustments and allow for WH

87684c6

bbradelTT added a commit that referenced this issue Jan 14, 2025

#7126: remove skip for test_sd_matmul test

8a314f6

bbradelTT mentioned this issue Jan 14, 2025

#7126: remove skip for test_sd_matmul test #16729

Merged

6 tasks

bbradelTT added a commit that referenced this issue Jan 14, 2025

#7126: remove skip for test_sd_matmul test

023b110

bbradelTT closed this as completed Jan 15, 2025

github-project-automation bot moved this from 🆕 New to ✅ Done in External Requests and Reports Jan 15, 2025
