Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logit operation fails with low PCC on Wormhole and Grayskull cards #4405

Closed
Tracked by #6445
JelenaTosicJtosic opened this issue Dec 19, 2023 · 4 comments
Closed
Tracked by #6445
Assignees
Labels
bug Something isn't working GS op_cat: eltwise WH

Comments

@JelenaTosicJtosic
Copy link
Contributor

JelenaTosicJtosic commented Dec 19, 2023

logit operation breaks with low PCC error in some test cases on Wormhole cards.

  • Problem is observed both on tt_lib.tensor and ttnn variants.
  • Problem happens on both GS and WH cards

To Reproduce
Steps to reproduce the behavior:

  1. Checkout main branch
  2. Run unit test test_logit.py using this command:
    pytest tests/tt_eager/python_api_testing/non_working_unit_tests/wormhole/test_logit.py

(NOTE: Use the same unit test for Grayskyll card1)

Expected behavior
There are three test cases presented in the unit test tests/tt_eager/python_api_testing/non_working_unit_tests/wormhole/test_logit.py
For GS first two are expected to fail and for WH all three are expected to fail with low PCC error.
For example, one of the tests is expected to fail with this result:
Max ATOL Delta: 5.001336097717285, Max RTOL Delta: 2.0224297046661377, PCC: -0.7121394672631166, PCC check failed.

Also, we should expect the operation to return nan when the epsilon argument is greater than 1. For this test case GS returns tensor filled with nan` values, but for WH this is not the case and we get error printed like this:

ERROR | tests.tt_eager.python_api_testing.sweep_tests.comparison_funcs:get_pcc:32 - One tensor is all nan, the other is not.

Moreover the documentation for logit needs to be updated, so that epsilon is defined appropriately like the epsilon for input clamp bound and not dimension to logit along. Also the ranges need to be updated, so that epsilon as well as input have valid input ranges mentioned in the documentation.

Getting Additional info for the operation under test and its behavior
To get additional information and results for different combinations of input shapes, types, layouts and memory configs for which this operation was tested you can also run locally sweeps for logit operation and check the results:

tests/tt_eager/python_api_testing/sweep_tests/test_configs/ci_sweep_tests/broken_wormhole/pytorch_eltwise_logit_test.yaml

for tt_lib variant. And:

tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/wormhole/ttnn_eltwise_logit_test.yaml

for ttnn variant.

To do this you should:

  1. Follow the Getting Started page to setup the repo, environment variables and python-env
  2. Activate source build/python_env/bin/activate
  3. Run sweeps by using python tests/tt_eager/python_api_testing/sweep_tests/run_pytorch_test.py -i tests/tt_eager/python_api_testing/sweep_tests/test_configs/ci_sweep_tests/broken_wormhole/pytorch_eltwise_logit_test.yaml -o ./result-sweeps
  4. After the run is completed all test sweeps results should be available inside specified output directory (in this case ./result-sweeps). There you will find eltwise_logit_sweep.csv which holds all executed sweeps.
@nemanjagrujic nemanjagrujic changed the title Tt_lib.tensor.logit op fails with low PCC logit operation fails with low PCC on Wormhole cards Mar 1, 2024
@nemanjagrujic nemanjagrujic changed the title logit operation fails with low PCC on Wormhole cards logit operation fails with low PCC on Wormhole and Grayskull cards Mar 4, 2024
@nemanjagrujic nemanjagrujic added the bug Something isn't working label Mar 4, 2024
@jliangTT jliangTT assigned umadevimcw and unassigned boris-drazic Mar 18, 2024
umadevimcw added a commit that referenced this issue Mar 19, 2024
@umadevimcw
Copy link
Contributor

umadevimcw commented Mar 19, 2024

@JelenaTosicJtosic @nemanjagrujic Please find the PR #6538. With the updated logic logit is passing

umadevimcw added a commit that referenced this issue Mar 19, 2024
umadevimcw added a commit that referenced this issue Mar 20, 2024
@umadevimcw umadevimcw moved this from 🆕 New to 🏗 In progress in External Requests and Reports Mar 20, 2024
@umadevimcw umadevimcw moved this from 🏗 In progress to 👀 In review in External Requests and Reports Mar 20, 2024
@nemanjagrujic
Copy link
Contributor

@umadevimcw Tested and seems to be passing both unit and sweeps tests! Great!

Please let us know when you merge to main so we can update sweep tests (move it to working folders etc).

@umadevimcw
Copy link
Contributor

Sure waiting for approval. Will let you know

@umadevimcw
Copy link
Contributor

@nemanjagrujic Merged PR into the main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working GS op_cat: eltwise WH
Projects
None yet
Development

No branches or pull requests

4 participants