
Precision issue : Exp2 #13002

Open
Tracked by #9702
umadevimcw opened this issue Sep 23, 2024 · 4 comments
Assignees
Labels
bug (Something isn't working), LLK, P2, WH

Comments

@umadevimcw (Contributor) commented Sep 23, 2024

Describe the bug
PCC is dropping due to precision loss in the logaddexp2 function. While debugging, we observed that exp2 of certain inputs is zero on TT hardware, whereas Torch returns small non-zero values at that precision level, which results in the PCC drop.

This issue blocks #6391, #8634, #13973, and #13930.

To Reproduce
Steps to reproduce the behavior:

Copy and paste the code below to reproduce the precision loss.
In this code the input values are fixed for debugging purposes, which showcases the precision loss.

# SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.

# SPDX-License-Identifier: Apache-2.0

from loguru import logger
import random
import pytest
import torch
import ttnn

from tests.ttnn.utils_for_testing import assert_with_pcc
from tests.ttnn.python_api_testing.sweep_tests import ttnn_ops


def run_logaddexp2_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device):
    torch.manual_seed(data_seed)

    x = torch.Tensor(size=input_shape[0]).uniform_(-100, 100).to(torch.bfloat16)
    y = torch.Tensor(size=input_shape[1]).uniform_(-100, 100).to(torch.bfloat16)

    try:
        # get ref result
        x.fill_(-69.50000)
        y.fill_(-81.00000) # hard coded this for debugging purposes
        
        print("Exp2 results of Torch....")
        
        torch.set_printoptions(sci_mode=False, precision=32)
        print(torch.exp2(x))
        print(torch.exp2(y))


        tt_x = ttnn_ops.exp2(
            x,
            device=device,
            dtype=dtype,
            layout=dlayout,
            input_mem_config=in_mem_config,
            output_mem_config=output_mem_config,
        )
        tt_y  = ttnn_ops.exp2(
            y,
            device=device,
            dtype=dtype,
            layout=dlayout,
            input_mem_config=in_mem_config,
            output_mem_config=output_mem_config,
        )
        
        # # Replicated the logic used in TT 
        # ref_value = torch.logaddexp2(x, y)
        # test_tt_logic = torch.add(torch.exp2(x), torch.exp2(y)) # here result is 0.00000000000000000000119775752698
        # test_tt_logic = torch.log2(test_tt_logic) #here output is -69.5000000
        

        # tt_result = ttnn_ops.logaddexp2(
        #     x,
        #     y,
        #     device=device,
        #     dtype=dtype,
        #     layout=dlayout,
        #     input_mem_config=in_mem_config,
        #     output_mem_config=output_mem_config,
        # )

    except Exception as e:
        logger.warning("Operation execution crashed")
        raise e

    # assert len(tt_result.shape) == len(ref_value.shape)
    # assert tt_result.shape == ref_value.shape
    # ref value is -69.500
    # tt_result is -inf
    print("Exp2 results of TT....")
    print(tt_x)
    print(tt_y)

test_sweep_args2 = [
    (
        [(19, 12), (19, 12)],
        [ttnn.bfloat16, ttnn.bfloat16],
        [ttnn.TILE_LAYOUT, ttnn.TILE_LAYOUT],
        [ttnn.DRAM_MEMORY_CONFIG, ttnn.DRAM_MEMORY_CONFIG],
        (ttnn.DRAM_MEMORY_CONFIG),
        18261510,
    ),
]


@pytest.mark.parametrize(
    "input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed",
    (test_sweep_args2),
)
def test_eltwise_logaddexp2(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device):
    run_logaddexp2_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device)

Expected behavior

  • TT exp2 results are zeros
  • Torch results are non-zero

Screenshots

Screenshot 2024-09-23 at 7 21 52 PM

Please complete the following environment information:

  • OS: [e.g. Ubuntu 20.04]
  • Version of software (eg. commit)

Additional context
TT's exp2 op internally depends on the exp op.
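Given that dependency, the underflow path can be illustrated with the exp-based identity exp2(x) = exp(x · ln 2). Below is a minimal pure-Python sketch (plain float64 math, not TT or Torch code) of the reference value for the hard-coded debug input -69.5; the specific claim that the device exp kernel flushes this input to zero is an assumption based on the observed behavior, not confirmed device internals:

```python
import math

x = -69.5  # the hard-coded debug input from the repro above

# exp2 computed directly, and via the identity exp2(x) = exp(x * ln 2)
direct = 2.0 ** x
via_exp = math.exp(x * math.log(2.0))

print(direct)   # ~1.2e-21, well above the float32/bfloat16 underflow threshold
print(via_exp)  # matches the direct computation in float64

# If the underlying exp kernel flushes its large-magnitude negative
# argument (here x * ln 2 ~= -48.17) to zero, exp2 inherits that
# underflow and returns 0 instead of ~1.2e-21.
```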

@umadevimcw added the bug (Something isn't working) and LLK labels on Sep 23, 2024
@rtawfik01 (Contributor) commented:
@ttmtrajkovic I discussed this with @umadevimcw offline. The issue above is that the device exp2 implementation does not output the value 1.1977575e-21, which is representable in bfloat16; PyTorch with the bfloat16 data format does represent it, and this causes precision failures downstream.

@umadevimcw @eyonland please let us know the priority for this issue: is it only failing unit tests, or is it failing on models due to the downstream precision issue?

@ttmtrajkovic can re-assign appropriately.
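The representability claim can be checked without any framework: bfloat16 keeps float32's 8-bit exponent (1 sign, 8 exponent, 7 mantissa bits), so 1.1977575e-21 sits far above the smallest normal value (~1.18e-38) and survives conversion. A minimal pure-Python sketch, where `to_bfloat16` is a hypothetical helper written for this check, not part of any TT or Torch API:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a float to bfloat16 precision (hypothetical helper: keeps the
    top 16 bits of the float32 encoding, with round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 mantissa bits being dropped.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

value = 2.0 ** -69.5           # the exp2(-69.5) reference, ~1.2e-21
bf16_value = to_bfloat16(value)
print(bf16_value)              # non-zero: bfloat16 can hold this magnitude
```

So returning zero here is not a limitation of the bfloat16 format itself; the value is lost somewhere inside the device exp2 computation.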

@eyonland (Contributor) commented:
This is a P1 priority. I'm not aware of any models failing on this at the moment. The related issue is #8634

@cmaryanTT commented:
Reducing precision issues to P2 per discussion with @ttmtrajkovic.

@prajaramanTT commented:
@umadevimcw @ttmtrajkovic Is this still an open issue? If not, can you please close this ticket? Thanks.
