
Precision issue : Exp2 #13002

Open
Tracked by #9702
umadevimcw opened this issue Sep 23, 2024 · 4 comments
Assignees
Labels
bug (Something isn't working), LLK, P2, WH

Comments

@umadevimcw (Contributor) commented Sep 23, 2024

Describe the bug
PCC is dropping due to precision loss in the logaddexp2 function. While debugging, we observed that exp2 of certain inputs is zero on TT hardware, whereas Torch returns small non-zero values at that precision level, which results in the PCC drop.

This issue blocks #6391, #8634, #13973, and #13930.

To Reproduce
Steps to reproduce the behavior:

Copy and paste the code below to reproduce the precision loss.
In this code the input values are fixed for debugging purposes, which showcases the precision loss.

# SPDX-FileCopyrightText: © 2023 Tenstorrent Inc.

# SPDX-License-Identifier: Apache-2.0

from loguru import logger
import random
import pytest
import torch
import ttnn

from tests.ttnn.utils_for_testing import assert_with_pcc
from tests.ttnn.python_api_testing.sweep_tests import ttnn_ops


def run_logaddexp2_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device):
    torch.manual_seed(data_seed)

    x = torch.Tensor(size=input_shape[0]).uniform_(-100, 100).to(torch.bfloat16)
    y = torch.Tensor(size=input_shape[1]).uniform_(-100, 100).to(torch.bfloat16)

    try:
        # get ref result
        x.fill_(-69.50000)
        y.fill_(-81.00000) # hard coded this for debugging purposes
        
        print("Exp2 results of Torch....")
        
        torch.set_printoptions(sci_mode=False, precision=32)
        print(torch.exp2(x))
        print(torch.exp2(y))


        tt_x = ttnn_ops.exp2(
            x,
            device=device,
            dtype=dtype,
            layout=dlayout,
            input_mem_config=in_mem_config,
            output_mem_config=output_mem_config,
        )
        tt_y  = ttnn_ops.exp2(
            y,
            device=device,
            dtype=dtype,
            layout=dlayout,
            input_mem_config=in_mem_config,
            output_mem_config=output_mem_config,
        )
        
        # # Replicated the logic used in TT 
        # ref_value = torch.logaddexp2(x, y)
        # test_tt_logic = torch.add(torch.exp2(x), torch.exp2(y)) # here result is 0.00000000000000000000119775752698
        # test_tt_logic = torch.log2(test_tt_logic) #here output is -69.5000000
        

        # tt_result = ttnn_ops.logaddexp2(
        #     x,
        #     y,
        #     device=device,
        #     dtype=dtype,
        #     layout=dlayout,
        #     input_mem_config=in_mem_config,
        #     output_mem_config=output_mem_config,
        # )

    except Exception as e:
        logger.warning("Operation execution crashed")
        raise e

    # assert len(tt_result.shape) == len(ref_value.shape)
    # assert tt_result.shape == ref_value.shape
    # ref value is -69.500
    # tt_result is -inf
    print("Exp2 results of TT....")
    print(tt_x)
    print(tt_y)

test_sweep_args2 = [
    (
        [(19, 12), (19, 12)],
        [ttnn.bfloat16, ttnn.bfloat16],
        [ttnn.TILE_LAYOUT, ttnn.TILE_LAYOUT],
        [ttnn.DRAM_MEMORY_CONFIG, ttnn.DRAM_MEMORY_CONFIG],
        (ttnn.DRAM_MEMORY_CONFIG),
        18261510,
    ),
]


@pytest.mark.parametrize(
    "input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed",
    (test_sweep_args2),
)
def test_eltwise_logaddexp2(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device):
    run_logaddexp2_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, device)

Expected behavior

  • TT exp2 results are zeros
  • Torch results are non-zero

Screenshots

Screenshot 2024-09-23 at 7 21 52 PM

Please complete the following environment information:

  • OS: [e.g. Ubuntu 20.04]
  • Version of software (eg. commit)

Additional context
TT's exp2 op internally depends on the exp op.
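Given that dependency, the underflow path can be illustrated with the exp-based identity exp2(x) = exp(x · ln 2). Below is a minimal pure-Python sketch (plain float64 math, not TT or Torch code) of the reference value for the hard-coded debug input -69.5; the specific claim that the device exp kernel flushes this input to zero is an assumption based on the observed behavior, not confirmed device internals:

```python
import math

x = -69.5  # the hard-coded debug input from the repro above

# exp2 computed directly, and via the identity exp2(x) = exp(x * ln 2)
direct = 2.0 ** x
via_exp = math.exp(x * math.log(2.0))

print(direct)   # ~1.2e-21, well above the float32/bfloat16 underflow threshold
print(via_exp)  # matches the direct computation in float64

# If the underlying exp kernel flushes its large-magnitude negative
# argument (here x * ln 2 ~= -48.17) to zero, exp2 inherits that
# underflow and returns 0 instead of ~1.2e-21.
```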

@umadevimcw added the bug (Something isn't working) and LLK labels on Sep 23, 2024
@rtawfik01 (Contributor) commented:
@ttmtrajkovic I discussed this with @umadevimcw offline. The issue above is that the device exp2 implementation does not output the value 1.1977575e-21, which is representable in bfloat16; PyTorch with the bfloat16 data format does represent it, and this causes precision failures downstream.

@umadevimcw @eyonland please let us know the priority for this issue: is it only failing unit tests, or is it failing on models due to the downstream precision issue?

@ttmtrajkovic can re-assign appropriately.
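The representability claim can be checked without any framework: bfloat16 keeps float32's 8-bit exponent (1 sign, 8 exponent, 7 mantissa bits), so 1.1977575e-21 sits far above the smallest normal value (~1.18e-38) and survives conversion. A minimal pure-Python sketch, where `to_bfloat16` is a hypothetical helper written for this check, not part of any TT or Torch API:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a float to bfloat16 precision (hypothetical helper: keeps the
    top 16 bits of the float32 encoding, with round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 mantissa bits being dropped.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

value = 2.0 ** -69.5           # the exp2(-69.5) reference, ~1.2e-21
bf16_value = to_bfloat16(value)
print(bf16_value)              # non-zero: bfloat16 can hold this magnitude
```

So returning zero here is not a limitation of the bfloat16 format itself; the value is lost somewhere inside the device exp2 computation.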

@eyonland (Contributor) commented:
This is a P1 priority. I'm not aware of any models failing on this at the moment. The related issue is #8634

@cmaryanTT commented:
Reducing precision issues to P2 per discussion with @ttmtrajkovic.

@prajaramanTT commented:
@umadevimcw @ttmtrajkovic Is this still an open issue? If not, can you please close this ticket? Thanks.
