
ttnn.exp, exp2, expm1 and ldexp fail with low PCC on both GS and WH in larger value ranges [Bug Report][GS][WH] #6391

Open
Tracked by #6445 ...
npetrovic-tenstorrent opened this issue Mar 14, 2024 · 8 comments
npetrovic-tenstorrent (Contributor) commented Mar 14, 2024

Exponential-like operations ttnn.exp, ttnn.exp2, ttnn.expm1 and ttnn.ldexp fail with low PCC when the input value range is wider than [-10, 10]. As a result, passing sweeps over the range [-100, 100] cannot be created.

To Reproduce
Steps to reproduce the behavior:
Check out the main branch and run the unit test test_exp.py, test_exp2.py, test_expm1.py or test_ldexp.py (or others) using this command pattern:

pytest tests/ttnn/python_api_testing/non_working_unit_tests/grayskull/test_exp2.py
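For a quick standalone check outside the test suite, a minimal sketch along these lines (assuming the standard ttnn host APIs; the pcc helper is written here purely for illustration) reproduces the comparison directly:

import torch
import ttnn

def pcc(a: torch.Tensor, b: torch.Tensor) -> float:
    # Pearson correlation over positions where both outputs are finite
    # (exp overflows to inf/nan above ~88.7 in float32/bfloat16).
    a, b = a.flatten().to(torch.float32), b.flatten().to(torch.float32)
    mask = torch.isfinite(a) & torch.isfinite(b)
    return torch.corrcoef(torch.stack([a[mask], b[mask]]))[0, 1].item()

device = ttnn.open_device(device_id=0)

# Inputs drawn from the wider range that triggers the PCC drop.
torch_input = torch.empty(1, 1, 32, 32).uniform_(-100, 100)
torch_output = torch.exp(torch_input)

tt_input = ttnn.from_torch(
    torch_input, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)
tt_output = ttnn.to_torch(ttnn.exp(tt_input))

print("PCC:", pcc(torch_output, tt_output))
ttnn.close_device(device)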

Expected behavior
A few test cases are presented in the unit test tests/tt_eager/python_api_testing/non_working_unit_tests/grayskull/test_exp2.py, and they are expected to fail with a low PCC (not drastically low, but below the passing threshold):

PCC: 0.9336307604267816

The same is expected for test_exp.py and others as well.

Getting additional info for the operation under test and its behavior
To get additional information and results for the different combinations of input shapes, types, layouts and memory configs for which this operation was tested, you can also run the sweep test locally:

tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_exp2_test.yaml

To do this you should:

  1. Follow the Getting Started page to set up the repo, environment variables and python-env
  2. Activate the Python environment: source build/python_env/bin/activate
  3. Run the sweeps: python tests/tt_eager/python_api_testing/sweep_tests/run_pytorch_test.py -i tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_exp2_test.yaml -o ./result-sweeps
  4. After the run completes, all sweep results are available in the specified output directory (./result-sweeps in this case). There you will find a .csv file holding all executed sweeps, including the failing ones that were recreated by the unit test; you can locate them by searching for the unique data_seed field (see the pandas sketch after this list).
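As an illustration, the failing rows can be pulled out of the results CSV with a short pandas sketch; data_seed is the field named above, while the status column name and the exact CSV filename are assumptions and may differ in your output:

import glob
import pandas as pd

# Load whatever CSV file(s) the sweep run wrote into the output directory.
frames = [pd.read_csv(path) for path in glob.glob("./result-sweeps/*.csv")]
results = pd.concat(frames, ignore_index=True)

# "status" is an assumed column name; data_seed is the field mentioned in step 4.
failing = results[results["status"].astype(str).str.contains("fail", case=False)]
print(failing[["data_seed", "status"]].drop_duplicates())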
umadevimcw (Contributor) commented May 6, 2024

@tt-aho All of the above ops are inaccurate for the BF8 datatype when the input range is [-100, 100], whereas they work for the range [-10, 10] with a PCC of 0.999. While checking, I observed that the exp module has a function s2vFloat16b, which I assume is very specific to BF16; tracing it leads to the following logic (specific to fp16). Could this cause the PCC drop?

sfpi_inline uint32_t s2vFloat16::fp32_to_fp16a(const float val)
{
    union {
        float vfloat;
        uint32_t vui;
    } tmp;

    tmp.vfloat = val;

    // https://stackoverflow.com/questions/1659440/32-bit-to-16-bit-floating-point-conversion
    // Handles denorms.  May be costly w/ non-immediate values
    // Rounding bias, then split out the biased exponent and mantissa fields.
    const unsigned int b = tmp.vui + 0x00001000;
    const unsigned int e = (b & 0x7F800000) >> 23;
    const unsigned int m = b & 0x007FFFFF;
    // sign | normalized values | denormals | saturate to inf/NaN
    const unsigned int result =
        (b & 0x80000000) >> 16 |
        (e > 112) * ((((e - 112) << 10) & 0x7C00) | m >> 13) |
        ((e < 113) & (e > 101)) * ((((0x007FF000 + m) >> (125 - e)) + 1) >> 1) |
        (e > 143) * 0x7FFF;
#if 0
    // Simple/faster but less complete
    const unsigned int result =
        ((tmp.vui >> 16) & 0x8000) |
        ((((tmp.vui & 0x7F800000) - 0x38000000) >> 13) & 0x7c00) |
        ((tmp.vui >> 13) & 0x03FF);
#endif

    return result;
}
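For host-side inspection, a rough Python transcription of the same bit manipulation (not the kernel path, just an illustration) makes it easy to check individual conversions against IEEE fp16:

import struct

def fp32_to_fp16a(val: float) -> int:
    # Python transcription of the SFPI routine above, for host-side inspection only.
    vui = struct.unpack("<I", struct.pack("<f", val))[0]
    b = (vui + 0x00001000) & 0xFFFFFFFF   # rounding bias, with 32-bit wraparound
    e = (b & 0x7F800000) >> 23            # biased fp32 exponent
    m = b & 0x007FFFFF                    # fp32 mantissa
    return ((b & 0x80000000) >> 16
            | int(e > 112) * ((((e - 112) << 10) & 0x7C00) | (m >> 13))
            | int(101 < e < 113) * ((((0x007FF000 + m) >> (125 - e)) + 1) >> 1)
            | int(e > 143) * 0x7FFF)

for x in (1.0, 2.71875, -88.0, 100.0, 1e-5):
    bits = fp32_to_fp16a(x)
    # struct's "e" format decodes an IEEE binary16 value for comparison.
    print(x, hex(bits), struct.unpack("<e", struct.pack("<H", bits))[0])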

tt-aho (Contributor) commented May 7, 2024

Is the issue only with bfp8 or with bfloat16 as well? For bfp8, increasing the range from 10 to 100 gives a much larger variation in the output, which will result in less accurate results due to the shared exponents.

Some experiments you could try: take the pytorch output, convert it to bfp8, and compare; or do the same with the bfloat16 output converted to bfp8, to see whether the drop is just due to the limitations of the data format.
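One way to run that experiment from the host, sketched under the assumption that ttnn.from_torch/ttnn.to_torch accept bfloat8_b with TILE_LAYOUT without a device: round-trip the torch golden output through bfloat8_b and measure how much PCC the format itself costs.

import torch
import ttnn

def pcc(a, b):
    a, b = a.flatten().to(torch.float32), b.flatten().to(torch.float32)
    mask = torch.isfinite(a) & torch.isfinite(b)  # ignore overflow positions
    return torch.corrcoef(torch.stack([a[mask], b[mask]]))[0, 1].item()

torch_input = torch.empty(1, 1, 32, 32).uniform_(-100, 100)
golden = torch.exp(torch_input)

# Round-trip the golden output through bfloat8_b on the host; any PCC loss seen
# here is attributable to the data format (shared exponents), not to the op.
quantized = ttnn.to_torch(
    ttnn.from_torch(golden, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT)
)
print("PCC of float32 golden vs bfp8_b round-trip:", pcc(golden, quantized))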

KalaivaniMCW (Contributor) commented May 8, 2024

@tt-aho I tried debugging the exp op for bfloat8_b and observed a few things.

  1. For both bfloat16 and bfloat8_b, for input >= 89 the output values are nan (in TT) and inf (in torch). (exp(89) already exceeds the float32/bfloat16 maximum of ~3.4e38, so both sides overflow; they just report it differently.)
  2. For dtype=bfloat8_b, when changing the input range to (-88, 88) in the given test file
    tests/tt_eager/python_api_testing/non_working_unit_tests/grayskull/test_exp.py, the PCC goes up to 0.9686338992967846.
  3. I tried debugging using DPRINT and TSLICE in the tt-metal/tt_metal/kernels/compute/eltwise_sfpu.cpp kernel file for input shape (1,1,32,32) in TILE layout, with export TT_METAL_DPRINT_RISCVS=TR2:
    init_sfpu(tt::CB::c_in0);
    for (uint32_t block_index = 0; block_index < per_core_block_cnt; block_index++) {
        cb_reserve_back(tt::CB::c_out0, per_core_block_dim);
        for(uint32_t tile_index = 0; tile_index < per_core_block_dim; ++tile_index) {
            acquire_dst(tt::DstMode::Half);

            // Pop tile after tile, copy to DST and pack
            cb_wait_front(tt::CB::c_in0, 1);
            
            DPRINT << "input tile slice 1" << ENDL();
            // Extract a numpy slice from tile 0 from CB c_in0 and print it.
            DPRINT  << TSLICE(tt::CB::c_in0, 0, SliceRange::h0_w0_32()) << ENDL();

            copy_tile(tt::CB::c_in0, 0, 0);

            #ifdef SFPU_OP_CHAIN_0
            SFPU_OP_CHAIN_0
            #endif

            pack_tile(0, tt::CB::c_out0);
           
            // Extract a numpy slice from tile 0 from CB c_out0 and print it.
            DPRINT << "output tile slice 1" << ENDL();
            DPRINT  << TSLICE(tt::CB::c_out0, 0, SliceRange::h0_w0_32()) << ENDL();

            cb_pop_front(tt::CB::c_in0, 1);

            release_dst(tt::DstMode::Half);
        }
        cb_push_back(tt::CB::c_out0, per_core_block_dim);
    }

This worked for bfloat16: I printed one row at a time, and the c_in0 and c_out0 CB buffer data matched the TT input and TT output.
But for bfloat8_b, I get random values in the input and output buffer slices.
Ex: a row from the input tensor
tensor([[[[ 27.37500, 56.25000, -60.25000, -87.50000, -81.00000, -44.25000, -67.00000, -56.50000, -3.56250, 53.25000, 60.00000, 2.71875, 86.50000, -65.00000, -58.00000, -76.50000, -77.00000, -74.50000, -81.00000, -56.25000, -88.00000, -66.00000, 9.12500, 14.81250, 67.00000, -3.75000, 2.93750, 69.50000, 32.25000, 7.21875, 37.25000, 40.75000],

for bf16, c_in0 tile slice h0
27.375 56.25 -60.25 -87.5 -81 -44.25 -67 -56.5 -3.5625 53.25 60 2.71875 86.5 -65 -58 -76.5 -77 -74.5 -81 -56.25 -88 -66 9.125 14.8125 67 -3.75 2.9375 69.5 32.25 7.21875 37.25 40.75

for bf8_b, c_in0 tile slice h0
-1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -4.85642e-30 3.47946e-35 9.56428e-24 3.18934e-31 -2.26499e+14 -0.000101566 -1.83936e-08 1.10589e-17 -2.20237e-23 -1.35515e-10 3.241e-07 9.50272e+06 413696 1.36922e-34 6.20808e-38 -1.5752e-26

A similar issue appears while printing the c_out0 tile slice h0.
I have gathered my test findings here: https://docs.google.com/spreadsheets/d/1nOgcMfZjC37mmB0J1ddDyihpQU21RKVZQWR6dr8-NIc/edit#gid=783934660

I also tried using DPRINT in the reader/writer kernels to check the output values, but nothing showed up in the log file. Does this require a different macro, such as DPRINT_PACK or DPRINT_UNPACK?

@KalaivaniMCW (Contributor)

For ttnn.exp2, the TT output matches the torch result over the tested range, as verified on a Wormhole machine for both bf16 and bf8_b dtypes.
(Screenshot of the matching outputs omitted.)

KalaivaniMCW (Contributor) commented Oct 22, 2024

The precision issue related to exp2 is being tracked here: #13002

KalaivaniMCW (Contributor) commented Oct 22, 2024

@npetrovic-tenstorrent
The precision issue related to exp2 is being tracked here: #13002
Would you like a documentation update describing the working range for the ops mentioned in this ticket, or how would you like to proceed?

@prajaramanTT

@npetrovic-tenstorrent Can we consider this ticket done? If yes, can you please close it? Thanks

npetrovic-tenstorrent (Contributor, Author) commented Jan 14, 2025

Still valid, it seems, as the PCC for the test is still around 0.92-0.93. I updated the run command, since it reported a file-not-found error when I first tried it.
