
ttnn.exp, exp2, expm1 and ldexp fail with low PCC on both GS and WH in larger value ranges [Bug Report][GS][WH] #6391

Open
Tracked by #6445 ...
npetrovic-tenstorrent opened this issue Mar 14, 2024 · 8 comments
npetrovic-tenstorrent (Contributor) commented Mar 14, 2024

Exponential-like operations ttnn.exp, ttnn.exp2, ttnn.expm1 and ttnn.ldexp fail with low PCC when the input value range is wider than [-10, 10]. As a result, passing sweeps over the range [-100, 100] cannot be created.

To Reproduce
Steps to reproduce the behavior:
Check out the main branch and run the unit test test_exp.py, test_exp2.py, test_expm1.py or test_ldexp.py (or others) using this command pattern:

pytest tests/ttnn/python_api_testing/non_working_unit_tests/grayskull/test_exp2.py
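For a quick standalone check outside the test suite, a minimal sketch along these lines (assuming the standard ttnn host APIs; the pcc helper is written here purely for illustration) reproduces the comparison directly:

import torch
import ttnn

def pcc(a: torch.Tensor, b: torch.Tensor) -> float:
    # Pearson correlation over positions where both outputs are finite
    # (exp overflows to inf/nan above ~88.7 in float32/bfloat16).
    a, b = a.flatten().to(torch.float32), b.flatten().to(torch.float32)
    mask = torch.isfinite(a) & torch.isfinite(b)
    return torch.corrcoef(torch.stack([a[mask], b[mask]]))[0, 1].item()

device = ttnn.open_device(device_id=0)

# Inputs drawn from the wider range that triggers the PCC drop.
torch_input = torch.empty(1, 1, 32, 32).uniform_(-100, 100)
torch_output = torch.exp(torch_input)

tt_input = ttnn.from_torch(
    torch_input, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)
tt_output = ttnn.to_torch(ttnn.exp(tt_input))

print("PCC:", pcc(torch_output, tt_output))
ttnn.close_device(device)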

Expected behavior
A few test cases are presented in the unit test tests/tt_eager/python_api_testing/non_working_unit_tests/grayskull/test_exp2.py, and they are expected to fail with a low PCC (not drastically low, but below the passing threshold):

PCC: 0.9336307604267816

The same is expected for test_exp.py and others as well.

Getting additional info for the operation under test and its behavior
To get additional information and results for the different combinations of input shapes, types, layouts and memory configs for which this operation was tested, you can also run the sweep test locally:

tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_exp2_test.yaml

To do this you should:

  1. Follow the Getting Started page to set up the repo, environment variables and python-env
  2. Activate the Python environment: source build/python_env/bin/activate
  3. Run the sweeps: python tests/tt_eager/python_api_testing/sweep_tests/run_pytorch_test.py -i tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_exp2_test.yaml -o ./result-sweeps
  4. After the run completes, all sweep results are available in the specified output directory (./result-sweeps in this case). There you will find a .csv file holding all executed sweeps, including the failing ones that were recreated by the unit test; you can locate them by searching for the unique data_seed field (see the pandas sketch after this list).
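As an illustration, the failing rows can be pulled out of the results CSV with a short pandas sketch; data_seed is the field named above, while the status column name and the exact CSV filename are assumptions and may differ in your output:

import glob
import pandas as pd

# Load whatever CSV file(s) the sweep run wrote into the output directory.
frames = [pd.read_csv(path) for path in glob.glob("./result-sweeps/*.csv")]
results = pd.concat(frames, ignore_index=True)

# "status" is an assumed column name; data_seed is the field mentioned in step 4.
failing = results[results["status"].astype(str).str.contains("fail", case=False)]
print(failing[["data_seed", "status"]].drop_duplicates())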
umadevimcw (Contributor) commented May 6, 2024

@tt-aho All of the above ops are inaccurate for the BF8 datatype when the input range is [-100, 100], whereas they work for the range [-10, 10] with a PCC of 0.999. While checking, I observed that the exp module has a function s2vFloat16b, which I assume is very specific to BF16; tracing it leads to the following logic (specific to fp16). Could this cause the PCC drop?

sfpi_inline uint32_t s2vFloat16::fp32_to_fp16a(const float val)
{
    union {
        float vfloat;
        uint32_t vui;
    } tmp;

    tmp.vfloat = val;

    // https://stackoverflow.com/questions/1659440/32-bit-to-16-bit-floating-point-conversion
    // Handles denorms.  May be costly w/ non-immediate values
    // Rounding bias, then split out the biased exponent and mantissa fields.
    const unsigned int b = tmp.vui + 0x00001000;
    const unsigned int e = (b & 0x7F800000) >> 23;
    const unsigned int m = b & 0x007FFFFF;
    // sign | normalized values | denormals | saturate to inf/NaN
    const unsigned int result =
        (b & 0x80000000) >> 16 |
        (e > 112) * ((((e - 112) << 10) & 0x7C00) | m >> 13) |
        ((e < 113) & (e > 101)) * ((((0x007FF000 + m) >> (125 - e)) + 1) >> 1) |
        (e > 143) * 0x7FFF;
#if 0
    // Simple/faster but less complete
    const unsigned int result =
        ((tmp.vui >> 16) & 0x8000) |
        ((((tmp.vui & 0x7F800000) - 0x38000000) >> 13) & 0x7c00) |
        ((tmp.vui >> 13) & 0x03FF);
#endif

    return result;
}
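For host-side inspection, a rough Python transcription of the same bit manipulation (not the kernel path, just an illustration) makes it easy to check individual conversions against IEEE fp16:

import struct

def fp32_to_fp16a(val: float) -> int:
    # Python transcription of the SFPI routine above, for host-side inspection only.
    vui = struct.unpack("<I", struct.pack("<f", val))[0]
    b = (vui + 0x00001000) & 0xFFFFFFFF   # rounding bias, with 32-bit wraparound
    e = (b & 0x7F800000) >> 23            # biased fp32 exponent
    m = b & 0x007FFFFF                    # fp32 mantissa
    return ((b & 0x80000000) >> 16
            | int(e > 112) * ((((e - 112) << 10) & 0x7C00) | (m >> 13))
            | int(101 < e < 113) * ((((0x007FF000 + m) >> (125 - e)) + 1) >> 1)
            | int(e > 143) * 0x7FFF)

for x in (1.0, 2.71875, -88.0, 100.0, 1e-5):
    bits = fp32_to_fp16a(x)
    # struct's "e" format decodes an IEEE binary16 value for comparison.
    print(x, hex(bits), struct.unpack("<e", struct.pack("<H", bits))[0])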

tt-aho (Contributor) commented May 7, 2024

Is the issue only with bfp8 or with bfloat16 as well? For bfp8, increasing the range from 10 to 100 gives a much larger variation in the output, which will result in less accurate results due to the shared exponents.

Some experiments you could try: take the pytorch output, convert it to bfp8, and compare; or do the same with the bfloat16 output converted to bfp8, to see whether the drop is just due to the limitations of the data format.
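One way to run that experiment from the host, sketched under the assumption that ttnn.from_torch/ttnn.to_torch accept bfloat8_b with TILE_LAYOUT without a device: round-trip the torch golden output through bfloat8_b and measure how much PCC the format itself costs.

import torch
import ttnn

def pcc(a, b):
    a, b = a.flatten().to(torch.float32), b.flatten().to(torch.float32)
    mask = torch.isfinite(a) & torch.isfinite(b)  # ignore overflow positions
    return torch.corrcoef(torch.stack([a[mask], b[mask]]))[0, 1].item()

torch_input = torch.empty(1, 1, 32, 32).uniform_(-100, 100)
golden = torch.exp(torch_input)

# Round-trip the golden output through bfloat8_b on the host; any PCC loss seen
# here is attributable to the data format (shared exponents), not to the op.
quantized = ttnn.to_torch(
    ttnn.from_torch(golden, dtype=ttnn.bfloat8_b, layout=ttnn.TILE_LAYOUT)
)
print("PCC of float32 golden vs bfp8_b round-trip:", pcc(golden, quantized))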

KalaivaniMCW (Contributor) commented May 8, 2024

@tt-aho I tried debugging the exp op for bfloat8_b and observed a few things.

  1. For both bfloat16 and bfloat8_b, for input >= 89 the output values are nan (in TT) and inf (in torch). (exp(89) already exceeds the float32/bfloat16 maximum of ~3.4e38, so both sides overflow; they just report it differently.)
  2. For dtype=bfloat8_b, when changing the input range to (-88, 88) in the given test file
    tests/tt_eager/python_api_testing/non_working_unit_tests/grayskull/test_exp.py, the PCC goes up to 0.9686338992967846.
  3. I tried debugging using DPRINT and TSLICE in the tt-metal/tt_metal/kernels/compute/eltwise_sfpu.cpp kernel file for input shape (1,1,32,32) in TILE layout, with export TT_METAL_DPRINT_RISCVS=TR2:
    init_sfpu(tt::CB::c_in0);
    for (uint32_t block_index = 0; block_index < per_core_block_cnt; block_index++) {
        cb_reserve_back(tt::CB::c_out0, per_core_block_dim);
        for(uint32_t tile_index = 0; tile_index < per_core_block_dim; ++tile_index) {
            acquire_dst(tt::DstMode::Half);

            // Pop tile after tile, copy to DST and pack
            cb_wait_front(tt::CB::c_in0, 1);
            
            DPRINT << "input tile slice 1" << ENDL();
            // Extract a numpy slice from tile 0 from CB c_in0 and print it.
            DPRINT  << TSLICE(tt::CB::c_in0, 0, SliceRange::h0_w0_32()) << ENDL();

            copy_tile(tt::CB::c_in0, 0, 0);

            #ifdef SFPU_OP_CHAIN_0
            SFPU_OP_CHAIN_0
            #endif

            pack_tile(0, tt::CB::c_out0);
           
            // Extract a numpy slice from tile 0 from CB c_out0 and print it.
            DPRINT << "output tile slice 1" << ENDL();
            DPRINT  << TSLICE(tt::CB::c_out0, 0, SliceRange::h0_w0_32()) << ENDL();

            cb_pop_front(tt::CB::c_in0, 1);

            release_dst(tt::DstMode::Half);
        }
        cb_push_back(tt::CB::c_out0, per_core_block_dim);
    }

This worked for bfloat16: I printed one row at a time, and the c_in0 and c_out0 CB buffer data matched the TT input and TT output.
But for bfloat8_b, I get random values in the input and output buffer slices.
Ex: a row from the input tensor
tensor([[[[ 27.37500, 56.25000, -60.25000, -87.50000, -81.00000, -44.25000, -67.00000, -56.50000, -3.56250, 53.25000, 60.00000, 2.71875, 86.50000, -65.00000, -58.00000, -76.50000, -77.00000, -74.50000, -81.00000, -56.25000, -88.00000, -66.00000, 9.12500, 14.81250, 67.00000, -3.75000, 2.93750, 69.50000, 32.25000, 7.21875, 37.25000, 40.75000],

for bf16, c_in0 tile slice h0
27.375 56.25 -60.25 -87.5 -81 -44.25 -67 -56.5 -3.5625 53.25 60 2.71875 86.5 -65 -58 -76.5 -77 -74.5 -81 -56.25 -88 -66 9.125 14.8125 67 -3.75 2.9375 69.5 32.25 7.21875 37.25 40.75

for bf8_b, c_in0 tile slice h0
-1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -1.25073e-35 -4.85642e-30 3.47946e-35 9.56428e-24 3.18934e-31 -2.26499e+14 -0.000101566 -1.83936e-08 1.10589e-17 -2.20237e-23 -1.35515e-10 3.241e-07 9.50272e+06 413696 1.36922e-34 6.20808e-38 -1.5752e-26

A similar issue appears while printing the c_out0 tile slice h0.
I have gathered my test findings here: https://docs.google.com/spreadsheets/d/1nOgcMfZjC37mmB0J1ddDyihpQU21RKVZQWR6dr8-NIc/edit#gid=783934660

I also tried using DPRINT in the reader/writer kernels to check the output values, but nothing showed up in the log file. Does this require a different macro, such as DPRINT_PACK or DPRINT_UNPACK?

@KalaivaniMCW (Contributor)

For ttnn.exp2, the TT output matches the torch result over the tested range, as verified on a Wormhole machine for both bf16 and bf8_b dtypes.
(Screenshot of the matching outputs omitted.)

KalaivaniMCW (Contributor) commented Oct 22, 2024

The precision issue related to exp2 is being tracked here: #13002

KalaivaniMCW (Contributor) commented Oct 22, 2024

@npetrovic-tenstorrent
The precision issue related to exp2 is being tracked here: #13002
Would you like a documentation update describing the working range for the ops mentioned in this ticket, or how would you like to proceed?

@prajaramanTT

@npetrovic-tenstorrent Can we consider this ticket done? If yes, can you please close it? Thanks

npetrovic-tenstorrent (Contributor, Author) commented Jan 14, 2025

Still valid, it seems, as the PCC for the test is still around 0.92-0.93. I updated the run command, since it reported a file-not-found error when I first tried it.
