
PCC drop in T5 google/flan-t5-large variant #783

Open
kamalrajkannan78 opened this issue Nov 27, 2024 · 0 comments
Labels
bug Something isn't working

@kamalrajkannan78
Contributor

kamalrajkannan78 commented Nov 27, 2024

Description

  • A notable PCC drop to 0.9722469167498842 is observed in the google/flan-t5-large model variant. Running the decoder alone yields a PCC of 0.9857664543253748.

  • A detailed analysis of PCC as a function of the number of decoder layers shows the following:

| Number of decoder layers | PCC |
| --- | --- |
| 1 | 1.0 |
| 12 | 0.9999998585756898 |
| 18 | 0.9998312374946592 |
| 23 | 0.9939849829870099 |
| full decoder (24) | 0.9857664543253748 |
  • The table shows that the PCC degrades only slightly through the earlier decoder layers, with the drop becoming notable only at the final decoder block. Because the decrease is small and gradual as the layer count increases, pinpointing the operation responsible has proven difficult. Further investigation is needed to isolate and address the root cause.
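For reference, the PCC metric used above is the Pearson correlation between the framework's reference output and the compiled output, flattened to 1-D. A minimal sketch (not tt-forge's actual implementation; the tensors here are stand-ins generated with NumPy):

```python
import numpy as np

def compute_pcc(golden, calculated):
    """Pearson correlation coefficient between two flattened tensors."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    c = np.asarray(calculated, dtype=np.float64).ravel()
    return float(np.corrcoef(g, c)[0, 1])

# Illustration: a small perturbation pulls the PCC slightly below 1.0
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
out = ref + 0.05 * rng.standard_normal(1024)
print(compute_pcc(ref, ref))  # identical tensors give PCC of 1.0
print(compute_pcc(ref, out))  # perturbed output gives PCC just below 1.0
```

Note that PCC is insensitive to uniform scale/offset errors, so a high PCC can still hide systematic deviations; that is why the ATOL/RTOL deltas in the mismatch report below are worth reading alongside it.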

Reproduce

```shell
git checkout kkannan/t5_fix_dec2
git submodule update --recursive
cmake --build build -- install_ttforge

# to run the whole model
pytest forge/test/models/pytorch/text/t5/test_t5.py::test_t5_generation[no_device-google_flan_t5_large] -svv

# to run the decoder alone
pytest forge/test/models/pytorch/text/t5/test_t5_sanity.py -svv
```

Note: tweak `num_decoder_layers` to reproduce the table results.

Observed Behaviour

```
Tensor mismatch on Framework vs. Forge codegen output 0
Max ATOL Delta: 7.033e+00, atol=5.368640899658203
Max RTOL Delta: 4.264e+00, rtol=0
PCC got=0.9722469167498842, required=0.99999
```
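The ATOL/RTOL deltas in the report can be reproduced in spirit with a small helper. This is a sketch of how such deltas are typically computed, not necessarily how Forge's verifier computes them internally (the division-by-zero guard epsilon is an assumption):

```python
import numpy as np

def max_deltas(golden, calculated):
    """Max absolute and relative deviation, mirroring the mismatch report."""
    g = np.asarray(golden, dtype=np.float64).ravel()
    c = np.asarray(calculated, dtype=np.float64).ravel()
    abs_delta = np.abs(g - c)
    # Guard against division by zero in the relative delta (epsilon is an assumption)
    rel_delta = abs_delta / np.maximum(np.abs(g), 1e-12)
    return float(abs_delta.max()), float(rel_delta.max())
```

A max ATOL delta of 7.033 against an atol threshold of ~5.37 means at least one element of the decoder output deviates well beyond tolerance, consistent with the PCC falling short of the 0.99999 requirement.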

Logs

@kamalrajkannan78 kamalrajkannan78 added the bug Something isn't working label Nov 27, 2024
@kamalrajkannan78 kamalrajkannan78 added this to the [FFE] Bringup milestone Nov 27, 2024
@kamalrajkannan78 kamalrajkannan78 self-assigned this Nov 27, 2024
@kamalrajkannan78 kamalrajkannan78 removed their assignment Dec 2, 2024