
Support CE after grad acc fix #375

Merged
merged 4 commits into main from byhsu/fix-ce on Nov 12, 2024

Conversation

@ByronHsu (Collaborator) commented Nov 12, 2024

Summary

Based on #374, but leaner.

  1. The way model code uses cross entropy changed after the grad acc fix.
  2. It moved from the module CrossEntropy to the functional cross_entropy.
  3. Our monkey patching needs to change accordingly.
  4. We also preserve backward compatibility by branching on the transformers version; see the sketch below.
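
A minimal sketch of the version-gated patch. The version cutoff (4.46.0), the import paths, and the names LigerCrossEntropyLoss / liger_cross_entropy are assumptions for illustration, not necessarily the exact Liger-Kernel code:

```python
# Sketch only: the version cutoff, names, and import paths are assumptions.
import transformers
import transformers.models.llama.modeling_llama as modeling_llama
from packaging import version

from liger_kernel.transformers.cross_entropy import LigerCrossEntropyLoss  # assumed path
from liger_kernel.transformers.functional import liger_cross_entropy  # assumed path


def apply_liger_ce_patch():
    if version.parse(transformers.__version__) >= version.parse("4.46.0"):
        # Post grad-acc-fix: modeling code calls nn.functional.cross_entropy,
        # so patch the function reference the modeling module sees.
        modeling_llama.nn.functional.cross_entropy = liger_cross_entropy
    else:
        # Pre-fix: modeling code instantiates the CrossEntropyLoss module,
        # so patch the class reference instead.
        modeling_llama.CrossEntropyLoss = LigerCrossEntropyLoss
```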

Notable Changes

  1. Add a functional API for CE that takes keyword args (sketched below).
  2. Add back the convergence test with logits to verify CE convergence.
  3. Add back the compatibility test for transformers 4.44.
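
For (1), a hedged sketch of what such a functional wrapper could look like, assuming an autograd Function at liger_kernel.ops.cross_entropy and a reduced parameter set for illustration:

```python
import torch

from liger_kernel.ops.cross_entropy import LigerCrossEntropyFunction  # assumed path


def liger_cross_entropy(
    input: torch.Tensor,
    target: torch.Tensor,
    ignore_index: int = -100,
    label_smoothing: float = 0.0,
    reduction: str = "mean",
):
    # The new transformers call sites pass these as keyword args; the wrapper
    # accepts them by name and forwards them positionally to the kernel.
    return LigerCrossEntropyFunction.apply(
        input, target, ignore_index, label_smoothing, reduction
    )
```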

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@ByronHsu ByronHsu changed the title support CE after grad acc fix Support CE after grad acc fix Nov 12, 2024
@ByronHsu ByronHsu merged commit 5ef09d5 into main Nov 12, 2024
3 checks passed
@ByronHsu ByronHsu deleted the byhsu/fix-ce branch November 12, 2024 20:49
@hongpeng-guo (Collaborator) commented:

Thanks for providing this exemplary PR! While working on enabling kwargs for all the other operators, I ran into a few questions and would like to hear your suggestions. 😄

For the function signatures, should we try to follow their counterparts' signatures in torch.nn.functional? I found a few cases where it is hard to keep the signature compatible with torch, e.g.:

  1. rms_norm and layer_norm in torch require a non-optional normalized_shape argument that liger does not need;
  2. group_norm in liger requires a non-optional num_channels, which torch does not;
  3. gelu in liger takes two inputs a and b, while torch's takes only one.

There are also ops that are not available in torch.nn.functional at all, e.g. the fused ops and the jsd ops (see the comparison sketch below).
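
For concreteness, here is a comparison of the mismatches above. The torch signatures are printed directly from torch.nn.functional; the liger signatures are approximate sketches, not the exact Liger-Kernel definitions:

```python
import inspect
import torch.nn.functional as F

# torch's signatures, printed for reference:
print(inspect.signature(F.layer_norm))
# -> (input, normalized_shape, weight=None, bias=None, eps=1e-05)
print(inspect.signature(F.group_norm))
# -> (input, num_groups, weight=None, bias=None, eps=1e-05)

# Liger's counterparts (approximate sketches):
#   layer_norm(X, W, B, eps)                 # no normalized_shape: inferred from W
#   group_norm(X, W, B, num_channels, num_groups, eps)  # num_channels required
#   gelu(a, b)                               # two inputs (computes a * gelu(b));
#                                            # torch's F.gelu(input) takes one
```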

What would be a good strategy for redefining the function signatures here? cc @ByronHsu

@ByronHsu (Collaborator, Author) commented:

Thanks @hongpeng-guo! I think for now we don't need to be restricted by torch, since some of the layers, like rmsnorm, are not actually taken from torch. Let's just expose whatever args liger has as kwargs and not worry about torch; a sketch follows below. We can make adjustments once we receive more community feedback.
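
A minimal sketch of that suggestion applied to rms_norm, exposing Liger's own arguments as keyword args rather than mirroring torch. The import path, parameter names, and defaults (eps, offset, casting_mode) are assumptions for illustration, not the exact Liger-Kernel API:

```python
import torch

from liger_kernel.ops.rms_norm import LigerRMSNormFunction  # assumed path


def liger_rms_norm(
    X: torch.Tensor,
    W: torch.Tensor,
    eps: float = 1e-6,
    offset: float = 0.0,          # assumed: some models add an offset to W
    casting_mode: str = "llama",  # assumed: controls dtype handling in the kernel
):
    # Unlike torch's F.rms_norm, there is no normalized_shape argument:
    # the kernel infers the normalized dimension from the weight tensor W.
    return LigerRMSNormFunction.apply(X, W, eps, offset, casting_mode)
```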
