Fix Normalization Term in Distillation Loss #442

austin362667 · 2024-12-09T09:31:54Z

Summary

This PR follows up on the modification in e381569#r1875638713, introduced by @shivam15s.

I believe it's more accurate to compute chunked loss by normalizing it based on the "number of chunks". Please correct me if I'm mistaken—thanks!
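To make the proposal concrete, here is a minimal pure-Python sketch of normalizing by the number of chunks. It is not the code from this PR: the helper names (`split_into_chunks`, `loss_normalized_by_num_chunks`) and the sample values are hypothetical, and plain lists stand in for tensors. It only shows that, when every chunk has the same size, averaging the per-chunk means recovers the full-batch mean.

```python
def split_into_chunks(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def loss_normalized_by_num_chunks(per_sample_losses, chunk_size):
    """Average each chunk's mean loss over the number of chunks (hypothetical helper)."""
    chunks = split_into_chunks(per_sample_losses, chunk_size)
    chunk_means = [sum(chunk) / len(chunk) for chunk in chunks]
    return sum(chunk_means) / len(chunks)


# Illustrative per-sample losses; two chunks of equal size.
losses = [0.5, 1.5, 2.0, 4.0]
chunked = loss_normalized_by_num_chunks(losses, chunk_size=2)
full_mean = sum(losses) / len(losses)

# With equal-sized chunks the two normalizations agree.
print(chunked, full_mean)
```

The agreement relies on all chunks holding the same number of samples, which is the assumption the reply below is cautious about.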

Testing Done

Hardware Type:
- run `make test` to ensure correctness
- run `make checkstyle` to ensure code style
- run `make test-convergence` to ensure convergence

Signed-off-by: Austin Liu <austin362667@gmail.com>

shivam15s · 2024-12-09T21:14:21Z

Hey @austin362667, that's a valid way to do normalization. However, I want to stay consistent with the preference base and also avoid ops that depend on chunk sizes/shapes, since these can change in the future.
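One way to read the concern about chunk sizes/shapes, sketched in pure Python under stated assumptions: if each chunk contributes its summed loss divided by the full batch size, the result does not depend on how the batch was split. The function names and sample values below are hypothetical and are not taken from the repository. The example uses an uneven final chunk to show where the two normalizations diverge.

```python
def loss_normalized_by_full_batch(per_sample_losses, chunk_size):
    """Accumulate each chunk's summed loss divided by the full batch size (hypothetical helper)."""
    full_batch_size = len(per_sample_losses)
    total = 0.0
    for start in range(0, full_batch_size, chunk_size):
        chunk = per_sample_losses[start:start + chunk_size]
        # Only the full batch size is used here, never the chunk's own length.
        total += sum(chunk) / full_batch_size
    return total


def loss_normalized_by_num_chunks(per_sample_losses, chunk_size):
    """Average the per-chunk mean losses over the number of chunks (hypothetical helper)."""
    chunk_means = []
    for start in range(0, len(per_sample_losses), chunk_size):
        chunk = per_sample_losses[start:start + chunk_size]
        chunk_means.append(sum(chunk) / len(chunk))
    return sum(chunk_means) / len(chunk_means)


# Five samples with chunk_size=2 gives chunks of sizes 2, 2, and 1.
losses = [1.0, 1.0, 1.0, 1.0, 6.0]
by_batch = loss_normalized_by_full_batch(losses, chunk_size=2)
by_chunks = loss_normalized_by_num_chunks(losses, chunk_size=2)
true_mean = sum(losses) / len(losses)

# by_batch matches the true mean; by_chunks overweights the single-sample chunk.
print(by_batch, by_chunks, true_mean)
```

In this sketch the chunk-count version gives the lone sample in the last chunk the same weight as two samples elsewhere, so it drifts from the true mean whenever chunks are uneven.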

austin362667 added 2 commits December 9, 2024 17:23

Fix normalization term

961a888

Signed-off-by: Austin Liu <austin362667@gmail.com>

Format

f9f9dd5

Signed-off-by: Austin Liu <austin362667@gmail.com>

austin362667 changed the title ~~Fix normalization term in distillation loss~~ Fix Normalization Term in Distillation Loss Dec 9, 2024

shivam15s closed this Dec 11, 2024
