Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Normalization Term in Distillation Loss #442

Closed

Conversation

austin362667
Copy link
Collaborator

Summary

Based on this modification e381569#r1875638713 introduced by @shivam15s.

I believe it's more accurate to compute chunked loss by normalizing it based on the "number of chunks". Please correct me if I'm mistaken—thanks!

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Signed-off-by: Austin Liu <austin362667@gmail.com>
Signed-off-by: Austin Liu <austin362667@gmail.com>
@austin362667 austin362667 changed the title Fix normalization term in distillation loss Fix Normalization Term in Distillation Loss Dec 9, 2024
@shivam15s
Copy link
Collaborator

Hey @austin362667 that's a valid way to do normalization. However, I want to be consistent with the preference base and also avoid ops involving chunk sizes/shapes as this can change in the future.

@shivam15s shivam15s closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants