Add ignore_index and label to jsd and fl-jsd #306

Merged
merged 9 commits into from
Oct 16, 2024

Conversation

Tcc0403
Collaborator

@Tcc0403 Tcc0403 commented Oct 12, 2024

Summary

Resolve #277.
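For readers unfamiliar with the feature, here is a minimal pure-Python sketch of what `ignore_index` usually means for a divergence loss: rows whose label equals `ignore_index` contribute nothing, and the sum is averaged over the remaining rows. The function name `jsd_reference`, the use of plain probability lists, and the exact generalized-JSD formula are assumptions for illustration only; this is not the Triton kernel in this PR and may differ from it in input format and normalization.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_reference(student, teacher, labels=None, beta=0.5, ignore_index=-100):
    """Hypothetical reference: mean generalized JSD over non-ignored rows.

    Assumes 0 < beta < 1 so the mixture m is positive wherever p or q is.
    """
    total = 0.0
    n_non_ignore = 0
    for i, (q, p) in enumerate(zip(student, teacher)):
        # Rows labelled ignore_index are skipped entirely.
        if labels is not None and labels[i] == ignore_index:
            continue
        n_non_ignore += 1
        # Mixture distribution m = beta * p + (1 - beta) * q.
        m = [beta * pi + (1 - beta) * qi for pi, qi in zip(p, q)]
        total += beta * kl(p, m) + (1 - beta) * kl(q, m)
    # Average over the rows that were actually counted.
    return total / n_non_ignore if n_non_ignore else 0.0


student = [[0.5, 0.5], [0.9, 0.1]]
teacher = [[0.5, 0.5], [0.1, 0.9]]

loss_all = jsd_reference(student, teacher)
loss_masked = jsd_reference(student, teacher, labels=[1, -100])
```

With the second row masked, only the first row (where both distributions agree) is counted, so `loss_masked` is zero while `loss_all` is positive.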

Testing Done

  • Hardware Type: gpu-ci
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@Tcc0403 Tcc0403 marked this pull request as ready for review October 12, 2024 17:31
@lancerts lancerts requested a review from yundai424 October 13, 2024 16:05
tl.store(dX_ptr + offsets, dX, mask=mask)


MAX_FUSED_SIZE = 65536


-def jsd_forward(_input, target, beta):
+def jsd_forward(_input, target, label, beta, ignore_index, has_label):
Collaborator

I might be wrong, but if I'm understanding it correctly, we currently have an implicit assumption that the label is already shifted. It would be helpful to state this requirement explicitly and provide an example of the kind of input we expect in this case 🤔

Collaborator Author

@Tcc0403 Tcc0403 Oct 15, 2024

Makes sense. I added some examples in transformers files, and renamed it to shift_labels.
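As a rough illustration of the "already shifted" requirement discussed in this thread, the sketch below follows the usual causal-LM convention: the logits at position t are scored against the token at position t + 1, so the last logit row and the first label are dropped. The helper name `shift_for_next_token` and the toy values are hypothetical, and this is not the example that was added to the transformers files.

```python
IGNORE_INDEX = -100  # conventional "do not score this position" marker


def shift_for_next_token(logits, labels):
    """Align per-position logits with next-token labels (hypothetical helper)."""
    shift_logits = logits[:-1]  # the last position has no next token to predict
    shift_labels = labels[1:]   # the first token is never a prediction target
    return shift_logits, shift_labels


# Toy sequence of 4 positions; the first two labels are masked prompt tokens.
logits = ["z0", "z1", "z2", "z3"]  # placeholders for per-position logit rows
labels = [IGNORE_INDEX, IGNORE_INDEX, 42, 7]

shift_logits, shift_labels = shift_for_next_token(logits, labels)
n_non_ignore = sum(1 for y in shift_labels if y != IGNORE_INDEX)
```

Here `shift_labels` is `[-100, 42, 7]`, paired with the logit rows `z0`, `z1`, `z2`, and two of the three positions would count toward the loss.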

@Tcc0403 Tcc0403 requested a review from yundai424 October 15, 2024 22:18
Collaborator

@yundai424 yundai424 left a comment

Overall LGTM, just a very minor suggestion!

beta,
n_rows,
n_non_ignore,
ignore_index,
Collaborator

nit: this could be a constexpr

@Tcc0403 Tcc0403 requested a review from yundai424 October 16, 2024 01:15
@yundai424 yundai424 merged commit 24a7efc into linkedin:main Oct 16, 2024
2 checks passed
@Tcc0403 Tcc0403 deleted the jsd-ignore-index branch December 1, 2024 03:12
Development

Successfully merging this pull request may close these issues.

Adding ignore index support for divergence losses
3 participants