Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alignment matrix learning #1108

Closed
wants to merge 4 commits into from
Closed

Add alignment matrix learning #1108

wants to merge 4 commits into from

Conversation

iPRET
Copy link

@iPRET iPRET commented May 15, 2024

Added alignment matrix learning to training. See issue #1105 for more details.
Added command line arguments
--alignment-matrix for specifying a file to read alignments from.
--alignment-matrix-weight for specifying loss coefficient for alignment matrix cross entropy.
--attention-alignment-layer for specifying layer of decoder in which attention alignment will happen.
--align-attentions for telling model to learn alignments when that's impossible to infer from other command line arguments.
--shift-alignments for telling data preparation to shift alignments one target token forward, and translation one target token backward.

Report on performance impact:
Sockeye_Alignment_Matrix_Report-6.pdf

Pull Request Checklist

  • Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]'
    until you can check this box.
  • Unit tests pass (pytest)
  • Were system tests modified? If so did you run these at least 5 times to account for the variation across runs?
  • System tests pass (pytest test/system)
  • Passed code style checking (./style-check.sh)
  • You have considered writing a test
  • Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
  • Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@iPRET
Copy link
Author

iPRET commented May 16, 2024

So I ran the style_check.sh (My appologies that I didn't do it before the pull request) (ᗒᗣᗕ)՞.
.
One of the issues is a type error.
I modified TransformerDecoder.decode_seq to return Tuple[Tensor, Tensor] istead of the previous Tensor.
This is necessary because the decoder now has to return both the translation and the attention matrices.
But it doesn't fit what the abstract class Decoder.decode_seq expects of the function.
I see a couple paths forward here:

  1. Make decode_seq return a dictionary of results where keys correspond to different tensors.
  2. Make Decoder.decode_seq return Tuple[Tensor, Tensor]
  3. Make a new function called decode_seq_and_attention, and modify the logic everywhere that calls decode_seq.
  4. Perhaps something completely different?

Which approach do you think might be most pleasing?
Thanks,
IP

@mjdenkowski
Copy link
Contributor

Hi Ingus,

It looks like you've created a thorough implementation and tested it extensively. Given the size of the pull request and our current priorities, it may be some time before we get to this. We'll follow up when we start to review the code and your report.

Best,
Michael

@iPRET
Copy link
Author

iPRET commented May 17, 2024

Hello Michael.

Before You do the code review, can you please give a quick opinion on the type error, so I can fix it up? (It's the last problem holding me from ticking the style_check.sh box) ༼ つ ◕.◕ ༽つ

Thanks,
IP

@mjdenkowski
Copy link
Contributor

Hi Ingus,

Thank you for your interest in using and contributing to Sockeye!

Unfortunately, we were not able to include your new feature before the cutoff for transitioning Sockeye to maintenance mode. We appreciate the amount of work that went into developing and testing this feature and encourage you to share your extended version of Sockeye with others.

Best wishes on your future projects!

--Michael

@mjdenkowski mjdenkowski closed this Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants