
Port new seq2seq tutorial #130

Closed
chsasank opened this issue Aug 30, 2017 · 8 comments · Fixed by #2452
Labels: docathon-h1-2023, medium, module: torchtext

Comments

@chsasank (Contributor) commented Aug 30, 2017

From https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation

cc @pytorch/team-text-core @Nayef211

@chsasank chsasank self-assigned this Aug 30, 2017
@poweihuang17

By the way, which attention model is implemented in the official tutorial? It is said to implement the Bahdanau et al. model, but it actually does not. The code uses the input embedding and the decoder hidden state to calculate the attention weights, but the Bahdanau et al. model does not work like this.

@wzpfish commented Mar 12, 2019

> By the way, which attention model is implemented in the official tutorial? It is said to implement the Bahdanau et al. model, but it actually does not. The code uses the input embedding and the decoder hidden state to calculate the attention weights, but the Bahdanau et al. model does not work like this.

I think it's a bug. I've never seen attention in this form before...

@bbruceyuan

@wzpfish In my opinion, it's an incorrect implementation of MLP-based attention scoring. All of the encoder hidden states should be used.

@wzpfish commented Apr 1, 2019

@hey-bruce Look at the tutorial at https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

attn_weights = F.softmax(self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)

These attention weights do not use any of the encoder hidden states. I think something is wrong.
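
For comparison, here is a minimal sketch of Bahdanau-style additive attention, where the score for each source position depends on the decoder hidden state and that position's encoder output. The class name `AdditiveAttention` and the batch-first shapes are illustrative, not taken from the tutorial:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Illustrative Bahdanau-style additive attention: scores depend on the decoder
    hidden state and every encoder output (annotation), not on the input embedding."""
    def __init__(self, hidden_size):
        super().__init__()
        self.Wa = nn.Linear(hidden_size, hidden_size)  # projects the decoder hidden state
        self.Ua = nn.Linear(hidden_size, hidden_size)  # projects the encoder outputs
        self.va = nn.Linear(hidden_size, 1)            # collapses each combined score to a scalar

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden:  (batch, 1, hidden_size)
        # encoder_outputs: (batch, src_len, hidden_size)
        scores = self.va(torch.tanh(self.Wa(decoder_hidden) + self.Ua(encoder_outputs)))
        weights = F.softmax(scores.squeeze(2).unsqueeze(1), dim=-1)  # (batch, 1, src_len)
        context = torch.bmm(weights, encoder_outputs)                # (batch, 1, hidden_size)
        return context, weights
```

The point is that `encoder_outputs` enters the score computation, so the weights can actually vary with the source sentence; the tutorial line above only looks at the current input embedding and the previous hidden state.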

@rajarsheem

@wzpfish Exactly. I raised an issue on their discussion forum: https://discuss.pytorch.org/t/possible-bug-in-seq2seq-tutorial/48241

Shortly afterwards, I saw your comment while searching here to check whether someone had already raised a similar issue.

@karkirowle commented Mar 25, 2020

I was just studying this and came to the same realisation. The annotations, i.e. the encoder outputs, should be used instead of the embedded output words. It's not a one-line change, I think.
Also, Bahdanau uses a bidirectional recurrent neural network (see the sketch below).
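
A rough sketch of that part, assuming a GRU encoder as in the tutorial (the `BiEncoder` name is illustrative, not the tutorial's class): each annotation concatenates the forward and backward states, so the attention layer would have to accept `2 * hidden_size` inputs on the encoder side.

```python
import torch
import torch.nn as nn

class BiEncoder(nn.Module):
    """Illustrative bidirectional GRU encoder producing Bahdanau-style annotations."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True, bidirectional=True)

    def forward(self, src):
        # src: (batch, src_len) of token indices
        embedded = self.embedding(src)            # (batch, src_len, hidden_size)
        annotations, hidden = self.gru(embedded)  # annotations: (batch, src_len, 2 * hidden_size)
        return annotations, hidden
```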

@karkirowle

I can make a pull request stating something like "this is being reworked", but the tutorial is not appropriate in its current form and really needs to be changed as soon as possible. You can even see that the learnt attention weights are wrong. I can start working on an updated version of this tutorial eventually if needed.

@QasimKhan5x (Contributor)

/assigntome
