padding mask and attention mask #11
Hi there,
You've done a great job, and thanks for sharing. I'm wondering how you deal with the masking in Linformer, since the attention shape and the key and value shapes have now changed to (n, k) instead of (n, n). I didn't find this in the code. Thanks for your time!
Comments
Hi @ZekaiChen, I'm not sure I completely understand the question, so I will break it into two parts; hopefully one of them answers it. If you are looking for a way to pad your inputs, check out my padding helper. As for masking, that's the next thing I want to get done off my bucket list; look for it in a coming update. If this is not what you wanted answered, let me know! I'll try to help as much as I can.
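Padding inputs up to the model's expected sequence length is just a pad along the sequence axis. A minimal sketch, assuming a (batch, n, dim) input and a target length `seq_len`; the helper name `pad_to_length` is hypothetical and not part of this repo:

```python
import torch
import torch.nn.functional as F

def pad_to_length(x: torch.Tensor, seq_len: int, pad_value: float = 0.0) -> torch.Tensor:
    """Pad a (batch, n, dim) tensor along the sequence axis up to seq_len."""
    n = x.shape[1]
    if n >= seq_len:
        return x
    # F.pad takes pairs from the last dim backwards: (dim_left, dim_right, seq_left, seq_right)
    return F.pad(x, (0, 0, 0, seq_len - n), value=pad_value)

x = torch.randn(2, 100, 64)        # batch shorter than the model's sequence length
x_padded = pad_to_length(x, 128)   # -> (2, 128, 64)
```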
Hi @tatp22, thanks for the feedback. I was talking exactly about the masking. Thanks for letting me know.
Hi @ZekaiChen, actually, what you said is kind of a dilemma, since I cannot mask it in the traditional sense (the traditional sense being exactly what you described). Therefore, my idea is to mask the Q, K, and V matrices directly, similar to what is done here. I think this is the best way to do it, and I will mask it like this; let me know if you think this is good, and if not, I am open to ideas. In theory, it should work the same (I can come up with a quick proof, maybe), and this is the only logical way I see that we can pad the inputs. I'll leave this issue open until I make the pull request.
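To make the Q/K/V masking idea concrete, here is a minimal sketch under assumed shapes: zero out the padded sequence positions of Q, K, and V before K and V are projected down from length n to length k, so the padded rows contribute nothing to the projection. The `input_mask` argument and the function name are illustrative, not this repo's API:

```python
import torch

def mask_qkv(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
             input_mask: torch.Tensor):
    """q, k, v: (batch, n, dim); input_mask: (batch, n), True for real tokens."""
    m = input_mask.unsqueeze(-1).to(q.dtype)   # (batch, n, 1), broadcasts over dim
    # Zeroed rows add nothing when K and V are later projected along the sequence axis
    return q * m, k * m, v * m
```

After this, the Linformer (n, k) projections of K and V proceed as usual; a zeroed row simply drops out of the weighted sum.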
I added more information in the PR. If it looks good, let me know 👍; otherwise I will just merge it, since it really doesn't affect the rest of the code too much.
I agree with masking the projection matrices directly, and thanks for the updates. But maybe there are some other solutions as well? I'm also working on this, but based on fairseq. Thanks again.
Not sure what else to think of 🤷 This is the best solution I could come up with; I even emailed the authors of the paper about this, but no response, unfortunately... If no one says anything else, I will merge this soon.
Merging
Actually, sorry to keep closing and reopening this, but another possibility would be to mask in the traditional sense, except that instead of the full (n, n) masking matrix, we make an (n, k) masking matrix: keep it upper triangular, but with the second dimension cut off at k. This could work as well; let me know what you think.
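For illustration, a minimal sketch of that (n, k) causal mask, assuming a boolean convention where True marks a score to be masked out before the softmax; the function name is hypothetical:

```python
import torch

def causal_mask_nk(n: int, k: int) -> torch.Tensor:
    # Build the usual (n, n) causal mask (True above the diagonal = masked out),
    # then cut the second dimension off at k, as described above.
    full = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    return full[:, :k]                                    # (n, k)

# Usage against (batch, n, k) attention scores:
# scores = scores.masked_fill(causal_mask_nk(n, k), float('-inf'))
# attn = scores.softmax(dim=-1)
```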
Hey, I added another upper triangular padding mask option for causal sequences; let me know what you think. It's up on the latest commit.
I'll close this issue now, since I added both options and users can use whichever one they want.
Thanks for the updates. Wonderful solution.