padding mask and attention mask #11

Closed
zackchen-lb opened this issue Jul 12, 2020 · 10 comments · Fixed by #12

zackchen-lb commented Jul 12, 2020

Hi there,

You've done a great job and thanks for the sharing. I'm wondering how you deal with the masking stuff in Linformer since the attention shape, key and value shape have now changed to (n, k) instead of (n, n). I didn't find these in the code. Thanks for your time!
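
For concreteness, here is roughly the shape change I mean (a minimal sketch; the sizes here are made up for illustration):

```python
import torch

n, k, d = 512, 128, 64   # sequence length, projected length, head dim
Q = torch.randn(n, d)
K = torch.randn(n, d)
E = torch.randn(k, n)    # Linformer's low-rank projection of the keys

scores = Q @ (E @ K).T   # shape (n, k) instead of the usual (n, n)
print(scores.shape)      # torch.Size([512, 128])
# a standard (n, n) attention/padding mask no longer lines up with these scores
```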

tatp22 (Owner) commented Jul 12, 2020

Hi @ZekaiChen,

I'm not sure I completely understand the question, so I will break it up into two parts, hopefully one of them will answer your question:

If you are looking for a way to pad your inputs, check out my Padder class, which can be found here. Unfortunately, I have not had time to test it on the LinformerLM module; I will hopefully be doing that in the upcoming days.
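
Usage would look something like this (a rough sketch; check the README for the exact constructor arguments):

```python
import torch
from linformer_pytorch import Linformer, Padder

# rough sketch; the constructor arguments here may differ from the repo
model = Linformer(
        input_size=512,   # maximum sequence length the model expects
        channels=16,
        dim_k=64,
        )
model = Padder(model)     # pads shorter inputs up to input_size

x = torch.randn(1, 500, 16)  # 500 < 512; the Padder fills in the rest
y = model(x)                 # works without a manual padding step
```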

As for masking, that's the next thing I want to cross off my list. Look for it in an upcoming update.

If this is not what you wanted answered, let me know! I'll try to help as much as I can.

zackchen-lb (Author) commented:

Hi @tatp22,

Thanks for the feedback. The masking is exactly what I was asking about. Thanks for letting me know.

tatp22 (Owner) commented Jul 14, 2020

Hi @ZekaiChen,

Actually, what you said poses a bit of a dilemma, since I cannot mask it in the traditional sense (the traditional sense being exactly what you described). Therefore, my idea is to mask the Q, K, and V matrices directly, similar to what is done here. I think this is the best way to do it, and I will mask it like this; let me know if you think this is good, and if not, I am open to ideas.

In theory, I think it should work the same (I can come up with a quick proof, maybe), and this is the only logical way I see that we can pad the inputs. I'll leave this issue open until I make the pull request.
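
Roughly, what I have in mind (a sketch, not the actual repo code; the sizes and the mask here are made up):

```python
import torch

n, k, d = 512, 128, 64
x = torch.randn(1, n, d)               # token embeddings

input_mask = torch.ones(1, n, dtype=torch.bool)
input_mask[:, 400:] = False            # pretend the last 112 positions are padding

# zero the padded positions before the Q/K/V projections, instead of
# masking the (n, k) score matrix, which no longer lines up with (n, n)
x = x * input_mask.unsqueeze(-1)
# Q, K, and V computed from x now carry no signal from the padded tokens,
# even after K and V are projected down from length n to length k
```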

tatp22 reopened this Jul 14, 2020
tatp22 linked a pull request Jul 15, 2020 that will close this issue

tatp22 (Owner) commented Jul 15, 2020

I added more information in the PR. If it looks good, let me know 👍; otherwise I will just merge it, since it doesn't really affect the rest of the code too much.

zackchen-lb (Author) commented:

I agree with masking the projection matrices directly; thanks for the updates. But maybe there are some other solutions as well? I'm also working on this, but based on fairseq. Thanks again.

tatp22 (Owner) commented Jul 15, 2020

Not sure what else to think of 🤷 This is the best solution I could come up with; I even emailed the authors of the paper about it but, unfortunately, got no response...

If no one says anything else, I will merge this soon.

tatp22 (Owner) commented Jul 16, 2020

Merging

tatp22 (Owner) commented Jul 16, 2020

Actually, sorry to keep closing and reopening this, but another possibility would be to mask in the traditional sense: instead of the full (n,n) masking matrix, we can make an (n,k) masking matrix that is still upper triangular, just with the second dimension cut off at k. This could work as well; let me know what you think.
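
In code, that truncated causal mask might look something like this (a minimal sketch with made-up sizes; the boolean mask here marks the allowed lower-triangular entries, which is equivalent to masking out the upper triangle):

```python
import torch

n, k = 8, 4
# standard (n, n) causal mask: True where position i may attend to j <= i
full_mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
# keep only the first k columns to match the (n, k) score matrix
causal_nk = full_mask[:, :k]           # shape (n, k)

scores = torch.randn(n, k)
scores = scores.masked_fill(~causal_nk, float('-inf'))
attn = scores.softmax(dim=-1)          # masked-out entries get zero weight
```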

tatp22 reopened this Jul 16, 2020

tatp22 (Owner) commented Jul 24, 2020

Hey, I added another upper triangular padding mask option for causal sequences; let me know what you think. It's up in the latest commit, and if the dims are (n,k), it simply cuts off the right part of the matrix.

I'll close this issue now, since I added both options and now users can use whichever one they want.

tatp22 closed this as completed Jul 24, 2020

zackchen-lb (Author) commented:

Thanks for the updates. Wonderful solution.
