padding mask and attention mask #11
Hi there,
You've done a great job, and thanks for sharing. I'm wondering how you deal with the masking in Linformer, since the attention shape and the key and value shapes have now changed to (n, k) instead of (n, n). I didn't find this in the code. Thanks for your time!
Comments
Hi @ZekaiChen, I'm not sure I completely understand the question, so I will break it into two parts; hopefully one of them answers it. If you are looking for a way to pad your inputs, check out my padding helper. As for masking, that's the next thing I want to get done off my bucket list; look for it in a coming update. If this is not what you wanted answered, let me know! I'll try to help as much as I can.
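Padding inputs up to the model's expected sequence length is just a pad along the sequence axis. A minimal sketch, assuming a (batch, n, dim) input and a target length `seq_len`; the helper name `pad_to_length` is hypothetical and not part of this repo:

```python
import torch
import torch.nn.functional as F

def pad_to_length(x: torch.Tensor, seq_len: int, pad_value: float = 0.0) -> torch.Tensor:
    """Pad a (batch, n, dim) tensor along the sequence axis up to seq_len."""
    n = x.shape[1]
    if n >= seq_len:
        return x
    # F.pad takes pairs from the last dim backwards: (dim_left, dim_right, seq_left, seq_right)
    return F.pad(x, (0, 0, 0, seq_len - n), value=pad_value)

x = torch.randn(2, 100, 64)        # batch shorter than the model's sequence length
x_padded = pad_to_length(x, 128)   # -> (2, 128, 64)
```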
Hi @tatp22, thanks for the feedback. I was talking exactly about the masking. Thanks for letting me know.
Hi @ZekaiChen, actually, what you said is kind of a dilemma, since I cannot mask it in the traditional sense (the traditional sense being exactly what you described). Therefore, my idea is to mask the Q, K, and V matrices directly, similar to what is done here. I think this is the best way to do it, and I will mask it like this; let me know if you think this is good, and if not, I am open to ideas. In theory, it should work the same (I can come up with a quick proof, maybe), and this is the only logical way I see that we can pad the inputs. I'll leave this issue open until I make the pull request.
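To make the Q/K/V masking idea concrete, here is a minimal sketch under assumed shapes: zero out the padded sequence positions of Q, K, and V before K and V are projected down from length n to length k, so the padded rows contribute nothing to the projection. The `input_mask` argument and the function name are illustrative, not this repo's API:

```python
import torch

def mask_qkv(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
             input_mask: torch.Tensor):
    """q, k, v: (batch, n, dim); input_mask: (batch, n), True for real tokens."""
    m = input_mask.unsqueeze(-1).to(q.dtype)   # (batch, n, 1), broadcasts over dim
    # Zeroed rows add nothing when K and V are later projected along the sequence axis
    return q * m, k * m, v * m
```

After this, the Linformer (n, k) projections of K and V proceed as usual; a zeroed row simply drops out of the weighted sum.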
I added more information in the PR. If it looks good, let me know 👍; otherwise I will just merge it, since it really doesn't affect the rest of the code too much.
I agree with masking the projection matrices directly, and thanks for the updates. But maybe there are some other solutions as well? I'm also working on this, but based on fairseq. Thanks again.
Not sure what else to think of 🤷 This is the best solution I could come up with; I even emailed the authors of the paper about this, but no response, unfortunately... If no one says anything else, I will merge this soon.
Merging
Actually, sorry to keep closing and reopening this, but another possibility would be to mask in the traditional sense, except that instead of the full (n, n) masking matrix, we make an (n, k) masking matrix: keep it upper triangular, but with the second dimension cut off at k. This could work as well; let me know what you think.
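For illustration, a minimal sketch of that (n, k) causal mask, assuming a boolean convention where True marks a score to be masked out before the softmax; the function name is hypothetical:

```python
import torch

def causal_mask_nk(n: int, k: int) -> torch.Tensor:
    # Build the usual (n, n) causal mask (True above the diagonal = masked out),
    # then cut the second dimension off at k, as described above.
    full = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    return full[:, :k]                                    # (n, k)

# Usage against (batch, n, k) attention scores:
# scores = scores.masked_fill(causal_mask_nk(n, k), float('-inf'))
# attn = scores.softmax(dim=-1)
```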
Hey, I added another upper triangular padding mask option for causal sequences; let me know what you think. It's up on the latest commit.
I'll close this issue now, since I added both options and users can use whichever one they want.
Thanks for the updates. Wonderful solution.