This work doesn't change the kernel, but utilizes the dependency structure to compute a whole line? #20

Open
ziyuhuang123 opened this issue Jul 2, 2024 · 0 comments

@ziyuhuang123

Your idea is excellent, and I have starred your repo. I would like to check whether my understanding is correct:

This paper does not modify the kernel implementation; instead, it exploits the fact that different rows along the sequence dimension of Q are independent. Therefore, it computes from attention through the FFN in one go, quickly consuming intermediate results and enabling larger sequence lengths.

Is it correct?
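To make sure I am reading it right, here is a rough sketch of what I mean. This is only my own illustration, not your code: the function name `blockwise_attention_ffn`, the block size, and the weight names are all hypothetical, and residual connections, LayerNorm, and masking are omitted.

```python
# Rough sketch of my understanding (hypothetical names, simplified math).
# Query rows are processed in blocks; each block goes from attention
# through the FFN before the next block starts, so the full attention
# output is never materialized at once.
import torch
import torch.nn.functional as F

def blockwise_attention_ffn(q, k, v, w1, b1, w2, b2, block_size=256):
    seq_len, d = q.shape
    scale = d ** -0.5
    outputs = []
    for start in range(0, seq_len, block_size):
        q_blk = q[start:start + block_size]             # (B, d) block of query rows
        scores = (q_blk @ k.T) * scale                  # (B, S) attention scores vs. full K
        attn_out = F.softmax(scores, dim=-1) @ v        # (B, d) attention output for this block
        ffn_out = F.relu(attn_out @ w1 + b1) @ w2 + b2  # (B, d) FFN applied immediately
        outputs.append(ffn_out)                         # only block-sized intermediates are kept alive
    return torch.cat(outputs, dim=0)
```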
