This is an unofficial PyTorch implementation of the Infini-attention mechanism introduced in the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention". Note that the official code for the paper has not been released yet. If you run into issues, open a PR that explains the changes you made and why.
Updated Sep 8, 2024 - Python