Releases: dingo-actual/infini-transformer

v0.2.7 - Beta

04 May 17:33

YaRN has now been implemented.

Additionally, position embedders are no longer implicitly instantiated through keyword arguments to CompressiveMemory, InfiniTransformer, or MoDInfiniTransformer. The classes RoPEEmbeddings and YaRNEmbeddings are now exposed directly and can be passed to any of those classes via the position_embedder argument.
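
A minimal sketch of the explicit style is given below. The import path, the embedder's constructor arguments, and the transformer's constructor arguments are illustrative assumptions and may not match the library's actual signatures; consult the README for the exact API.

```python
# Sketch of the new explicit API (v0.2.7+).
# NOTE: import paths and constructor arguments are illustrative assumptions.
from infini_transformer import InfiniTransformer, RoPEEmbeddings

# Instantiate the position embedder explicitly (argument names are placeholders).
position_embedder = RoPEEmbeddings(dim=64, seq_len=2048)

# Pass the embedder to the transformer layer via the position_embedder argument.
layer = InfiniTransformer(
    dim_input=512,
    dim_hidden=2048,
    dim_key=64,
    dim_value=64,
    num_heads=8,
    segment_len=2048,
    position_embedder=position_embedder,
)
```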

v0.2.6 - Positional Embeddings

03 May 10:38

Implemented RoPE embeddings from "RoFormer: Enhanced Transformer with Rotary Position Embedding" by Su et al. (https://arxiv.org/abs/2104.09864). This is the first step toward the implementation of best practices for positional embeddings: a combination of YaRN (https://arxiv.org/abs/2309.00071) with PoSE (https://arxiv.org/abs/2309.10400).

Note that positional embeddings only affect the SDP attention portion of CompressiveMemory. The calculations for the recurrent memory-based attention component are carried out along the key/value dimension and therefore cannot make use of positional information. As such, the utility of adding positional embeddings to a given transformer block will depend on its learned mixing parameters ($\beta$) between the two kinds of attention. The full impact of including positional embeddings will have to be tested empirically (which I lack the resources to perform).
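
For context, the mixing in the original Infini-attention formulation combines the memory-based and dot-product attention outputs roughly as follows (a paraphrase of the Infini-attention paper's gating, not this repository's exact code):

```latex
A = \operatorname{sigmoid}(\beta) \odot A_{\mathrm{mem}}
  + \bigl(1 - \operatorname{sigmoid}(\beta)\bigr) \odot A_{\mathrm{dot}}
```

So a block whose gate saturates toward the memory path would see little benefit from positional embeddings, while a block leaning on the dot-product path would see more.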

Using RoPE with either InfiniTransformer or MoDInfiniTransformer is as simple as passing positional_embeddings="rope" when constructing the module, for example as sketched below.
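
In the sketch, only positional_embeddings="rope" comes from this release note; the other constructor arguments are illustrative placeholders, not the library's confirmed signature.

```python
# Sketch of the v0.2.6 keyword-argument style; argument names other than
# positional_embeddings are illustrative placeholders.
from infini_transformer import InfiniTransformer

layer = InfiniTransformer(
    dim_input=512,
    dim_hidden=2048,
    dim_key=64,
    dim_value=64,
    num_heads=8,
    segment_len=2048,
    positional_embeddings="rope",  # enables RoPE for the SDP attention component
)
```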