Releases: dingo-actual/infini-transformer

v0.2.7 - Beta

04 May 17:33

YaRN has now been implemented.

Additionally, position embedders are no longer implicitly instantiated through keyword arguments to CompressiveMemory, InfiniTransformer, or MoDInfiniTransformer. The classes RoPEEmbeddings and YaRNEmbeddings are now exposed directly and can be passed to any of those classes via the position_embedder argument.
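
A minimal sketch of the explicit style is given below. The import path, the embedder's constructor arguments, and the transformer's constructor arguments are illustrative assumptions and may not match the library's actual signatures; consult the README for the exact API.

```python
# Sketch of the new explicit API (v0.2.7+).
# NOTE: import paths and constructor arguments are illustrative assumptions.
from infini_transformer import InfiniTransformer, RoPEEmbeddings

# Instantiate the position embedder explicitly (argument names are placeholders).
position_embedder = RoPEEmbeddings(dim=64, seq_len=2048)

# Pass the embedder to the transformer layer via the position_embedder argument.
layer = InfiniTransformer(
    dim_input=512,
    dim_hidden=2048,
    dim_key=64,
    dim_value=64,
    num_heads=8,
    segment_len=2048,
    position_embedder=position_embedder,
)
```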

v0.2.6 - Positional Embeddings

03 May 10:38

Implemented RoPE embeddings from "RoFormer: Enhanced Transformer with Rotary Position Embedding" by Su et al. (https://arxiv.org/abs/2104.09864). This is the first step toward the implementation of best practices for positional embeddings: a combination of YaRN (https://arxiv.org/abs/2309.00071) with PoSE (https://arxiv.org/abs/2309.10400).

Note that positional embeddings only affect the SDP attention portion of CompressiveMemory. The calculations for the recurrent memory-based attention component are carried out along the key/value dimension and therefore cannot make use of positional information. As such, the utility of adding positional embeddings to a given transformer block will depend on its learned mixing parameters ($\beta$) between the two kinds of attention. The full impact of including positional embeddings will have to be tested empirically (which I lack the resources to perform).
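
For context, the mixing in the original Infini-attention formulation combines the memory-based and dot-product attention outputs roughly as follows (a paraphrase of the Infini-attention paper's gating, not this repository's exact code):

```latex
A = \operatorname{sigmoid}(\beta) \odot A_{\mathrm{mem}}
  + \bigl(1 - \operatorname{sigmoid}(\beta)\bigr) \odot A_{\mathrm{dot}}
```

So a block whose gate saturates toward the memory path would see little benefit from positional embeddings, while a block leaning on the dot-product path would see more.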

Using RoPE with either InfiniTransformer or MoDInfiniTransformer is as simple as passing positional_embeddings="rope" when constructing the module, for example as sketched below.
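
In the sketch, only positional_embeddings="rope" comes from this release note; the other constructor arguments are illustrative placeholders, not the library's confirmed signature.

```python
# Sketch of the v0.2.6 keyword-argument style; argument names other than
# positional_embeddings are illustrative placeholders.
from infini_transformer import InfiniTransformer

layer = InfiniTransformer(
    dim_input=512,
    dim_hidden=2048,
    dim_key=64,
    dim_value=64,
    num_heads=8,
    segment_len=2048,
    positional_embeddings="rope",  # enables RoPE for the SDP attention component
)
```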