
Causal and Synthesizer Multihead-Attention

We have implemented two variants of the multi-head self-attention mechanism:

  • CausalSelfAttention
  • SynthesizerSelfAttention

Causal Self-Attention is the vanilla multi-head masked self-attention layer with a projection at the end. We use the scaled dot product as the scoring function in this case.
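
A minimal sketch of how such a layer is typically written in PyTorch (the framework, the module layout, and the hyperparameter names n_embd, n_head, and block_size are illustrative assumptions here, not taken from this repository's code):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Masked multi-head self-attention with scaled dot-product scoring."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        # key, query, value projections for all heads
        self.key = nn.Linear(n_embd, n_embd)
        self.query = nn.Linear(n_embd, n_embd)
        self.value = nn.Linear(n_embd, n_embd)
        # output projection
        self.proj = nn.Linear(n_embd, n_embd)
        # causal mask: each position may only attend to itself and earlier positions
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )
        self.n_head = n_head

    def forward(self, x):
        B, T, C = x.size()
        hs = C // self.n_head
        # project and split into heads: (B, n_head, T, head_size)
        k = self.key(x).view(B, T, self.n_head, hs).transpose(1, 2)
        q = self.query(x).view(B, T, self.n_head, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)
        # scaled dot-product scores: (B, n_head, T, T)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                   # weighted sum of values
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```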

Synthesizer Self-Attention is a recent alternative to causal self-attention with potential benefits from removing this dot product. In vanilla self-attention the scoring function returns a block_size * block_size matrix of attention scores, and this computation is quadratic in the sequence length. Synthesizer self-attention avoids the pairwise query-key dot product and computes the block_size * block_size matrix of attention scores directly from the input. It is inspired by Synthesizer: Rethinking Self-Attention in Transformer Models.
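
A minimal sketch of the dense Synthesizer variant under the same assumptions (PyTorch, illustrative names; the scores are synthesized per head by a small learned transformation of the input rather than by a query-key dot product):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SynthesizerSelfAttention(nn.Module):
    """Dense Synthesizer attention: the T x T score matrix is predicted
    directly from the input, with no query-key dot product."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        hs = n_embd // n_head
        # first layer of the score-synthesizing transformation
        self.w1 = nn.Linear(n_embd, n_embd)
        # per-head second layer mapping head features to block_size scores
        self.w2 = nn.Parameter(torch.empty(n_head, hs, block_size).uniform_(-0.001, 0.001))
        self.b2 = nn.Parameter(torch.zeros(block_size))
        self.value = nn.Linear(n_embd, n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        # causal mask, as in the vanilla layer
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )
        self.n_head = n_head

    def forward(self, x):
        B, T, C = x.size()
        hs = C // self.n_head
        # hidden representation, split into heads: (B, n_head, T, head_size)
        h = F.relu(self.w1(x)).view(B, T, self.n_head, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)
        # synthesize scores directly: (B, n_head, T, block_size), keep first T columns
        scores = (h @ self.w2 + self.b2)[:, :, :, :T]
        scores = scores.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(scores, dim=-1)
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```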
