
Causal and Synthesizer Multihead-Attention

We have implemented two variants of the multi-head self-attention mechanism:

  • CausalSelfAttention
  • SynthesizerSelfAttention

Causal Self-Attention is the vanilla multi-head masked self-attention layer with a projection at the end. We use the scaled dot product as the scoring function in this case.
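
A minimal sketch of how such a layer is typically written in PyTorch (the framework, the module layout, and the hyperparameter names n_embd, n_head, and block_size are illustrative assumptions here, not taken from this repository's code):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Masked multi-head self-attention with scaled dot-product scoring."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        # key, query, value projections for all heads
        self.key = nn.Linear(n_embd, n_embd)
        self.query = nn.Linear(n_embd, n_embd)
        self.value = nn.Linear(n_embd, n_embd)
        # output projection
        self.proj = nn.Linear(n_embd, n_embd)
        # causal mask: each position may only attend to itself and earlier positions
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )
        self.n_head = n_head

    def forward(self, x):
        B, T, C = x.size()
        hs = C // self.n_head
        # project and split into heads: (B, n_head, T, head_size)
        k = self.key(x).view(B, T, self.n_head, hs).transpose(1, 2)
        q = self.query(x).view(B, T, self.n_head, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)
        # scaled dot-product scores: (B, n_head, T, T)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                   # weighted sum of values
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```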

Synthesizer Self-Attention is a recent alternative to causal self-attention with potential benefits from removing this dot product. In vanilla self-attention the scoring function returns a block_size * block_size matrix of attention scores, and this computation is quadratic in the sequence length. Synthesizer self-attention avoids the pairwise query-key dot product and computes the block_size * block_size matrix of attention scores directly from the input. It is inspired by Synthesizer: Rethinking Self-Attention in Transformer Models.
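
A minimal sketch of the dense Synthesizer variant under the same assumptions (PyTorch, illustrative names; the scores are synthesized per head by a small learned transformation of the input rather than by a query-key dot product):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SynthesizerSelfAttention(nn.Module):
    """Dense Synthesizer attention: the T x T score matrix is predicted
    directly from the input, with no query-key dot product."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        hs = n_embd // n_head
        # first layer of the score-synthesizing transformation
        self.w1 = nn.Linear(n_embd, n_embd)
        # per-head second layer mapping head features to block_size scores
        self.w2 = nn.Parameter(torch.empty(n_head, hs, block_size).uniform_(-0.001, 0.001))
        self.b2 = nn.Parameter(torch.zeros(block_size))
        self.value = nn.Linear(n_embd, n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        # causal mask, as in the vanilla layer
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )
        self.n_head = n_head

    def forward(self, x):
        B, T, C = x.size()
        hs = C // self.n_head
        # hidden representation, split into heads: (B, n_head, T, head_size)
        h = F.relu(self.w1(x)).view(B, T, self.n_head, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)
        # synthesize scores directly: (B, n_head, T, block_size), keep first T columns
        scores = (h @ self.w2 + self.b2)[:, :, :, :T]
        scores = scores.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(scores, dim=-1)
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```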
