An open-source implementation of grouped-query attention (GQA) from the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"
Updated Dec 11, 2023 - Python
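For orientation, here is a minimal sketch of grouped-query attention, assuming PyTorch. Class and parameter names (GroupedQueryAttention, num_kv_heads, etc.) are illustrative and not taken from the repository's actual API; the idea is that several query heads share a single key/value head, with multi-head attention and multi-query attention as the two extremes.

```python
# Minimal, self-contained sketch of grouped-query attention (GQA) in PyTorch.
# Names are illustrative, not the repository's API.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedQueryAttention(nn.Module):
    """num_heads query heads share num_kv_heads key/value heads.

    num_kv_heads == num_heads     -> standard multi-head attention
    num_kv_heads == 1             -> multi-query attention
    1 < num_kv_heads < num_heads  -> grouped-query attention
    """

    def __init__(self, embed_dim: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        assert num_heads % num_kv_heads == 0
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = embed_dim // num_heads

        self.q_proj = nn.Linear(embed_dim, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, num_kv_heads * self.head_dim, bias=False)
        self.out_proj = nn.Linear(num_heads * self.head_dim, embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = x.shape

        # Project and split into heads: (bsz, heads, seq, head_dim).
        q = self.q_proj(x).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(bsz, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(bsz, seq_len, self.num_kv_heads, self.head_dim).transpose(1, 2)

        # Repeat each key/value head so every group of query heads
        # attends against its shared key/value head.
        group_size = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)

        # Standard scaled dot-product attention over the expanded heads.
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.out_proj(attn)


if __name__ == "__main__":
    # 8 query heads sharing 2 key/value heads (4 query heads per group).
    gqa = GroupedQueryAttention(embed_dim=64, num_heads=8, num_kv_heads=2)
    out = gqa(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

The main saving is in the key/value projections and KV cache, which shrink by a factor of num_heads / num_kv_heads compared with standard multi-head attention.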
Nano-scale generative models, built for fun. No SOTA here, nano first.