MLKV: Multi-Layer Key-Value Sharing

Experiments on EleutherAI's Pythia models

Setup

git clone https://github.com/zaydzuhri/pythia-mlkv.git
cd pythia-mlkv
pip install -r requirements.txt

Convert Pythia models to MQA/GQA/MLKV models

git lfs install
git clone https://huggingface.co/EleutherAI/pythia-160m-deduped
rm -rf pythia-160m-deduped/.git
python3 convert_to_mlkv.py --weights_path pythia-160m-deduped --num-key-value-layers 6 --num-key-value-heads 1
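To convert several configurations in one go, the command above can be generated per configuration. The following is a minimal sketch, not part of the repository: `conversion_command` and `VARIANTS` are hypothetical names, and the sketch assumes every non-baseline configuration in the config table is produced by the same `convert_to_mlkv.py` flags shown above. It only prints the commands and does not run them.

```python
import shlex

# (num-key-value-layers, num-key-value-heads) pairs for the eight converted
# variants; the values mirror the config table in this README.
VARIANTS = [(12, 4), (4, 12), (12, 1), (4, 3), (6, 1), (4, 1), (2, 1), (1, 1)]


def conversion_command(weights_path, kv_layers, kv_heads):
    """Build the argument list for one convert_to_mlkv.py invocation."""
    return [
        "python3", "convert_to_mlkv.py",
        "--weights_path", weights_path,
        "--num-key-value-layers", str(kv_layers),
        "--num-key-value-heads", str(kv_heads),
    ]


if __name__ == "__main__":
    for kv_layers, kv_heads in VARIANTS:
        # Print a copy-pasteable shell command for each variant.
        print(shlex.join(conversion_command("pythia-160m-deduped", kv_layers, kv_heads)))
```

Printing rather than executing keeps the sketch side-effect free; each printed line can be pasted into a shell or passed to `subprocess.run` as a list.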

The table below lists all nine configurations used in the experiments: the MHA baseline plus eight converted variants.

Name	Num. of layers	Num. of attention heads	Num. of layers with KV heads (num-key-value-layers)	Num. of KV heads in a layer (num-key-value-heads)	Total num. of KV heads	Num. of parameters
MHA-144	12	12	12	12	144	160M
GQA-48	12	12	12	4	48	160M
MLKV-48	12	12	4	12	48	160M
MQA-12	12	12	12	1	12	160M
MLKV-12	12	12	4	3	12	160M
MLKV-6	12	12	6	1	6	160M
MLKV-4	12	12	4	1	4	160M
MLKV-2	12	12	2	1	2	160M
MLKV-1	12	12	1	1	1	160M
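The "Total num. of KV heads" column is the product of the two preceding columns, and the KV cache size scales linearly with it. The sketch below checks that arithmetic and estimates cache size per configuration. It is illustrative only: the function names are hypothetical, and the head dimension of 64 (768 hidden units over 12 heads for Pythia-160M), 2-byte (fp16) values, and the 2048-token sequence length are assumptions rather than values taken from this repository.

```python
# name -> (num-key-value-layers, num-key-value-heads), copied from the table above
CONFIGS = {
    "MHA-144": (12, 12),
    "GQA-48": (12, 4),
    "MLKV-48": (4, 12),
    "MQA-12": (12, 1),
    "MLKV-12": (4, 3),
    "MLKV-6": (6, 1),
    "MLKV-4": (4, 1),
    "MLKV-2": (2, 1),
    "MLKV-1": (1, 1),
}


def total_kv_heads(kv_layers, kv_heads):
    """Total KV heads = layers that own KV heads x KV heads per such layer."""
    return kv_layers * kv_heads


def kv_cache_bytes(kv_layers, kv_heads, seq_len,
                   head_dim=64, bytes_per_value=2, batch_size=1):
    """Estimated KV cache size in bytes (assumed head_dim and dtype width)."""
    # The leading factor 2 accounts for storing both keys and values.
    return (2 * batch_size * seq_len
            * total_kv_heads(kv_layers, kv_heads)
            * head_dim * bytes_per_value)


if __name__ == "__main__":
    baseline = kv_cache_bytes(*CONFIGS["MHA-144"], seq_len=2048)
    for name, (kv_layers, kv_heads) in CONFIGS.items():
        size = kv_cache_bytes(kv_layers, kv_heads, seq_len=2048)
        print(f"{name}: {total_kv_heads(kv_layers, kv_heads)} KV heads, "
              f"{size / 2**20:.2f} MiB, {baseline // size}x smaller than MHA")
```

Under these assumptions the cache size ratio between two configurations equals the ratio of their total KV heads, which is why the configurations are named after that number.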

Uptraining

The dataset has already been prepared and uploaded to Hugging Face, so you can start uptraining directly:

CUDA_VISIBLE_DEVICES=0,1 python3 uptrain.py --output-dir pythia-160m-mlkv-6-b12-g2-v1 --model pythia-160m-deduped_mlkv_6_1 --batch-size 12 --gradient-accumulate-every 1 --learning-rate 6e-4 --warmup-ratio 0.2 --wandb pythia-160m-mlkv-6-b12-g2-v1
