🎉 This is the implementation of the EMNLP 2024 main conference paper: Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention.
In the paper/code, we use the WikiText-103 dataset and the Pile dataset, both of which are openly available on the Internet.
Before running the code, please replace the following data and work path definitions with your own paths:

- `LPA/scripts/lpa_train_setting1.sh` / `LPA/scripts/lpa_train_setting2.sh`: `train_data_path`, `valid_data_path`, `test_data_path`, `tokenizer_path`, `output_dir`
- `LPA/model_train/lpa_train_setting1.py` / `LPA/model_train/lpa_train_setting2.py`: add your work path in Line 15
- `LPA/model_train/lpa_train_setting2.py`: add your wandb project name in Line 107
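For reference, a minimal sketch of how these path arguments might be consumed on the Python side is shown below. The argument names mirror the variables listed above, but the argparse setup itself is illustrative and is not the repo's exact code.

```python
import argparse

# Illustrative only: the actual training entry points may define and
# consume these arguments differently. The names mirror the variables
# listed above that need to be replaced with your own paths.
parser = argparse.ArgumentParser()
parser.add_argument("--train_data_path", type=str, required=True)
parser.add_argument("--valid_data_path", type=str, required=True)
parser.add_argument("--test_data_path", type=str, required=True)
parser.add_argument("--tokenizer_path", type=str, required=True)
parser.add_argument("--output_dir", type=str, required=True)
args = parser.parse_args()

print(f"Writing checkpoints to {args.output_dir}")
```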
You can apply LPA by running the following commands:
```bash
cd scripts
# model setting 1
bash lpa_train_setting1.sh
# model setting 2
bash lpa_train_setting2.sh
```
We explain some of the arguments as follows:

- `attn_lowdim`: the hyperparameter $r$ for the low-dimensional module applied in the attention layer.
- `ffn_lowdim`: the hyperparameter $r$ for the low-dimensional module applied in the FFN layer.
- `low_attn`: whether to apply the low-dimensional module in the attention layer. Possible values are `yes` and `no`.
- `low_ffn`: whether to apply the low-dimensional module in the FFN layer. Possible values are `yes` and `no`.
The arguments `low_attn` / `low_ffn` decide whether to apply the low-dimensional module in the attention / FFN layer. For the LPA model, we set `low_attn` to `yes` and `low_ffn` to `no`. For the original Transformer (our main baseline in the paper), we set both `low_attn` and `low_ffn` to `no`.
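To make `attn_lowdim` ($r$) and `low_attn` concrete, here is a minimal sketch of a low-dimensional (rank-$r$) replacement for a dense attention projection. The class and function names are illustrative and do not mirror the actual code in `LPA/architecture/`; they only show the down-project-to-$r$, up-project-back pattern that these arguments control.

```python
import torch
import torch.nn as nn


class LowDimProjection(nn.Module):
    """Illustrative low-dimensional replacement for a dense
    d_model x d_model projection: down-project to r, then up-project."""

    def __init__(self, d_model: int, r: int):
        super().__init__()
        self.down = nn.Linear(d_model, r, bias=False)  # d_model -> r
        self.up = nn.Linear(r, d_model, bias=False)    # r -> d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


def make_attn_projection(d_model: int, low_attn: str, attn_lowdim: int) -> nn.Module:
    """Pick a standard dense projection (low_attn == 'no') or the
    low-dimensional module (low_attn == 'yes')."""
    if low_attn == "yes":
        return LowDimProjection(d_model, attn_lowdim)
    return nn.Linear(d_model, d_model, bias=False)


# Example: LPA-style attention projection with a hypothetical d_model and r.
q_proj = make_attn_projection(d_model=768, low_attn="yes", attn_lowdim=128)
x = torch.randn(2, 16, 768)   # (batch, seq_len, d_model)
print(q_proj(x).shape)        # torch.Size([2, 16, 768])
```

With `low_attn` set to `yes` and a small $r$, each projection uses $2dr$ parameters instead of $d^2$, which is where the parameter and compute savings of the low-dimensional module come from.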
Part of the code in `LPA/architecture/lpa_setting2.py` refers to the implementation of the LLaMA model in Huggingface Transformers.
If you have any questions about the code or the paper, please contact Xingtai Lv (lvxt24@mails.tsinghua.edu.cn) or open an issue.
If you find our work useful, please use the following citation:
@misc{lv2024scalableefficienttraininglarge,
      title={Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention},
      author={Xingtai Lv and Ning Ding and Kaiyan Zhang and Ermo Hua and Ganqu Cui and Bowen Zhou},
      year={2024},
      eprint={2411.02063},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
}