Alpaca-LoRA-RLHF-PyTorch

A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer hardware.

Environment Setup

Budget GPU: 2080Ti 12G
torch==2.0.0
cuda==11.8

Todo List

Run

Supervised Finetune

Before running, check src/peft/utils/save_and_load.py in the installed peft and comment out line 52 so that it reads: #to_return = {k: v for k, v in to_return.items() if (("lora_" in k and adapter_name in k) or ("bias" in k))}
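One way to run this check without opening the file by hand is sketched below. It scans the text of save_and_load.py for the filtering line quoted above and reports every occurrence that is still active. The names find_active_filter_lines and check_file, and the choice to match on a fixed substring of that line, are assumptions made for this illustration; they are not part of peft or of this repository, and the line number in your installed copy may not be 52.

```python
from __future__ import annotations

from pathlib import Path

# Substring of the filtering line quoted in this README (assumed to be a stable match key).
FILTER_SNIPPET = '"lora_" in k and adapter_name in k'


def find_active_filter_lines(source: str) -> list[int]:
    """Return 1-based line numbers where the filter line exists and is not commented out."""
    active = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if FILTER_SNIPPET in stripped and not stripped.startswith("#"):
            active.append(number)
    return active


def check_file(path: str) -> list[int]:
    """Read a save_and_load.py file from disk and return its active filter lines."""
    return find_active_filter_lines(Path(path).read_text(encoding="utf-8"))
```

An empty list from check_file means the line is already commented out or absent; any returned line numbers still need to be commented out before finetuning.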

python supervised_finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'yahma/alpaca-cleaned' --output_dir './lora-alpaca' --num_epochs 1

Merge PEFT adapter into Model

pip uninstall peft -y
pip install peft==0.2.0  # 0.3.0.dev0 raises many errors
python merge_peft_adapter.py --model_name ./lora-alpaca
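Because the merge step depends on the pinned peft release, a small guard can confirm the installed version before the script is run. The sketch below uses only the standard-library importlib.metadata module; the helper names installed_version and installed_matches are hypothetical and are not part of this repository.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def installed_matches(package: str, required: str) -> bool:
    """True only if the package is installed at exactly the required version."""
    return installed_version(package) == required


if __name__ == "__main__":
    # 0.2.0 is the version this README pins for the merge step.
    if not installed_matches("peft", "0.2.0"):
        print("peft 0.2.0 not found; installed:", installed_version("peft"))
```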

Train Reward Model

python train_reward_model.py --model_name 'decapoda-research/llama-7b-hf' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False
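The two batch flags in the command above combine into the number of samples that contribute to each optimizer step. The sketch below shows the arithmetic, assuming the script accumulates gradients in the usual way (as the Hugging Face Trainer does) and runs on a single device; the helper name effective_batch_size is hypothetical.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step under plain gradient accumulation."""
    return per_device * accumulation_steps * num_devices


if __name__ == "__main__":
    # Values taken from the command above: batch size 1, 32 accumulation steps.
    print(effective_batch_size(per_device=1, accumulation_steps=32))
```

A per-device batch size of 1 keeps peak memory low on a small GPU, while the accumulation steps recover a larger effective batch.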

Merge Reward adapter into Model

python merge_peft_adapter.py --model_name ./alpaca-lora-reward-model

Tuning LM with PPO

python tuning_lm_with_rl.py --model_name './lora-alpaca-adapter-merged' --reward_model_name './lora-alpaca-reward-model-adapter-merged' --adafactor False --tokenizer_name 'decapoda-research/llama-7b-hf' --save_freq 100 --output_max_length 128 --batch_size 1 --gradient_accumulation_steps 1 --batched_gen True --ppo_epochs 1 --seed 0 --learning_rate 1.4e-5 --early_stopping True --output_dir './checkpoints/tuning_llama_rl'

Notes

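Before the first step (SFT), there is one important caveat: check the installed peft code in src/peft/utils/save_and_load.py. If line 52 contains to_return = {k: v for k, v in to_return.items() if (("lora_" in k and adapter_name in k) or ("bias" in k))}, comment it out (prefix it with #); otherwise the adapter model's parameters cannot be saved after finetuning. Do not skip this!
PEFT version: installing from git currently gives 0.3.0.dev0, which fails in merge_peft_adapter. Switch to peft==0.2.0 (0.3.0.dev0 does not have the _get_submodules() function).
Training the reward model can raise another error: ValueError: weight is on the meta device, we need a value to put in on 0. Use the latest transformers code from GitHub. The day after I ran into this problem, I found that it had been fixed in the transformers repository only 8 hours earlier.
The last step: the code is basically OK, but I only have a 2080Ti. After loading the finetuned model, loading the reward model immediately fails with CUDA out of memory, so this step has not actually been run.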
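The out-of-memory failure in the last step is consistent with a rough estimate of weight memory alone. The sketch below assumes a nominal 7e9 parameters for each of the two 7b models and ignores activations, optimizer state and CUDA overhead. The precisions listed are illustrative, since this README does not state which one the script uses when loading.

```python
GIB = 1024 ** 3

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


if __name__ == "__main__":
    params = 7e9  # nominal size of a "7b" model (assumption)
    for dtype in BYTES_PER_PARAM:
        one = weights_gib(params, dtype)
        print(f"{dtype}: one model ~{one:.1f} GiB, two models ~{2 * one:.1f} GiB")
```

Even at one byte per parameter, the policy model and the reward model together need about 13 GiB for weights, which already exceeds a 12G card.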

Reference

utils & templates come from alpaca-lora.

requirements: the environment is mainly set up following [alpaca-lora](https://github.com/tloen/alpaca-lora).

Star-History

Donation

If this project helps you reduce development time, you can buy me a cup of coffee :)

AliPay(支付宝)

WechatPay(微信)

License

MIT © Kun
