This repository was archived by the owner on Oct 25, 2024, and is now read-only.

add ppo rl_training part (#634)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
sywangyi authored Nov 23, 2023
1 parent 4101a80 commit 936c2d2
Showing 12 changed files with 4,244 additions and 4 deletions.
@@ -43,3 +43,25 @@ multi card finetunes
```
python ../instruction/gaudi_spawn.py --world_size 8 --use_mpi reward_modeling.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir <output> --log_level info --num_train_epochs 1 --use_habana --use_lazy_mode --hf_access_token xxxxxx --ddp_find_unused_parameters True
```

## 5. Reinforcement Fine-tuning

### Training on CUDA
```
accelerate launch --multi_gpu --num_machines 1 --num_processes 8 rl_training.py --log_with=wandb --model_name=meta-llama/Llama-2-7b-hf --reward_model_name=output_se --adafactor=False --tokenizer_name=meta-llama/Llama-2-7b-hf --save_freq=100 --output_max_length=128 --batch_size=8 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam --hf_access_token xxxxxx
```
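
For orientation, here is a minimal sketch of the PPO loop that an `rl_training.py` script of this kind runs, patterned after trl's stack-llama example (which takes the same flags). The toy prompt dataset and the reward-scoring pipeline are illustrative assumptions, not the script's exact contents:

```
import torch
from datasets import Dataset
from transformers import AutoTokenizer, pipeline
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # --model_name / --tokenizer_name
reward_model_name = "output_se"          # --reward_model_name

config = PPOConfig(
    model_name=model_name,
    learning_rate=1.4e-5,            # --learning_rate
    batch_size=8,                    # --batch_size
    mini_batch_size=1,               # --mini_batch_size
    gradient_accumulation_steps=8,   # --gradient_accumulation_steps
    ppo_epochs=4,                    # --ppo_epochs
    early_stopping=True,             # --early_stopping
    seed=0,                          # --seed
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Toy prompts standing in for the real dataset; the actual script builds its own.
dataset = Dataset.from_dict({"query": ["Question: What does PPO optimize?\n\nAnswer: "] * 8})
dataset = dataset.map(lambda s: {"input_ids": tokenizer.encode(s["query"])})
dataset.set_format("torch")
collator = lambda data: {key: [d[key] for d in data] for key in data[0]}

# Policy with a value head; PPOTrainer creates a frozen reference model when ref_model=None.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer,
                         dataset=dataset, data_collator=collator)

# The reward model trained in the previous step scores each generated answer.
reward_pipe = pipeline("sentiment-analysis", model=reward_model_name, tokenizer=tokenizer)

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    # Roll out responses from the current policy (--output_max_length).
    response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False,
                                            max_new_tokens=128, do_sample=True,
                                            pad_token_id=tokenizer.pad_token_id)
    batch["response"] = tokenizer.batch_decode(response_tensors, skip_special_tokens=True)

    # Score prompt+response pairs with the reward model (raw logits as rewards).
    texts = [q + r for q, r in zip(batch["query"], batch["response"])]
    rewards = [torch.tensor(out["score"]) for out in reward_pipe(texts, function_to_apply="none")]

    # One PPO update on the (query, response, reward) triples.
    ppo_trainer.step(query_tensors, response_tensors, rewards)
```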

### Training on Habana

Follow the installation guidance in [optimum-habana](https://github.com/huggingface/optimum-habana).

single card finetune

```
python3 rl_training.py --model_name=meta-llama/Llama-2-7b-hf --reward_model_name=<output_rm> --adafactor=False --tokenizer_name=meta-llama/Llama-2-7b-hf --save_freq=100 --output_max_length=128 --batch_size=8 --mini_batch_size=1 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam --hf_access_token xxxxxx --use_habana
```

multi card finetune
```
python3 ../instruction/gaudi_spawn.py --world_size 8 --use_mpi rl_training.py --model_name=meta-llama/Llama-2-7b-hf --reward_model_name=<output_rm> --adafactor=False --tokenizer_name=meta-llama/Llama-2-7b-hf --save_freq=100 --output_max_length=128 --batch_size=8 --mini_batch_size=1 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam --hf_access_token xxxxxx --use_habana
```
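
Whichever backend you use, `--output_dir` ends up holding the fine-tuned model. A minimal sketch for sanity-checking it, assuming the script saves a standard `transformers`-compatible checkpoint (the prompt is illustrative):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The --output_dir value from the commands above; assumed to contain a standard checkpoint.
ckpt = "llama-se-rl-finetune-128-8-8-1.4e-5_adam"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.bfloat16)

prompt = "Question: How do I reverse a list in Python?\n\nAnswer: "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```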
@@ -5,3 +5,5 @@ datasets
 bitsandbytes
 evaluate
 scikit-learn
+intel-extension-for-transformers
+tyro
@@ -199,14 +199,14 @@ def preprocess_function(examples):
"input_ids_k": [],
"attention_mask_k": [],
}
for question, response_j, response_k in zip(
examples["question"], examples["chatgpt"], examples["llama2-13b-chat"]
for system, question, response_j, response_k in zip(
examples["system"], examples["question"], examples["chatgpt"], examples["llama2-13b-chat"]
):
tokenized_j = tokenizer(
"Question: " + question + "\n\nAnswer: " + response_j, truncation=True
system + question + response_j, truncation=True
)
tokenized_k = tokenizer(
"Question: " + question + "\n\nAnswer: " + response_k, truncation=True
system + question + response_k, truncation=True
)

new_examples["input_ids_j"].append(tokenized_j["input_ids"])
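
The change above prepends each example's `system` prompt to the question and both candidate responses, replacing the fixed `Question:`/`Answer:` template. A self-contained sketch of the updated pairwise preprocessing with a toy batch (the tokenizer and column contents are illustrative; by this script's convention `_j` holds the preferred response and `_k` the rejected one):

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed tokenizer

def preprocess_function(examples):
    # Pairwise reward-model format: *_j fields for the preferred answer, *_k for the rejected one.
    new_examples = {"input_ids_j": [], "attention_mask_j": [],
                    "input_ids_k": [], "attention_mask_k": []}
    for system, question, response_j, response_k in zip(
        examples["system"], examples["question"], examples["chatgpt"], examples["llama2-13b-chat"]
    ):
        tokenized_j = tokenizer(system + question + response_j, truncation=True)
        tokenized_k = tokenizer(system + question + response_k, truncation=True)
        new_examples["input_ids_j"].append(tokenized_j["input_ids"])
        new_examples["attention_mask_j"].append(tokenized_j["attention_mask"])
        new_examples["input_ids_k"].append(tokenized_k["input_ids"])
        new_examples["attention_mask_k"].append(tokenized_k["attention_mask"])
    return new_examples

# Toy batch in the dataset's column layout.
batch = {
    "system": ["You are a helpful assistant. "],
    "question": ["What is 2 + 2? "],
    "chatgpt": ["4"],            # preferred response (j)
    "llama2-13b-chat": ["5"],    # rejected response (k)
}
out = preprocess_function(batch)
print(len(out["input_ids_j"]), len(out["input_ids_k"]))  # 1 1
```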