Fine-tuning LLMs with Reinforcement Learning from Human Feedback (RLHF)

kushaangowda/rlhf_for_llm

Training open-source LLMs using RLHF

This project fine-tunes large language models with Reinforcement Learning from Human Feedback (RLHF). Our primary objective is to improve Google's Text-to-Text Transfer Transformer (T5) model using the OpenHermesPreferences dataset. We optimize the model with Proximal Policy Optimization (PPO) so that the text it generates aligns more closely with human preferences and values, and we use PairRM as the reward model.

Ref: https://huggingface.co/blog/rlhf
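
For readers new to PPO, the standard clipped surrogate objective is sketched below. This is the textbook formulation, not a transcription of this repository's code: the symbols (policy \pi_\theta, advantage estimate \hat{A}_t, clip range \epsilon, KL coefficient \beta, reference model \pi_{\mathrm{ref}}, reward model score r_{\mathrm{RM}}) are assumptions introduced for illustration, and the exact loss used here depends on the PPO implementation behind main.py.

```latex
% Probability ratio between the updated policy and the policy that generated the sample
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

% Clipped surrogate objective (maximized with respect to \theta)
L^{\mathrm{CLIP}}(\theta) =
  \mathbb{E}_t\!\left[
    \min\!\big( r_t(\theta)\,\hat{A}_t,\;
                \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \big)
  \right]

% In RLHF setups the reward is commonly the reward-model score minus a KL penalty
% that keeps the policy close to the reference (pre-RLHF) model
R(x, y) = r_{\mathrm{RM}}(x, y) - \beta \,\log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```

In this project the role of r_{\mathrm{RM}} would be played by PairRM; whether a KL penalty of this form is applied is not stated in this README.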

Setup

  1. Clone this repository:

    git clone https://github.com/gtamer2/rl_final_project.git
  2. Install the dependencies:

    pip install -r requirements.txt

Model Training

Execute the training script with the following command:

python main.py --model_name="google-t5/t5-small" --batch_size=32 --epochs=200 --mode="train"

Parameters

  • batch_size: Batch size for training.
  • epochs: Number of training epochs.
  • model_name: Name or path of the model to fine-tune (e.g. google-t5/t5-small).
  • lr: Learning rate for the optimizer.
  • model_save_path: Path to save the trained model.
  • rewards_save_path: Path to save the rewards.
  • dataset_size: Number of data samples (Use -1 to train on the entire dataset)
  • seed: Random seed.
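
As a rough illustration of how the flags above fit together, here is a minimal argparse sketch. It is not the parser from main.py: the function name build_parser and every default value are assumptions chosen to match the example commands in this README, and the real script may use different types or defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the parameters documented in this README."""
    parser = argparse.ArgumentParser(description="RLHF fine-tuning (illustrative sketch)")
    # The three modes are the ones used in the README commands.
    parser.add_argument("--mode", choices=["train", "predict", "visualize"], required=True)
    parser.add_argument("--model_name", default="google-t5/t5-small")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=200)
    # The defaults below are assumptions, not values taken from the repository.
    parser.add_argument("--lr", type=float, default=1e-5)
    parser.add_argument("--model_save_path", default="my_ppo_model")
    parser.add_argument("--rewards_save_path", default="reward.npy")
    # -1 means "use the entire dataset", as documented above.
    parser.add_argument("--dataset_size", type=int, default=-1)
    parser.add_argument("--seed", type=int, default=42)
    return parser


# Parse the same flags as the training command shown above.
args = build_parser().parse_args(
    ["--model_name=google-t5/t5-small", "--batch_size=32", "--epochs=200", "--mode=train"]
)
print(args.mode, args.model_name, args.batch_size, args.epochs, args.dataset_size)
```

Passing an explicit list to parse_args keeps the sketch runnable outside a shell; a real entry point would call parse_args() with no arguments to read sys.argv.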

Model Prediction

Execute the prediction script with the following command:

python main.py --model_name="my_ppo_model" --batch_size=32 --mode="predict"

Parameters

  • batch_size: Batch size for prediction.
  • model_name: Name or path of the trained model to load (e.g. my_ppo_model).
  • dataset_size: Number of data samples (Use -1 to generate predictions for the entire test set)

Visualize Reward Curve

To visualize the reward curve, use the following command:

python main.py --rewards_save_path="reward.npy" --mode="visualize"
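
For anyone who wants to inspect the saved rewards outside main.py, a minimal sketch follows. It assumes reward.npy holds a 1-D numeric array of average rewards saved with numpy.save, which this README does not confirm; the helper names moving_average and plot_reward_curve, the window size, and the output file name are hypothetical.

```python
def moving_average(values, window=10):
    """Trailing moving average; early points average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def plot_reward_curve(path="reward.npy", window=10, out_path="reward_curve.png"):
    """Plot raw and smoothed rewards (assumes a 1-D numeric array in `path`)."""
    # Imported lazily so moving_average works without numpy or matplotlib installed.
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    rewards = np.load(path).astype(float).ravel().tolist()
    plt.figure()
    plt.plot(rewards, alpha=0.3, label="raw")
    plt.plot(moving_average(rewards, window), label=f"moving average (window={window})")
    plt.xlabel("Training step")
    plt.ylabel("Average reward")
    plt.legend()
    plt.savefig(out_path)
    plt.close()
```

Calling plot_reward_curve("reward.npy") writes the figure to reward_curve.png. Smoothing is useful here because PPO reward traces tend to be noisy from batch to batch.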

Results

Model          Avg. Reward   Avg. BLEU Score   Avg. BERT Score
T5 Original    -11.2550      0.0024            0.0273
T5 with RLHF   -4.7752       0.0143            0.0339
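
To put the numbers above in relative terms, the short calculation below derives the reward gain and the BLEU and BERT score ratios directly from the table. Only the table values come from this README; the dictionary layout and variable names are illustrative.

```python
# Values copied from the results table above.
results = {
    "T5 Original": {"reward": -11.2550, "bleu": 0.0024, "bert": 0.0273},
    "T5 with RLHF": {"reward": -4.7752, "bleu": 0.0143, "bert": 0.0339},
}

base = results["T5 Original"]
tuned = results["T5 with RLHF"]

# Absolute change in average reward (higher is better).
reward_gain = tuned["reward"] - base["reward"]
# Ratios of the RLHF model's scores to the original model's scores.
bleu_ratio = tuned["bleu"] / base["bleu"]
bert_ratio = tuned["bert"] / base["bert"]

print(f"Reward gain: {reward_gain:+.4f}")  # +6.4798
print(f"BLEU ratio:  {bleu_ratio:.2f}x")   # 5.96x
print(f"BERT ratio:  {bert_ratio:.2f}x")   # 1.24x
```

The BLEU ratio looks large, but both absolute BLEU values are very small, so the reward gain is the more informative figure.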

Evaluation Curves

Average Reward Curve:

Team Members