Sriyash Poddar1, Yanming Wan1, Hamish Ivison1, Abhishek Gupta1, Natasha Jaques1
1University of Washington
This repo is an implementation of the language experiments of VPL. VPL is a variational framework for learning from human feedback (binary preference labels): it infers a latent representation of a novel user and learns reward models and policies conditioned on this latent, without additional user-specific data. This enables quick adaptation to individual user preferences without retraining the entire model or ignoring underrepresented groups.
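To make the idea concrete, here is a minimal, self-contained sketch of a latent-conditioned reward model in the spirit of VPL. All module names, dimensions, and the KL weight are hypothetical, and the actual architecture in this repo differs; the sketch only illustrates encoding a user's labeled preference pairs into a latent `z` and scoring responses conditioned on `z`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalPreferenceModel(nn.Module):
    """Latent-conditioned reward model sketch (hypothetical names/sizes).

    An encoder maps a user's labeled (chosen, rejected) embedding pairs to a
    latent z; a reward head scores a response embedding conditioned on z.
    """

    def __init__(self, emb_dim: int, latent_dim: int = 32, hidden: int = 256):
        super().__init__()
        # Encoder over concatenated (chosen, rejected) pairs from one user.
        self.encoder = nn.Sequential(nn.Linear(2 * emb_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        # Reward head conditioned on the user latent.
        self.reward = nn.Sequential(
            nn.Linear(emb_dim + latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def encode(self, chosen, rejected):
        # chosen, rejected: (num_pairs, emb_dim) annotated context for one user.
        h = self.encoder(torch.cat([chosen, rejected], dim=-1)).mean(dim=0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

    def forward(self, chosen, rejected, kl_weight=1e-4):
        z, mu, logvar = self.encode(chosen, rejected)
        z = z.expand(chosen.shape[0], -1)  # broadcast latent over all pairs
        r_chosen = self.reward(torch.cat([chosen, z], dim=-1))
        r_rejected = self.reward(torch.cat([rejected, z], dim=-1))
        # Bradley-Terry preference loss plus a KL regularizer on z.
        pref_loss = -F.logsigmoid(r_chosen - r_rejected).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
        return pref_loss + kl_weight * kl


# Smoke test with random GPT-2-sized embeddings (768-dim).
model = VariationalPreferenceModel(emb_dim=768)
loss = model(torch.randn(8, 768), torch.randn(8, 768))
loss.backward()
```

At evaluation time, a few annotated pairs from a new user suffice to infer `z`, after which the reward head personalizes its scores without any gradient updates.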
For the control experiments of VPL, please refer here.
```bash
git clone git@github.com:WEIRDLabUW/vpl_llm.git
cd vpl_llm
conda create -n vpl python=3.10
conda activate vpl
pip install -r requirements.txt
```
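As a quick sanity check that the environment resolved (assuming `torch` and `transformers` are among the pinned requirements, which the GPT-2 embedding scripts below suggest):

```bash
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"
```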
Our datasets and checkpoints can be downloaded from this Google Drive link.
The datasets needed for the VPL experiments should be downloaded and unzipped into `./data/`. There are three datasets in the folder: `simple_pets`, `P_survey_100`, and `P_4_survey_100`.
The checkpoints for the VPL experiments should be downloaded to `./logs`. We provide checkpoints for VPL and the baseline models on each dataset.
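After both downloads, the repo should look roughly like this (a sketch; the exact contents of each folder depend on the Drive archive):

```
vpl_llm/
├── data/
│   ├── simple_pets/
│   ├── P_survey_100/
│   └── P_4_survey_100/
└── logs/
    └── ...        # checkpoints for VPL and the baseline models
```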
We also provide the code for generating our datasets. The following scripts produce the same datasets in `./data/`.
To generate `./data/simple_pets`, run

```bash
bash generate_llm_embeddings_pets.sh gpt2
```
To generate `./data/P_survey_100`, run

```bash
python -m hidden_context.data_utils.ultrafeedback_augment -a 84 -n P
bash generate_llm_embeddings_UF_P_2.sh gpt2 84
```
To generate `./data/P_4_survey_100`, run

```bash
python -m hidden_context.data_utils.ultrafeedback_augment -a single -n P_4 -c
bash generate_llm_embeddings_UF_P_4.sh gpt2 single
```
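Putting the three steps together, the following runs every generation command above in sequence (no new flags; `gpt2` presumably selects the embedding backbone used by the scripts):

```bash
# Regenerate all three datasets under ./data/
bash generate_llm_embeddings_pets.sh gpt2

python -m hidden_context.data_utils.ultrafeedback_augment -a 84 -n P
bash generate_llm_embeddings_UF_P_2.sh gpt2 84

python -m hidden_context.data_utils.ultrafeedback_augment -a single -n P_4 -c
bash generate_llm_embeddings_UF_P_4.sh gpt2 single
```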
In all the following scripts, `<MODEL_TYPE>` can be chosen from `vae`, `base`, `categorical`, and `mean_and_variance`. `vae` corresponds to our VPL models, while the others train the baseline models.
Results are logged to Wandb; refer to the `eval/accuracy` metric on the Wandb page for each model's performance.
To train models on `./data/simple_pets`, run

```bash
bash submit_job_pets.sh <MODEL_TYPE>
```
Note that the default settings are for Pets (full); change the arguments as explained in the bash script if you want to train on Pets (controversial).
To train models on `./data/P_survey_100`, run

```bash
bash submit_job_UF_P_2.sh <MODEL_TYPE>
```
To train models on `./data/P_4_survey_100`, run

```bash
bash submit_job_UF_P_4.sh <MODEL_TYPE>
```
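To compare VPL against every baseline on one dataset, you can sweep the model types (a sketch using only the scripts above, with `P_survey_100` as the example dataset):

```bash
# Train VPL (vae) and all baselines on ./data/P_survey_100
for MODEL_TYPE in vae base categorical mean_and_variance; do
    bash submit_job_UF_P_2.sh "$MODEL_TYPE"
done
```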
If you find this code useful, please cite:
```bibtex
@article{poddar2024vpl,
  author  = {Poddar, Sriyash and Wan, Yanming and Ivison, Hamish and Gupta, Abhishek and Jaques, Natasha},
  title   = {Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning},
  journal = {arXiv preprint},
  year    = {2024},
}
```