Sriyash Poddar1, Yanming Wan1, Hamish Ivison1, Abhishek Gupta1, Natasha Jaques1
1University of Washington
This repo is an implementation of the control experiments of VPL. VPL is a variational framework for learning from human feedback (binary preference labels): it infers a user-specific latent for a novel user and learns reward models and policies conditioned on this latent, without requiring additional user-specific data. This enables quick adaptation to specific user preferences without retraining the entire model or ignoring underrepresented groups.
git clone git@github.com:WEIRDLabUW/vpl.git
conda create -n vpl python=3.10
conda activate vpl
pip install -r requirements.txt
pip install -e dependencies/d4rl --no-deps
pip install -e dependencies/ravens
pip install -e .
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Install "mujoco-py" from here: https://github.com/vwxyzjn/free-mujoco-py
To create a preference dataset for VPL, run:
python -m pref_learn.create_dataset --num_query=<num_comparison_pairs> --env=<env_name> --query_len=<query_length> --set_len=<annotation_batch_size>
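For example, a hypothetical invocation (the environment name and dataset sizes below are illustrative only, not values taken from the repo; substitute values supported by your setup):
python -m pref_learn.create_dataset --num_query=2000 --env=maze2d-target-v0 --query_len=100 --set_len=8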
To train VPL on a preference dataset, run the following command:
python pref_learn/train.py \
--comment=<project_name> \
--env=<env_name> \
--dataset_path=<dataset_path> \
--model_type=<reward_model_class> \
--logging.output_dir="logs" \
--seed "seed" \
To train VPL policies with IQL, run:
python experiments/run_iql.py \
--env_name=<env_name> \
--seed=<seed> \
--model_type=<reward_model_type> \
--ckpt=<reward_model_checkpoint_dir> \
--use_reward_model=True
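A hypothetical example, assuming the reward model above was trained with a VAE-style model type and its checkpoint directory sits under logs/ (the environment name and paths are placeholders):
python experiments/run_iql.py \
--env_name=maze2d-target-v0 \
--seed=0 \
--model_type=VAE \
--ckpt=logs/<reward_model_run_dir> \
--use_reward_model=True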
To evaluate the VPL policy using actively queried samples:
python experiments/eval.py \
--env_name=<env_name> \
--samples=<eval_episodes> \
--sampling_method="information_gain" \
--preference_dataset_path=<path to preference dataset> \
--reward_model_path=<reward_model_checkpoint_dir> \
--policy_path=<policy_checkpoint_dir>
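An illustrative evaluation call, reusing the hypothetical names from the sketches above (every value is a placeholder):
python experiments/eval.py \
--env_name=maze2d-target-v0 \
--samples=10 \
--sampling_method="information_gain" \
--preference_dataset_path=data/maze2d_preferences.pkl \
--reward_model_path=logs/<reward_model_run_dir> \
--policy_path=logs/<policy_run_dir>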
bash run.sh VAE max
This repository uses the JAX IQL implementation from https://github.com/dibyaghosh/jaxrl_m.
If you find this code useful, please cite:
@article{poddar2024vpl,
author = {Poddar, Sriyash and Wan, Yanming and Ivison, Hamish and Gupta, Abhishek and Jaques, Natasha},
title = {Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning},
journal = {arXiv preprint},
year = {2024},
}