jonathanlu31/gpt4v-pref

Reward Modeling from GPT4-Vision Preferences

About

This project aims to replicate OpenAI and DeepMind's Deep Reinforcement Learning from Human Preferences, using preferences elicited from GPT4-V instead of from humans. The code and architecture are based on Matthew Rahtz's implementation of the original paper, simplified for our purposes and translated into PyTorch.

The writeup is available here.
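
As a rough illustration of the idea, the original paper trains a reward model so that the summed predicted rewards of two trajectory segments explain which segment the labeler preferred (a Bradley-Terry style model with a cross-entropy loss). The sketch below shows that loss in plain Python. It is not taken from this repository: the function names and the label convention (1.0 if segment A is preferred, 0.0 if segment B, 0.5 if they are judged equal) are assumptions made for the example, and the label would come from GPT4-V here rather than from a human.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Modeled probability that segment A is preferred over segment B.

    Each argument is a list of per-step rewards predicted by the reward
    model for one segment (hypothetical inputs for this sketch).
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Logistic sigmoid of the return difference, written to avoid overflow.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the modeled preference and the given label.

    label: 1.0 if A was preferred, 0.0 if B was preferred, 0.5 if equal
    (an assumed convention for this example).
    """
    p_a = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps)
             + (1.0 - label) * math.log(1.0 - p_a + eps))


if __name__ == "__main__":
    seg_a = [0.5, 1.0, 0.2]   # predicted rewards for segment A
    seg_b = [0.1, 0.0, 0.3]   # predicted rewards for segment B
    print(preference_probability(seg_a, seg_b))
    print(preference_loss(seg_a, seg_b, label=1.0))
```

In a PyTorch implementation the same quantity would typically be computed on tensors so that gradients flow back into the reward model; the plain-Python version is only meant to make the objective concrete.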

Roadmap

  • Test more tasks and environments

Contact

Jonathan Lu - jonathan.lu31@gmail.com
