GitHub - jonathanlu31/gpt4v-pref

Reward Modeling from GPT4-Vision Preferences

About

This project aims to replicate OpenAI and DeepMind's Deep Reinforcement Learning from Human Preferences, using preferences elicited from GPT4-V instead of humans. The code and architecture are based on Matthew Rahtz's implementation of the original paper, simplified for our purposes and translated into PyTorch.

The writeup is available in writeup.pdf.
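To make the reward-modeling idea concrete, here is a minimal, framework-free sketch of the preference loss described in the original paper: the probability that one trajectory segment is preferred is a softmax over the summed predicted rewards of the two segments, and the reward model is trained with cross-entropy against the label. This is not the repository's code. The function names and the tuple label encoding are assumptions made for illustration, and in this project the label would come from GPT4-V's answer rather than a human.

```python
import math
from typing import Sequence, Tuple


def preference_probability(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """Probability that segment A is preferred over segment B.

    Softmax over the summed predicted rewards of the two segments,
    which reduces to a sigmoid of their difference.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)
    return exp_diff / (1.0 + exp_diff)


def preference_loss(
    rewards_a: Sequence[float],
    rewards_b: Sequence[float],
    label: Tuple[float, float],
) -> float:
    """Cross-entropy between the predicted preference and the label.

    label is (mu_a, mu_b), an assumed encoding: (1, 0) if A was preferred,
    (0, 1) if B was preferred, (0.5, 0.5) if they were judged equal.
    """
    p_a = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0) when the prediction saturates
    return -(label[0] * math.log(p_a + eps) + label[1] * math.log(1.0 - p_a + eps))
```

For example, `preference_loss([0.2, 0.9], [0.1, 0.1], (1.0, 0.0))` is smaller than the same call with the label `(0.0, 1.0)`, because the predicted rewards already favor the first segment. In a PyTorch version the per-step rewards would be outputs of the reward network, so this loss could be backpropagated through it.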

Roadmap

Test more tasks and environments

Contact

Jonathan Lu - jonathan.lu31@gmail.com


Folders and files (37 commits)

src
.gitignore
.pre-commit-config.yaml
README.md
cartpole-test-line.png
cartpole-test.png
splits.png
writeup.pdf
