initial soft prompt example for stage 3 trlx #14

Merged
herbiebradley merged 21 commits into CarperAI:0.2.0-release on Dec 16, 2022

Conversation


@daia99 daia99 commented Oct 17, 2022

Minimal example implementation of PPO with soft prompt embedding(s).

This is a step towards an eventual implementation of the Stage 3 ELM experiment with discrete code embeddings, where a (soft prompt) embedding is tuned for sodaracer generation with conditional RL for a given terrain.

This PR introduces a new model (inheriting from AcceleratePPOModel) with a soft prompt embedding (based on an example implementation). The config is adapted from the one used for ppo_config.
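For readers unfamiliar with soft prompts, here is a minimal sketch of the general idea in plain PyTorch. It is not the code from this PR and does not use the trlx API: the names `SoftPromptEmbedding` and `extend_attention_mask`, the vocabulary-based initialisation, and the toy sizes are all assumptions made for illustration. A few trainable vectors are prepended to the frozen token embeddings, and the attention mask is extended to cover them.

```python
import torch
import torch.nn as nn


class SoftPromptEmbedding(nn.Module):
    """Hypothetical wrapper: prepends n_tokens trainable vectors to token embeddings."""

    def __init__(self, wte: nn.Embedding, n_tokens: int = 4):
        super().__init__()
        self.wte = wte
        self.n_tokens = n_tokens
        # Assumption: initialise the soft prompt from the first n_tokens vocab embeddings.
        self.soft_prompt = nn.Parameter(wte.weight[:n_tokens].detach().clone())

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        token_embeds = self.wte(input_ids)  # (batch, seq, dim)
        prompt = self.soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)  # (batch, n_tokens + seq, dim)


def extend_attention_mask(attention_mask: torch.Tensor, n_tokens: int) -> torch.Tensor:
    """Prepend ones so the soft prompt positions are always attended to."""
    prefix = torch.ones(
        attention_mask.size(0), n_tokens,
        dtype=attention_mask.dtype, device=attention_mask.device,
    )
    return torch.cat([prefix, attention_mask], dim=1)


torch.manual_seed(0)
wte = nn.Embedding(50, 8)         # toy stand-in for a language model's input embedding
wte.weight.requires_grad_(False)  # freeze the base weights; only the soft prompt trains
layer = SoftPromptEmbedding(wte, n_tokens=4)

input_ids = torch.randint(0, 50, (2, 5))
embeds = layer(input_ids)
mask = extend_attention_mask(torch.ones(2, 5, dtype=torch.long), layer.n_tokens)

# Gradients should reach the soft prompt but not the frozen embedding table.
embeds.sum().backward()
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
```

In a full setup the concatenated embeddings would be passed to the language model as input embeddings rather than token ids, with the rest of the model frozen in the same way as `wte` here.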

To run the example in ppo_softprompt_sentiment.py, first install the requirements listed at: https://github.com/CarperAI/trlx#installation.

Some prior discussion took place in the closed PR CarperAI/trlx#32.

@daia99 daia99 changed the title initial soft prompt example for stage 3 trlx [draft] initial soft prompt example for stage 3 trlx Oct 28, 2022
@daia99 daia99 marked this pull request as draft October 28, 2022 21:43
@herbiebradley herbiebradley changed the base branch from main to 0.2.0-release December 7, 2022 01:45
@daia99 daia99 changed the title [draft] initial soft prompt example for stage 3 trlx initial soft prompt example for stage 3 trlx Dec 14, 2022
@daia99 daia99 marked this pull request as ready for review December 14, 2022 22:44

@herbiebradley herbiebradley left a comment

Looks good, thanks! We should probably add trlx as an optional dependency in pyproject.toml; I will make a note of this and add it to the 0.2.0 branch later.

@herbiebradley herbiebradley merged commit a56e36b into CarperAI:0.2.0-release Dec 16, 2022
@herbiebradley herbiebradley mentioned this pull request Dec 23, 2022
5 tasks
herbiebradley added a commit that referenced this pull request Dec 23, 2022
* initial soft prompt example for stage 3 trlx (#14)

* initial softprompt example for stage 3 trlx

* clean up

* basic runnable changes after trlx pr merge

* softprompt prefix padding handling, whole model freezing, cleanup

* new toy tasks, restored config register, plot softprompt drift

* fix import

* minor fixes, +orchestrator to handle softprompt padding

* update for new trlx version

* fix(sandbox): build typo and updating lock file (#26)

* Fix dependency for box2d; Moved graphviz stuff to optional. (#27)

* update and cleanup for latest trlx, clarity

* bugfix: override get_model_inputs with softprompt support

* additional comments, sanity checks

* update configs and example scripts

* init readme for trlx softprompt tuning setup

* formatting

* fix filename typo

Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
Co-authored-by: Honglu Fan <64070721+honglu2875@users.noreply.github.com>

* Diff processing and evaluations (#29)

* fix(sandbox): build typo and updating lock file (#26)

* Fix dependency for box2d; Moved graphviz stuff to optional. (#27)

* Added a few diff util functions and tests; Added pytest to CI; Added box2d-py (for pytest to pass) and requests to requirements.txt.

* fix pytest.

* ugh, dependency...

* fix typo... (ugh probably drank too much)

* fix dependency.

* added torch to requirements.txt

* Expose `Genotype` and `BaseEnvironment` for others to inherit.

* Use `checkpoints_dir` in the config instead of hardcoding "checkpoints".

* Fix small bug in ImageOptim mutate.

* Modified benchmark config; Added diff benchmark script; Completed verify_diff and added tests.

* Fixed minor issues on device and invalid format.

* Force `.use_cache` to be True.

* Rename `elm.py` into `elm_main.py` to avoid conflicts in import.

* Minor changes according to review; Added `DiffState` to represent the validity of diff data sample.

* Fix CI.

* Box2d CI bug again! Trying a hot-fix...

* Another try on swig version.

* Another try on swig version.

* Seriously?? What about revert back.

* Ok, trying again with different install order...

Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>

* Fix requirements

* Add sodarace tests

* fix ADDFILE line count verification.

* Add docs with Sphinx

* Fix readthedocs syntax

* Fix rtd build

* Fix rtd build (for real this time)

* Rename elm to openelm

* Rename util

* Linting

* Revert "Linting"

This reverts commit 8db8623.

* Linting utils

* Add docstrings to some files

* Add Sphinx autodoc specification

* Rename benchmark_diff

* Improve Sphinx docstrings

Co-authored-by: Andrew <33094749+daia99@users.noreply.github.com>
Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
Co-authored-by: Honglu Fan <64070721+honglu2875@users.noreply.github.com>
Co-authored-by: Honglu Fan <honglu2875@gmail.com>
4 participants