initial soft prompt example for stage 3 trlx #14
Merged
daia99
changed the title
initial soft prompt example for stage 3 trlx
[draft] initial soft prompt example for stage 3 trlx
Oct 28, 2022
daia99
changed the title
[draft] initial soft prompt example for stage 3 trlx
initial soft prompt example for stage 3 trlx
Dec 14, 2022
herbiebradley
approved these changes
Dec 16, 2022
Looks good, thanks! I guess potentially we should add trlx as an optional dependency in the pyproject.toml but I will make a note of this and add it to the 0.2.0 branch later.
herbiebradley
added a commit
that referenced
this pull request
Dec 23, 2022
* initial soft prompt example for stage 3 trlx (#14)
* initial softprompt example for stage 3 trlx
* clean up
* basic runnable changes after trlx pr merge
* softprompt prefix padding handling, whole model freezing, cleanup
* new toy tasks, restored config register, plot softprompt drift
* fix import
* minor fixes, +orchestrator to handle softprompt padding
* update for new trlx version
* fix(sandbox): build typo and updating lock file (#26)
* Fix dependency for box2d; Moved graphviz stuff to optional. (#27)
* update and cleanup for latest trlx, clarity
* bugfix: override get_model_inputs with softprompt support
* additional comments, sanity checks
* update configs and example scripts
* init readme for trlx softprompt tuning setup
* formatting
* fix filename typo
Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
Co-authored-by: Honglu Fan <64070721+honglu2875@users.noreply.github.com>
* Diff processing and evaluations (#29)
* fix(sandbox): build typo and updating lock file (#26)
* Fix dependency for box2d; Moved graphviz stuff to optional. (#27)
* Added a few diff util functions and tests; Added pytest to CI; Added box2d-py (for pytest to pass) and requests to requirements.txt.
* fix pytest.
* ugh, dependency...
* fix typo... (ugh probably drank too much)
* fix dependency.
* added torch to requirements.txt
* Expose `Genotype` and `BaseEnvironment` for others to inherit.
* Use `checkpoints_dir` in the config instead of hardcoding "checkpoints".
* Fix small bug in ImageOptim mutate.
* Modified benchmark config; Added diff benchmark script; Completed verify_diff and added tests.
* Fixed minor issues on device and invalid format.
* Force `.use_cache` to be True.
* Rename `elm.py` into `elm_main.py` to avoid conflicts in import.
* Minor changes according to review; Added `DiffState` to represent the validity of diff data sample.
* Fix CI.
* Box2d CI bug again! Trying a hot-fix...
* Another try on swig version.
* Seriously?? What about revert back.
* Ok, trying again with different install order...
Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
* Fix requirements
* Add sodarace tests
* fix ADDFILE line count verification.
* Add docs with Sphinx
* Fix readthedocs syntax
* Fix rtd build
* Fix rtd build (for real this time)
* Rename elm to openelm
* Rename util
* Linting
* Revert "Linting" This reverts commit 8db8623.
* Linting utils
* Add docstrings to some files
* Add Sphinx autodoc specification
* Rename benchmark_diff
* Improve Sphinx docstrings
Co-authored-by: Andrew <33094749+daia99@users.noreply.github.com>
Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
Co-authored-by: Honglu Fan <64070721+honglu2875@users.noreply.github.com>
Co-authored-by: Honglu Fan <honglu2875@gmail.com>
Minimal example implementation of PPO with soft prompt embedding(s).
This goes towards an eventual implementation of the Stage 3 ELM experiment with discrete code embeddings, where a (soft prompt) embedding is tuned for sodaracer generation with conditional RL for a given terrain.
This PR introduces a new model (inheriting from AcceleratePPOModel) with a soft prompt embedding (from example implementation). The config is adapted from the one used for ppo_config.
The example can be run with ppo_softprompt_sentiment.py after installing the requirements listed at https://github.com/CarperAI/trlx#installation.
Some prior discussion occurred in the closed PR CarperAI/trlx#32.
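For readers unfamiliar with soft prompts, here is a rough, library-free sketch of the input-side bookkeeping this PR's description and commit notes mention (prepending soft prompt embeddings and handling the matching prefix padding). It is not the code from this PR and does not use the trlx API. The helper names `init_soft_prompt` and `prepend_soft_prompt`, the initialisation scale, and the toy shapes are all assumptions made for illustration.

```python
import random


def init_soft_prompt(n_tokens, dim, seed=0):
    """Hypothetical helper: create n_tokens small random vectors of size dim.

    In soft prompt tuning these vectors are the only trainable parameters;
    the language model itself stays frozen.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_tokens)]


def prepend_soft_prompt(soft_prompt, token_embeds, attention_mask):
    """Hypothetical helper: put the soft prompt in front of every sequence.

    token_embeds:   batch of sequences, each a list of embedding vectors
    attention_mask: batch of 0/1 lists, one entry per token position
    The mask is extended with ones so the model attends to the soft prompt.
    """
    n = len(soft_prompt)
    new_embeds = [
        [list(vec) for vec in soft_prompt] + [list(vec) for vec in seq]
        for seq in token_embeds
    ]
    new_mask = [[1] * n + list(mask) for mask in attention_mask]
    return new_embeds, new_mask


# Toy batch: 2 sequences, 3 token positions each, embedding size 4.
soft_prompt = init_soft_prompt(n_tokens=2, dim=4)
token_embeds = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]
attention_mask = [[1, 1, 1], [1, 1, 0]]  # second sequence ends in padding

embeds, mask = prepend_soft_prompt(soft_prompt, token_embeds, attention_mask)
print(len(embeds[0]), mask)
```

In a real setup the same idea would presumably be applied to tensors inside the model's forward pass, with gradients flowing only into the soft prompt vectors.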