initial soft prompt example for stage 3 trlx #14

daia99 · 2022-10-17T19:32:16Z

Minimal example implementation of PPO with soft prompt embedding(s).

This goes towards an eventual implementation of the Stage 3 ELM experiment with discrete code embeddings, where a (soft prompt) embedding is tuned for sodaracer generation with conditional RL for a given terrain.

This PR introduces a new model, inheriting from AcceleratePPOModel, that adds a soft prompt embedding (based on an example implementation). The config is adapted from the one used for ppo_config.
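For readers unfamiliar with soft prompts, here is a rough, framework-free sketch of the idea. It is not the code in this PR: the function names, the toy embedding table, and the use of plain Python lists instead of tensors are all assumptions made for illustration. A small set of trainable vectors is prepended to the frozen token embeddings, and the attention mask is extended so the model attends to them.

```python
import random


def make_soft_prompt(n_tokens, dim, seed=0):
    """Create n_tokens vectors of size dim; in soft prompt tuning these
    would be the only trainable parameters (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_tokens)]


def embed_with_soft_prompt(token_ids, attention_mask, embedding_table, soft_prompt):
    """Prepend the soft prompt vectors to the (frozen) token embeddings.

    Returns (inputs_embeds, extended_mask). The mask gains one 1 per soft
    prompt position so those positions are always attended to, while the
    original padding entries are kept unchanged.
    """
    token_embeds = [embedding_table[t] for t in token_ids]
    inputs_embeds = [list(v) for v in soft_prompt] + token_embeds
    extended_mask = [1] * len(soft_prompt) + list(attention_mask)
    return inputs_embeds, extended_mask


# Toy frozen embedding table: vocabulary of 4 tokens, embedding size 3.
embedding_table = [[float(i)] * 3 for i in range(4)]
soft_prompt = make_soft_prompt(n_tokens=2, dim=3)

# One right-padded sequence: the last token is padding (mask 0).
inputs_embeds, mask = embed_with_soft_prompt(
    token_ids=[1, 3, 0],
    attention_mask=[1, 1, 0],
    embedding_table=embedding_table,
    soft_prompt=soft_prompt,
)
```

In a real setup the base model's weights would stay frozen and only the soft prompt vectors would receive gradient updates from the PPO loss.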

The example can be run with ppo_softprompt_sentiment.py after installing the requirements listed at: https://github.com/CarperAI/trlx#installation

Some prior discussion took place in a closed PR, CarperAI/trlx#32.


herbiebradley

Looks good, thanks! I guess potentially we should add trlx as an optional dependency in the pyproject.toml but I will make a note of this and add it to the 0.2.0 branch later.

* initial soft prompt example for stage 3 trlx (#14)
* initial softprompt example for stage 3 trlx
* clean up
* basic runnable changes after trlx pr merge
* softprompt prefix padding handling, whole model freezing, cleanup
* new toy tasks, restored config register, plot softprompt drift
* fix import
* minor fixes, +orchestrator to handle softprompt padding
* update for new trlx version
* fix(sandbox): build typo and updating lock file (#26)
* Fix dependency for box2d; Moved graphviz stuff to optional. (#27)
* update and cleanup for latest trlx, clarity
* bugfix: override get_model_inputs with softprompt support
* additional comments, sanity checks
* update configs and example scripts
* init readme for trlx softprompt tuning setup
* formatting
* fix filename typo

Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
Co-authored-by: Honglu Fan <64070721+honglu2875@users.noreply.github.com>

* Diff processing and evaluations (#29)
* fix(sandbox): build typo and updating lock file (#26)
* Fix dependency for box2d; Moved graphviz stuff to optional. (#27)
* Added a few diff util functions and tests; Added pytest to CI; Added box2d-py (for pytest to pass) and requests to requirements.txt.
* fix pytest.
* ugh, dependency...
* fix typo... (ugh probably drank too much)
* fix dependency.
* added torch to requirements.txt
* Expose `Genotype` and `BaseEnvironment` for others to inherit.
* Use `checkpoints_dir` in the config instead of hardcoding "checkpoints".
* Fix small bug in ImageOptim mutate.
* Modified benchmark config; Added diff benchmark script; Completed verify_diff and added tests.
* Fixed minor issues on device and invalid format.
* Force `.use_cache` to be True.
* Rename `elm.py` into `elm_main.py` to avoid conflicts in import.
* Minor changes according to review; Added `DiffState` to represent the validity of diff data sample.
* Fix CI.
* Box2d CI bug again! Trying a hot-fix...
* Another try on swig version.
* Another try on swig version.
* Seriously?? What about revert back.
* Ok, trying again with different install order...

Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>

* Fix requirements
* Add sodarace tests
* fix ADDFILE line count verification.
* Add docs with Sphinx
* Fix readthedocs syntax
* Fix rtd build
* Fix rtd build (for real this time)
* Rename elm to openelm
* Rename util
* Linting
* Revert "Linting" This reverts commit 8db8623.
* Linting utils
* Add docstrings to some files
* Add Sphinx autodoc specification
* Rename benchmark_diff
* Improve Sphinx docstrings

Co-authored-by: Andrew <33094749+daia99@users.noreply.github.com>
Co-authored-by: Francisco Carvalho <7385326+TheExGenesis@users.noreply.github.com>
Co-authored-by: Honglu Fan <64070721+honglu2875@users.noreply.github.com>
Co-authored-by: Honglu Fan <honglu2875@gmail.com>

daia99 and others added 5 commits October 17, 2022 21:13

initial softprompt example for stage 3 trlx

2ddff3c

clean up

e2b6eed

basic runnable changes after trlx pr merge

1fa6be6

Merge branch 'CarperAI:main' into main

83e54e1

softprompt prefix padding handling, whole model freezing, cleanup

c47362e

daia99 changed the title ~~initial soft prompt example for stage 3 trlx~~ [draft] initial soft prompt example for stage 3 trlx Oct 28, 2022

daia99 marked this pull request as draft October 28, 2022 21:43

daia99 and others added 8 commits November 18, 2022 19:53

new toy tasks, restored config register, plot softprompt drift

51d8762

Merge branch 'CarperAI:main' into main

953d3ca

fix import

326199e

minor fixes, +orchestrator to handle softprompt padding

4978c15

update for new trlx version

ac94def

fix(sandbox): build typo and updating lock file (CarperAI#26)

a8771fa

Fix dependency for box2d; Moved graphviz stuff to optional. (CarperAI#27)

7458eb2

update and cleanup for latest trlx, clarity

3a2b210

herbiebradley changed the base branch from main to 0.2.0-release December 7, 2022 01:45

daia99 and others added 7 commits December 7, 2022 20:59

bugfix: override get_model_inputs with softprompt support

894e1d9

Merge branch 'CarperAI:main' into main

4f14cdd

additional comments, sanity checks

2721a82

update configs and example scripts

2a089d5

init readme for trlx softprompt tuning setup

0c37410

Merge branch 'main' of github.com:daia99/ELM into main

acf737b

formatting

f7c51d4

daia99 changed the title ~~[draft] initial soft prompt example for stage 3 trlx~~ initial soft prompt example for stage 3 trlx Dec 14, 2022

daia99 marked this pull request as ready for review December 14, 2022 22:44

fix filename typo

dc8c50a

herbiebradley approved these changes Dec 16, 2022


herbiebradley merged commit a56e36b into CarperAI:0.2.0-release Dec 16, 2022

herbiebradley mentioned this pull request Dec 23, 2022

0.1.5 release #32

Merged

5 tasks
