[RLlib] Cleanup examples folder (vol 30): BC pretraining, then PPO finetuning (new API stack with RLModule checkpoints). #47838

Conversation

@sven1977 (Contributor) commented Sep 27, 2024

Cleanup examples folder (vol 30): BC pretraining, then PPO finetuning (new API stack with RLModule checkpoints).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@simonsays1980 (Collaborator) left a comment:
Awesome example. Maybe add the avg time for reaching 450 points without pretraining, to show the advantage of pretraining.

best_result = results.get_best_result(metric_key)
rl_module_checkpoint = (
    Path(best_result.checkpoint.path)
    / COMPONENT_LEARNER_GROUP
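
For reference, a minimal sketch of how such an RLModule checkpoint path is composed and loaded back on the new API stack. It assumes the standard learner_group/learner/rl_module checkpoint layout, the constants exported from ray.rllib.core, and the from_checkpoint() API of the Checkpointable interface; results and metric_key come from the BC tuning run above. This is not verbatim code from the example:

from pathlib import Path

from ray.rllib.core import (
    COMPONENT_LEARNER,
    COMPONENT_LEARNER_GROUP,
    COMPONENT_RL_MODULE,
    DEFAULT_MODULE_ID,
)
from ray.rllib.core.rl_module.rl_module import RLModule

# Pick the best BC trial and point at the sub-directory of its checkpoint
# that holds only the RLModule weights:
# <checkpoint>/learner_group/learner/rl_module/<module_id>
best_result = results.get_best_result(metric_key)
rl_module_checkpoint = (
    Path(best_result.checkpoint.path)
    / COMPONENT_LEARNER_GROUP
    / COMPONENT_LEARNER
    / COMPONENT_RL_MODULE
    / DEFAULT_MODULE_ID
)

# Restore the pretrained module from that sub-checkpoint; its state is what
# the PPO finetuning phase starts from.
pretrained_module = RLModule.from_checkpoint(rl_module_checkpoint)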
@simonsays1980 (Collaborator):
Nice!!!

| total time (s) | episode_return_mean | num_env_steps_trained_lifetime |
|----------------|---------------------|--------------------------------|
| 11.4828        | 250.5               | 42394                          |
@simonsays1980 (Collaborator):
Awesome, 11 seconds for 250 points. 51 iterations in 11 seconds is pretty fast.

| total time (s) | episode_return_mean | num_episodes_lifetime |
|----------------|---------------------|-----------------------|
| 32.7647        | 450.76              | 406                   |
@simonsays1980 (Collaborator):
Do we by chance have a number for how long it takes to train PPO from zero to 450?

@sven1977 (Contributor, Author):
It's probably not much slower, if at all. But I think that's not the main point here (we know PPO learns CartPole super fast). The main goals here are to show that:
a) you can use simple custom Models without having to subclass algo-specific RLModule classes (this is huge, and thanks to the new RLModule API concept); see the sketch below, and
b) performance doesn't tank (no catastrophic forgetting) after the transfer from BC to PPO. :)
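
To make point (a) a bit more concrete, a rough, hypothetical wiring sketch (not taken from the merged example): one plain custom module class, subclassing only the generic TorchRLModule and no BC- or PPO-specific base class, gets plugged into both phases through an RLModuleSpec. TinyCartPoleModule is a made-up placeholder for the example's custom model, and the spec class is named as in current RLlib (older versions call it SingleAgentRLModuleSpec):

from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# Hypothetical, algorithm-agnostic module class (definition not shown); it
# subclasses only the generic TorchRLModule, nothing BC- or PPO-specific.
spec = RLModuleSpec(module_class=TinyCartPoleModule)

# Phase 1: BC pretraining (offline data and training options omitted).
bc_config = (
    BCConfig()
    .environment("CartPole-v1")
    .rl_module(rl_module_spec=spec)
)

# Phase 2: PPO finetuning; the same, non algo-specific module class is reused,
# and its pretrained weights come from the RLModule checkpoint shown above.
ppo_config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rl_module(rl_module_spec=spec)
)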

@@ -857,6 +857,7 @@ def setup(self, config: AlgorithmConfig) -> None:
             env_steps_sampled=self.metrics.peek(
                 NUM_ENV_STEPS_SAMPLED_LIFETIME, default=0
             ),
+            rl_module_state=rl_module_state,
@sven1977 (Contributor, Author):
This was a bug!

@@ -1362,7 +1362,11 @@ def run_rllib_example_script_experiment(
        args.as_test = True

    # Initialize Ray.
-   ray.init(num_cpus=args.num_cpus or None, local_mode=args.local_mode)
+   ray.init(
@sven1977 (Contributor, Author):
Added a reinit-error ignore, in case one calls this utility function twice in an example script.
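
A minimal sketch of what the full call likely looks like after this change, assuming it simply passes ray.init()'s ignore_reinit_error flag (args comes from the surrounding run_rllib_example_script_experiment() utility):

import ray

# ignore_reinit_error=True turns a second ray.init() call into a no-op
# instead of raising, so the utility can safely be invoked twice within one
# example script.
ray.init(
    num_cpus=args.num_cpus or None,
    local_mode=args.local_mode,
    ignore_reinit_error=True,
)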

@sven1977 sven1977 enabled auto-merge (squash) September 27, 2024 10:37
@github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) Sep 27, 2024
@sven1977 sven1977 merged commit b676d02 into ray-project:master Sep 28, 2024
6 checks passed
@sven1977 sven1977 deleted the cleanup_examples_folder_30_bc_train_ppo_finetune branch September 28, 2024 17:27
@sven1977 added labels Sep 29, 2024: rllib (RLlib related issues), rllib-offline-rl (Offline RL problems), rllib-docs-or-examples (Issues related to RLlib documentation or rllib/examples), rllib-newstack
Labels: go (add ONLY when ready to merge, run all tests), rllib (RLlib related issues), rllib-docs-or-examples (Issues related to RLlib documentation or rllib/examples), rllib-newstack, rllib-offline-rl (Offline RL problems)
2 participants