[RLlib] Cleanup examples folder #01. #44067

sven1977 · 2024-03-16T19:06:11Z

Cleanup examples folder no. 01:

Move some old-API only scripts into examples/_old_api_stack.
Create new sub directories and move scripts into these as appropriate:
- evaluation
- multi_agent_and_self_play
- gpu_training
All old script names are left as-is, in case old web links still point to them, but the old scripts now error out and tell the user where to find their new versions.
Some of the existing scripts were not just moved but also altered to work with the new API stack. In particular, this includes all multi-agent scripts as well as all evaluation-related scripts.
Adjust docs and BUILD accordingly.
A few (very minor) bug fixes in the ConnectorV2 API are included.
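
As a rough illustration of the "old names error out and point to the new location" behavior described above, a stub left at an old path could look like the following. This is a minimal sketch, not the code from this PR: the helper names, the error type, the message text, and both paths are hypothetical.

```python
def moved_example_message(old_path: str, new_path: str) -> str:
    """Build the message a stub at `old_path` could show before failing."""
    return (
        f"The example script `{old_path}` has moved. "
        f"Its new version is at `{new_path}`."
    )


class ExampleMovedError(RuntimeError):
    """Hypothetical error type for a relocated example script."""


def fail_with_new_location(old_path: str, new_path: str) -> None:
    """Always raise, pointing the caller at the script's new location."""
    raise ExampleMovedError(moved_example_message(old_path, new_path))


# Placeholder paths, chosen only for illustration.
try:
    fail_with_new_location(
        "examples/old_script.py", "examples/some_subdir/old_script.py"
    )
except ExampleMovedError as err:
    message = str(err)
    print(message)
```

Keeping the old file in place means an outdated link still resolves, and the reader learns immediately where the maintained version lives.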

TODO (in follow-up PRs):

Continue emptying old scripts, translating them to the new API stack, and moving them to better, more comprehensive sub-directories.
Consistently move example classes (as opposed to example scripts) into special sub-directories within examples, e.g. examples/rl_module/classes/ or examples/env/classes, and leave the space in the direct sub-directories for entire scripts only. That is, examples/env should contain all the example scripts demo'ing env/env-runner/custom env stuff, and only the sub-dir classes should contain the example envs themselves; same with models, rl_module, learner, etc.

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980

LGTM. Some more info would be helpful here and there to give a user/developer the big picture and the why. Awesome example updates!

simonsays1980 · 2024-03-19T11:17:01Z

.buildkite/rllib.rayci.yml

-        --test-env RLLIB_ENABLE_RL_MODULE=1
-        --test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
-    depends_on: rllibbuild
-
  - label: ":brain: rllib: data tests"


What's actually the meaning of brain here? Learning tests?

:) It defines the little icon shown in buildkite

simonsays1980 · 2024-03-19T11:35:31Z

rllib/algorithms/algorithm_config.py

+                        module_specs=(
+                            self.rl_module_spec.module_specs
+                            if isinstance(self.rl_module_spec, MultiAgentRLModuleSpec)
+                            else set(self.policies)


I hope we can soon get rid of the policy/ies naming. This is still confusing in the module setups.

Great point. We need to unify this soon and fully adapt to the new stack terminology. Some ideas:

- Have the user explicitly enable multi-agent (otherwise, RLlib errors if multi-agent components are used).
- config.multi_agent(policies) should no longer be necessary (already more or less replaced by config.rl_module).
- policy_mapping_fn -> agent_to_module_mapping_fn
- etc.

simonsays1980 · 2024-03-19T11:50:58Z

rllib/connectors/common/add_states_from_episodes_to_batch.py

-                ),
-                data,
-            )
+            for column, column_data in data.copy().items():


Can we give each connector a docstring that states its input and output shapes?

We could also create something like what the RLModules have, get_input_specs and get_output_specs, and then check whether the pieces in the pipeline (including user-defined ones) "fit" together.

Also, let's add a comment about the bigger picture here: what do the recurrent modules expect, and what do we feed them?
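
To make the spec-check idea in this comment concrete, here is a minimal sketch of how a pipeline could verify that each piece finds the columns it needs. Everything in it is hypothetical: `PieceSpec`, `check_pipeline`, and the column names are illustrative stand-ins, not RLlib classes or the actual ConnectorV2 API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PieceSpec:
    """Hypothetical description of one connector piece (not an RLlib class)."""

    name: str
    input_columns: FrozenSet[str]
    output_columns: FrozenSet[str]


def check_pipeline(
    pieces: List[PieceSpec], initial_columns: FrozenSet[str]
) -> List[str]:
    """Return one message per piece whose required input columns are missing."""
    available = set(initial_columns)
    problems = []
    for piece in pieces:
        missing = piece.input_columns - available
        if missing:
            problems.append(f"{piece.name} is missing columns: {sorted(missing)}")
        # Whatever this piece writes becomes available to later pieces.
        available |= piece.output_columns
    return problems


pipeline = [
    PieceSpec("add_observations", frozenset(), frozenset({"obs"})),
    PieceSpec("add_states", frozenset({"obs"}), frozenset({"state_in"})),
    PieceSpec("batch", frozenset({"obs", "state_in", "rewards"}), frozenset()),
]
problems = check_pipeline(pipeline, frozenset())
print(problems)
```

A check of this kind would run once when the pipeline is built, so a user-defined piece that expects a column nobody produces fails early rather than deep inside a training step.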

simonsays1980 · 2024-03-19T13:29:45Z

rllib/connectors/common/agent_to_module_mapping.py

@@ -69,7 +69,7 @@ class AgentToModuleMapping(ConnectorV2):

        # Create our connector piece.
        connector = AgentToModuleMapping(
-            modules=["module0", "module1"],
+            module_specs={"module0", "module1"},


Shouldn't this be: {"module0": SingleAgentModuleSpec(....), "module1": ...}?

Both are possible. Sometimes, users don't specify the individual SingleAgentRLModuleSpec (RLlib then uses the algo's default ones), and thus they also do NOT provide space/class/config information for individual modules. The connector needs to be ok with that and fall back to having only the IDs of the modules, without any further information.
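
The "IDs only or full specs" fallback described in this reply can be sketched as a small normalization step. This is an illustration under assumptions, not the connector's actual code: `normalize_module_specs` is a hypothetical helper, and plain dicts stand in for real module specs.

```python
from typing import Any, Dict, Iterable, Mapping, Optional, Union

ModuleID = str


def normalize_module_specs(
    module_specs: Union[Mapping[ModuleID, Any], Iterable[ModuleID]],
) -> Dict[ModuleID, Optional[Any]]:
    """Map every module ID to its spec, or to None if only IDs were given."""
    if isinstance(module_specs, Mapping):
        return dict(module_specs)
    # Only IDs are known: no space/class/config information is available.
    return {module_id: None for module_id in module_specs}


# A set of IDs, as in the diff above.
ids_only = normalize_module_specs({"module0", "module1"})
# A dict of ID -> spec; the dict values here are placeholders for real specs.
with_specs = normalize_module_specs(
    {"module0": {"hidden": 64}, "module1": {"hidden": 32}}
)
print(sorted(ids_only), with_specs["module0"])
```

Downstream code then only has to handle one shape and can treat a `None` spec as "use the algorithm's default".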

simonsays1980 · 2024-03-19T13:35:58Z

rllib/connectors/connector_v2.py

@@ -615,6 +615,7 @@ def foreach_batch_item_change_in_place(
        func: Callable[[Any, int, AgentID, ModuleID], Any],
    ) -> None:


Can we add a doc string that explains when to use it and how?

Good catch. You are right, this one is completely missing a docstring :|

Added docstring and thorough .. testcode::.
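
For readers without the new docstring at hand, the general pattern behind a "for each batch item, change in place" utility can be sketched as follows. The batch layout (column name, then a key of episode index, agent ID, and module ID, then a list of items) is an assumption made for this example, and `change_items_in_place` is a hypothetical stand-in rather than the ConnectorV2 method itself.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed key layout: (episode index, agent ID, module ID).
ItemKey = Tuple[int, str, str]
Batch = Dict[str, Dict[ItemKey, List[Any]]]


def change_items_in_place(
    batch: Batch,
    column: str,
    func: Callable[[Any, int, str, str], Any],
) -> None:
    """Overwrite every item under `column` with func(item, eps, agent, module)."""
    for (eps_idx, agent_id, module_id), items in batch[column].items():
        for i, item in enumerate(items):
            # Assigning by index mutates the existing list; nothing is returned.
            items[i] = func(item, eps_idx, agent_id, module_id)


batch: Batch = {
    "obs": {
        (0, "agent0", "module0"): [1.0, 2.0],
        (0, "agent1", "module1"): [3.0],
    }
}
# Scale only the items that belong to "module0".
change_items_in_place(
    batch,
    "obs",
    lambda item, eps, agent, module: item * 10 if module == "module0" else item,
)
print(batch["obs"])
```

Returning `None` and mutating the lists signals to the caller that the batch itself changes, which is the main thing a docstring for such a method needs to spell out.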

simonsays1980 · 2024-03-19T14:05:22Z

rllib/examples/evaluation/custom_evaluation.py

+We define a custom evaluation method that does the following:
+- It changes the corridor length of all environments used on the evaluation EnvRunners.
+- It runs a defined number of episodes for evaluation purposes.
+- It collects the metrics from those runs, summarizes these metrics and return them.


Tiny typo :) "return" -> "returns"

simonsays1980 · 2024-03-19T14:09:06Z

rllib/examples/evaluation/custom_evaluation.py

+            func=lambda worker: (worker.sample(), worker.get_metrics())[1],
+            local_worker=False,
+        )
+        for metrics_per_worker in metrics_all_workers:


Can we maybe show here how to sort the metrics from different corridor lengths in the results dict (e.g., such that they show up in different diagrams in wandb/TensorBoard)?
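
One way to address this request is to group the collected episode returns by corridor length before putting them into the results dict. The sketch below assumes the evaluation code can produce (corridor length, episode return) pairs; the function name and the result keys are hypothetical and are not taken from the example script.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def group_returns_by_corridor_length(
    episodes: List[Tuple[int, float]],
) -> Dict[str, Dict[str, float]]:
    """Group (corridor_length, episode_return) pairs into one sub-dict per length."""
    returns = defaultdict(list)
    for corridor_length, episode_return in episodes:
        returns[corridor_length].append(episode_return)
    return {
        f"corridor_length_{length}": {
            "episode_return_mean": mean(values),
            "num_episodes": len(values),
        }
        for length, values in sorted(returns.items())
    }


results = group_returns_by_corridor_length(
    [(10, 1.0), (20, 0.5), (10, 0.0), (20, 1.5)]
)
print(results)
```

Loggers that flatten nested result dicts would then typically report each corridor length under its own key prefix, which is what makes separate diagrams possible.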

simonsays1980 · 2024-03-19T14:19:49Z

rllib/examples/multi_agent_and_self_play/pettingzoo_shared_value_function.py

+See: https://pettingzoo.farama.org/environments/sisl/waterworld/
+for more details on the environment.
+
+Note that this example is different from the old API stack scripts:


Ah here they are. Awesome!

simonsays1980 · 2024-03-19T14:20:26Z

rllib/examples/multi_agent_and_self_play/pettingzoo_shared_value_function.py

+`examples/centralized_critic.py` and `examples/centralized_critic_2.py` in the
+sense that here, a true shared value function is used via the new
+`MultiAgentRLModule` class as opposed to both of the old API stack scripts, which
+do NOT use a single central value function, but 2: One for each policy learnt.


Typo: "learnt" -> "learned"

simonsays1980 · 2024-03-19T14:25:36Z

rllib/examples/multi_agent_and_self_play/pettingzoo_shared_value_function.py

+        .multi_agent(
+            policies=policies,
+            # Exact 1:1 mapping from AgentID to ModuleID.
+            policy_mapping_fn=(lambda aid, *args, **kwargs: aid),


Doesn't this mean that, in addition, they all share the policy?

Ah, sorry, this example is NOT done yet. I'll finish it as discussed above.
Good call! :)
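
To make the question in this thread easier to follow, the sketch below contrasts an identity mapping like the one in the diff with a mapping that sends every agent to one shared module. The agent IDs and the "shared_module" ID are illustrative assumptions, and the functions only mimic the call signature of the lambda shown above.

```python
def one_module_per_agent(agent_id, *args, **kwargs):
    """Identity mapping: each AgentID is used directly as its ModuleID."""
    return agent_id


def one_shared_module(agent_id, *args, **kwargs):
    """Constant mapping: every agent is routed to the same module."""
    return "shared_module"


# Illustrative agent IDs.
agents = ["pursuer_0", "pursuer_1", "pursuer_2"]
separate = {aid: one_module_per_agent(aid) for aid in agents}
shared = {aid: one_shared_module(aid) for aid in agents}
print(separate)
print(shared)
```

With the identity mapping, three agents end up with three distinct module IDs, so sharing a policy would require the constant mapping instead.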

angelinalg

Releasing first batch of comments to review a higher priority PR first.

doc/source/rllib/rllib-examples.rst

rllib/algorithms/algorithm_config.py

rllib/examples/_old_api_stack/remote_envs_with_inference_done_on_main_node.py

rllib/examples/_old_api_stack/sb2rllib_sb_example.py

rllib/examples/_old_api_stack/two_trainer_workflow.py

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>

angelinalg

I think I reviewed more than you intended. Sorry for the delay. Hope this helps.

angelinalg · 2024-03-21T02:19:14Z

rllib/examples/_old_api_stack/two_trainer_workflow.py

+
+    ray.init(local_mode=args.local_mode)
+
+    # Simple environment with 4 independent cartpole entities


Suggested change

-    # Simple environment with 4 independent cartpole entities
+    # Simple environment with 4 independent cartpole entities.

angelinalg · 2024-03-21T02:21:18Z

rllib/examples/evaluation/custom_evaluation.py

+"""Example of customizing the evaluation procedure for an RLlib algorithm.
+
+Note, that you should only choose to provide a custom eval function, in case the already
+built-in eval options are not sufficient. Normally, though, RLlib's eval utilities


Suggested change

-built-in eval options are not sufficient. Normally, though, RLlib's eval utilities
+built-in eval options aren't sufficient. Normally, though, RLlib's eval utilities

angelinalg · 2024-03-21T02:22:03Z

rllib/examples/evaluation/custom_evaluation.py

+This script uses the SimpleCorridor environment, a simple 1D gridworld, in which
+the agent can only walk left (action=0) or right (action=1). The goal is at the end of
+the (1D) corridor. The env exposes an API to change the length of the corridor
+on-the-fly. We use this API here to extend the size of the corridor for the evaluation


Suggested change

-on-the-fly. We use this API here to extend the size of the corridor for the evaluation
+on-the-fly. This API extends the size of the corridor for the evaluation

angelinalg · 2024-03-21T02:22:18Z

rllib/examples/evaluation/custom_evaluation.py

+on-the-fly. We use this API here to extend the size of the corridor for the evaluation
+runs.
+
+We define a custom evaluation method that does the following:


Suggested change

-We define a custom evaluation method that does the following:
+A custom evaluation method does the following:

angelinalg · 2024-03-21T02:22:46Z

rllib/examples/evaluation/custom_evaluation.py

+runs.
+
+We define a custom evaluation method that does the following:
+- It changes the corridor length of all environments used on the evaluation EnvRunners.


Suggested change

-- It changes the corridor length of all environments used on the evaluation EnvRunners.
+- It changes the corridor length of all environments RLlib uses on the evaluation EnvRunners.

rllib/examples/multi_agent_and_self_play/two_step_game_with_grouped_agents.py

angelinalg · 2024-03-21T02:40:38Z

rllib/examples/multi_agent_and_self_play/two_step_game_with_grouped_agents.py

+
+For debugging, use the following additional command line options
+`--no-tune --num-env-runners=0`
+Which should allow you to set breakpoints anywhere in the RLlib code and


Suggested change

-Which should allow you to set breakpoints anywhere in the RLlib code and
+which should allow you to set breakpoints anywhere in the RLlib code and
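
For context on the two debugging flags quoted in this docstring, here is a minimal sketch of how a script could parse them with `argparse`. Only the flag names come from the text above. The default of 2 env runners, the help strings, and the stated reason the flags help are assumptions, and the real example scripts may define their arguments differently.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-tune",
    action="store_true",
    help="Run the algorithm directly instead of through Tune.",
)
parser.add_argument(
    "--num-env-runners",
    type=int,
    default=2,  # Assumed default, for illustration only.
    help="Number of remote env runners; 0 keeps sampling in the main process.",
)

# Parse the debugging flags quoted in the docstring above.
args = parser.parse_args(["--no-tune", "--num-env-runners=0"])
print(args.no_tune, args.num_env_runners)
```

Under these assumptions, the combination keeps everything in one process, which is presumably what lets a debugger stop at breakpoints inside library code.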

rllib/examples/rl_module/classes/rock_paper_scissors_heuristic_rlm.py

sven1977 added 3 commits March 6, 2024 10:30

wip

8a740d2

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into clea…

d043044

…nup_examples_folder

Merge branch 'master' of https://github.com/ray-project/ray into appo…

79ee728

…_on_new_api_stack_w_env_runner_and_connectorv2 Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # rllib/algorithms/algorithm.py # rllib/utils/actor_manager.py

sven1977 requested review from avnishn, ArturNiederfahrenhorst, maxpumperla, kouroshHakha, simonsays1980 and a team as code owners March 16, 2024 19:06

sven1977 added 13 commits March 17, 2024 21:49

wip

91b6eab

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Rock-paper-scissors example working. NOT for non-shared vf net, b/c o…

dc34656

…f some bug in the rlmodule specs. Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

e924eeb

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

0d0f8c2

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into clea…

69200be

…nup_examples_folder

wip

7eeffed

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into clea…

5711bba

…nup_examples_folder

wip

204a4fd

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

92de3dd

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

7a7a360

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into clea…

32fa511

…nup_examples_folder

wip

651b1f1

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

67f51c4

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980 approved these changes Mar 19, 2024

sven1977 added 2 commits March 19, 2024 17:22

wip

661d046

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

b80a93f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested review from matthewdeng, justinvyu and woshiyyya as code owners March 19, 2024 16:47

sven1977 assigned simonsays1980 and angelinalg Mar 19, 2024

angelinalg reviewed Mar 19, 2024

sven1977 and others added 2 commits March 20, 2024 10:22

Apply suggestions from code review

15bbdbf

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>

Merge branch 'master' into cleanup_examples_folder

01cdb80

angelinalg approved these changes Mar 21, 2024

sven1977 and others added 14 commits April 2, 2024 10:10

Merge branch 'master' of https://github.com/ray-project/ray into clea…

b7be627

…nup_examples_folder

wip

14d7cd2

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

48d66c1

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…

2b7e640

…eanup_examples_folder

wip

84158c0

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Apply suggestions from code review

42fe35f

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>

wip

8c6971a

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…

4243bed

…eanup_examples_folder

wip

3d188f4

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

49d68af

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

79ec502

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' into cleanup_examples_folder

767c7ff

wip

73bff05

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…

aace3d5

…eanup_examples_folder

sven1977 merged commit cb05540 into ray-project:master Apr 2, 2024
4 of 5 checks passed

sven1977 deleted the cleanup_examples_folder branch April 9, 2024 11:03
