[RLlib] Cleanup examples folder #01. #44067
Conversation
…nup_examples_folder
…_on_new_api_stack_w_env_runner_and_connectorv2
Signed-off-by: sven1977 <svenmika1977@gmail.com>
# Conflicts:
#   rllib/algorithms/algorithm.py
#   rllib/utils/actor_manager.py
…f some bug in the rlmodule specs. Signed-off-by: sven1977 <svenmika1977@gmail.com>
…nup_examples_folder
…nup_examples_folder
…nup_examples_folder
LGTM. Some more info would be helpful here and there to give a user/developer the big picture and the why. Awesome example updates!
--test-env RLLIB_ENABLE_RL_MODULE=1
--test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
depends_on: rllibbuild

- label: ":brain: rllib: data tests"
What's actually the meaning of `brain` here? Learning tests?
module_specs=(
    self.rl_module_spec.module_specs
    if isinstance(self.rl_module_spec, MultiAgentRLModuleSpec)
    else set(self.policies)
I hope we can soon get rid of the `policy`/`policies` naming. This is still confusing in the module setups.
Great point. We need to unify this soon and fully adapt to the new stack terminology. Some ideas (a rough sketch follows below):
- Have the user explicitly enable multi-agent (otherwise, error if multi-agent components are used).
- `config.multi_agent(policies)` should no longer be necessary (already kind of replaced by `config.rl_module`).
- `policy_mapping_fn` -> `agent_to_module_mapping_fn`
- etc.
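For illustration, a minimal sketch of what the unified setup could look like, assuming today's `PPOConfig` API; `agent_to_module_mapping_fn` is only the rename proposed above, not an existing parameter:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # A real multi-agent env would go here; "CartPole-v1" is a placeholder.
    .environment("CartPole-v1")
    .multi_agent(
        # Today's terminology: explicit policies plus a mapping function.
        policies={"p0", "p1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: "p0",
        # Proposed new-stack terminology (hypothetical, per this thread):
        # agent_to_module_mapping_fn=lambda agent_id, episode, **kwargs: "p0",
        # ... with `policies` itself subsumed by `config.rl_module(...)`.
    )
)
```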
),
data,
)
for column, column_data in data.copy().items():
Can we give each connector a docstring that tells us the input and output shape? We could also create something like what the `RLModule`s have: `get_input_specs` and `get_output_specs`, and then check the modules (also user modules) in the pipeline to see whether they "fit".
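A minimal sketch of that spec-check idea; `get_input_specs()`/`get_output_specs()` don't exist on connector pieces today, so the methods (and the assumption that they return column names) are purely hypothetical:

```python
def validate_pipeline(pieces):
    """Checks that each piece's outputs cover the next piece's inputs."""
    for upstream, downstream in zip(pieces, pieces[1:]):
        # Hypothetical spec APIs, as proposed in the comment above.
        missing = set(downstream.get_input_specs()) - set(upstream.get_output_specs())
        if missing:
            raise ValueError(
                f"{type(downstream).__name__} expects columns {missing}, "
                f"which {type(upstream).__name__} does not produce."
            )
```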
Also let's add a comment about the bigger picture here: what do the recurrent modules expect and what do we feed them.
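As a rough illustration of the bigger picture requested here (not from this PR; all keys and shapes are assumptions): recurrent modules typically expect observations with an added time axis plus one initial state per batch row.

```python
import numpy as np

B, T, OBS_DIM, LSTM_SIZE = 4, 20, 8, 256

batch = {
    # Observations with a time axis: [batch, time, obs_dim].
    "obs": np.zeros((B, T, OBS_DIM), np.float32),
    # One initial state per batch row (not per timestep): [batch, lstm_size].
    "state_in": {
        "h": np.zeros((B, LSTM_SIZE), np.float32),
        "c": np.zeros((B, LSTM_SIZE), np.float32),
    },
}
```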
@@ -69,7 +69,7 @@ class AgentToModuleMapping(ConnectorV2):

# Create our connector piece.
connector = AgentToModuleMapping(
-    modules=["module0", "module1"],
+    module_specs={"module0", "module1"},
Shouldn't this be: `{"module0": SingleAgentModuleSpec(....), "module1": ...}`?
Both are possible. Sometimes, users don't specify the individual `SingleAgentRLModuleSpec`s (RLlib then uses the algo's default ones), thus they also do NOT provide space/class/config information for individual modules. The connector needs to be ok with that and fall back to only having the IDs of the modules w/o any further information.
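A tiny runnable sketch of that fallback behavior; the `SingleAgentRLModuleSpec` stand-in class below is purely illustrative, not RLlib's actual implementation:

```python
class SingleAgentRLModuleSpec:
    """Stand-in for RLlib's per-module spec (space/class/config info)."""


def normalize_module_specs(module_specs):
    """Accepts both input forms and returns a ModuleID -> spec-or-None dict."""
    if isinstance(module_specs, dict):
        # Full form: IDs mapped to their individual specs.
        return dict(module_specs)
    # Fallback form: only the IDs are known; no space/class/config info.
    return {module_id: None for module_id in module_specs}


# Both forms normalize cleanly:
print(normalize_module_specs({"module0", "module1"}))
print(normalize_module_specs({"module0": SingleAgentRLModuleSpec()}))
```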
@@ -615,6 +615,7 @@ def foreach_batch_item_change_in_place(
    func: Callable[[Any, int, AgentID, ModuleID], Any],
) -> None:
Can we add a docstring that explains when to use it and how?
Good catch. You are right, this one is completely missing a docstring :|
Added docstring and thorough `.. testcode::`.
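For readers of this thread, a sketch of what such a docstring could look like; only the `func` type hint comes from the quoted diff, while the parameter names and the meaning of the `int` argument are guesses:

```python
def foreach_batch_item_change_in_place(batch, column, func):
    """Runs `func` on all items under `column`, writing results back in place.

    Useful inside ConnectorV2 pieces when a single column of a (possibly
    multi-agent) batch must be transformed without rebuilding the batch.

    Args:
        batch: The batch whose items to change in place.
        column: The column (e.g. "obs") whose items `func` receives.
        func: Called as `func(item, env_index, agent_id, module_id)`; its
            return value replaces the original item.
    """
```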
We define a custom evaluation method that does the following:
- It changes the corridor length of all environments used on the evaluation EnvRunners.
- It runs a defined number of episodes for evaluation purposes.
- It collects the metrics from those runs, summarizes these metrics and return them.
Tiny typo :) "return" -> "returns"
fixed :)
func=lambda worker: (worker.sample(), worker.get_metrics())[1],
local_worker=False,
)
for metrics_per_worker in metrics_all_workers:
Can we maybe show here how to sort the metrics from different corridor lengths into the `results` dict (e.g. such that they show up in different diagrams in wandb/tensorboard)?
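One possible way to do that, sketched below; this is not code from the PR, and both metric fields (`corridor_length`, `episode_return`) are made up for illustration:

```python
# Dummy stand-in for the per-worker metrics gathered in the snippet above.
metrics_all_workers = [
    [{"corridor_length": 10, "episode_return": 5.0},
     {"corridor_length": 20, "episode_return": 3.0}],
]

returns_by_length = {}
for metrics_per_worker in metrics_all_workers:
    for episode_metrics in metrics_per_worker:
        length = episode_metrics["corridor_length"]  # hypothetical field
        returns_by_length.setdefault(length, []).append(
            episode_metrics["episode_return"]  # hypothetical field
        )

# One key per corridor length -> separate wandb/tensorboard curves.
results = {
    f"corridor_{length}/episode_return_mean": sum(rs) / len(rs)
    for length, rs in returns_by_length.items()
}
print(results)
```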
See: https://pettingzoo.farama.org/environments/sisl/waterworld/
for more details on the environment.

Note that this example is different from the old API stack scripts:
Ah here they are. Awesome!
`examples/centralized_critic.py` and `examples/centralized_critic_2.py` in the
sense that here, a true shared value function is used via the new
`MultiAgentRLModule` class as opposed to both of the old API stack scripts, which
do NOT use a single central value function, but 2: One for each policy learnt.
Typo: "learnt" -> "learned"
fixed
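To make the distinction above concrete, a rough PyTorch sketch (a stand-in module only, not the PR's actual `MultiAgentRLModule` code) of one central value function shared by two policy heads:

```python
import torch
import torch.nn as nn


class SharedCriticModule(nn.Module):
    """Two policy heads sharing ONE central value function."""

    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.pi = nn.ModuleDict({
            "policy0": nn.Linear(obs_dim, act_dim),
            "policy1": nn.Linear(obs_dim, act_dim),
        })
        # The central critic sees both agents' observations at once.
        self.central_vf = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def value(self, obs0, obs1):
        # Both policies train against this single shared estimate, unlike
        # the old scripts, which kept one value head per policy.
        return self.central_vf(torch.cat([obs0, obs1], dim=-1))


m = SharedCriticModule()
print(m.value(torch.zeros(2, 8), torch.zeros(2, 8)).shape)  # -> [2, 1]
```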
.multi_agent(
    policies=policies,
    # Exact 1:1 mapping from AgentID to ModuleID.
    policy_mapping_fn=(lambda aid, *args, **kwargs: aid),
Doesn't this mean that they all additionally share the policy?
Ah, sorry, this example is NOT done yet. I'll finish it as discussed above.
Good call! :)
Releasing the first batch of comments to review a higher-priority PR first.
(Two resolved, outdated review threads on `rllib/examples/_old_api_stack/remote_envs_with_inference_done_on_main_node.py`.)
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>
I think I reviewed more than you intended. Sorry for the delay. Hope this helps.
ray.init(local_mode=args.local_mode)

# Simple environment with 4 independent cartpole entities
Suggested change:
- # Simple environment with 4 independent cartpole entities
+ # Simple environment with 4 independent cartpole entities.
"""Example of customizing the evaluation procedure for an RLlib algorithm. | ||
|
||
Note, that you should only choose to provide a custom eval function, in case the already | ||
built-in eval options are not sufficient. Normally, though, RLlib's eval utilities |
Suggested change:
- built-in eval options are not sufficient. Normally, though, RLlib's eval utilities
+ built-in eval options aren't sufficient. Normally, though, RLlib's eval utilities
This script uses the SimpleCorridor environment, a simple 1D gridworld, in which
the agent can only walk left (action=0) or right (action=1). The goal is at the end of
the (1D) corridor. The env exposes an API to change the length of the corridor
on-the-fly. We use this API here to extend the size of the corridor for the evaluation
Suggested change:
- on-the-fly. We use this API here to extend the size of the corridor for the evaluation
+ on-the-fly. This API extends the size of the corridor for the evaluation
on-the-fly. We use this API here to extend the size of the corridor for the evaluation
runs.

We define a custom evaluation method that does the following:
Suggested change:
- We define a custom evaluation method that does the following:
+ A custom evaluation method does the following:
runs.

We define a custom evaluation method that does the following:
- It changes the corridor length of all environments used on the evaluation EnvRunners.
Suggested change:
- - It changes the corridor length of all environments used on the evaluation EnvRunners.
+ - It changes the corridor length of all environments RLlib uses on the evaluation EnvRunners.
(Two resolved, outdated review threads on `rllib/examples/multi_agent_and_self_play/two_step_game_with_grouped_agents.py`.)
For debugging, use the following additional command line options
`--no-tune --num-env-runners=0`
Which should allow you to set breakpoints anywhere in the RLlib code and
Suggested change:
- Which should allow you to set breakpoints anywhere in the RLlib code and
+ which should allow you to set breakpoints anywhere in the RLlib code and
(Two resolved, outdated review threads on `rllib/examples/rl_module/classes/rock_paper_scissors_heuristic_rlm.py`.)
…nup_examples_folder
…eanup_examples_folder
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>
…eanup_examples_folder
…eanup_examples_folder
Cleanup examples folder no. 01:
- evaluation
- multi_agent_and_self_play
- gpu_training

TODO (in follow-up PRs): Move the example classes into dedicated sub-directories of `examples`, e.g. `examples/rl_module/classes/` or `examples/env/classes`, and leave the space in the direct sub-directories for entire scripts only (i.e. inside `examples/env`, there should be all the example scripts demo'ing env/env-runner/custom env stuff; only within the sub-dir `classes` should all the example envs go; same with models, rl_module, learner, etc.).

Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.