[RLlib; docs] Docs do-over (new API stack): Env pages vol 02. #48542

Merged

Conversation

@sven1977 sven1977 (Contributor) commented Nov 4, 2024

Do-over of RLlib docs (new API stack):

  • Redo the existing rllib-env.rst page.
  • Add a new multi-agent-envs.rst page (moving all multi-agent-related docs here and rewriting them).
  • Add a new hierarchical-envs.rst page (moving the paragraph on hierarchical envs here).
  • Add a new external-envs.rst page (moving the paragraphs describing these envs here).
  • Add new example classes and scripts (also adding them to CI), highlighting the different multi-agent acting patterns: sequential vs. simultaneous.
  • Add new figures to the docs.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added rllib RLlib related issues docs An issue or change related to documentation rllib-env rllib env related issues rllib-docs-or-examples Issues related to RLlib documentation or rllib/examples rllib-newstack rllib-oldstack-cleanup Issues related to cleaning up classes, utilities on the old API stack labels Nov 4, 2024
@sven1977 sven1977 changed the title [RLlib; docs] Do-over of RLlib docs; RL env pages. [RLlib; docs] Do-over of RLlib docs (new API stack): Environments pages. Dec 7, 2024
@sven1977 sven1977 requested a review from a team as a code owner December 8, 2024 16:39
@angelinalg angelinalg (Contributor) left a comment

Not done yet, but releasing some comments.

In many situations, it does not make sense for an RL environment to be "stepped" by RLlib.
For example, if we train one or more policies inside a complex simulator, for example, a game engine
or a robotics simulation, it would be more natural and user friendly to flip this setup around
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
Suggested change
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control

"traffic light" agents interacting simultaneously, whereas in a board game,
two or more agents may act in a turn-based sequence.

Several different policy networks may be used to control the various agents.
@angelinalg angelinalg (Contributor) Dec 12, 2024

Suggested change
Several different policy networks may be used to control the various agents.
You can use several different policy networks to control the various agents.

@angelinalg (Contributor) commented:

I only got as far as doc/source/rllib/multi-agent-envs.rst but I will pick up again tomorrow.

@angelinalg angelinalg (Contributor) left a comment

Good work on all this writing. If this is a good time to change titles to sentence case, that would be more consistent with our style guide. Otherwise, don't worry about it. It's definitely not a blocker. Hope the suggestions are helpful. Will approve to not block you. Sorry for the delay.

two or more agents may act in a turn-based sequence.

Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
Suggested change
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
Each agent in the environment maps to exactly one particular policy. Define this mapping


Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
determined by a user-provided function, called the "mapping function". Note that if there
Suggested change
determined by a user-provided function, called the "mapping function". Note that if there
with a user-provided function, called the "mapping function". Note that if there

Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
determined by a user-provided function, called the "mapping function". Note that if there
are ``N`` agents mapping to ``M`` policies, ``N`` is always larger or equal to ``M``,
Suggested change
are ``N`` agents mapping to ``M`` policies, ``N`` is always larger or equal to ``M``,
are ``N`` agents mapping to ``M`` policies, ``N`` must be equal to or greater than ``M``,

Not sure if the rewrite is too strong, but I'm guessing you're trying to say that.
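
To make the mapping-function idea concrete, here's a minimal sketch in an `AlgorithmConfig` (a sketch only: the registered env name and the `policy_0`/`policy_1` naming scheme are illustrative assumptions, not taken from the docs under review):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# N=4 agents ("agent_0" .. "agent_3") map onto M=2 policies, so N >= M holds.
config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # assumed to be a registered MultiAgentEnv
    .multi_agent(
        policies={"policy_0", "policy_1"},
        # The user-provided "mapping function": agent ID -> policy ID.
        # Even-numbered agents share policy_0, odd-numbered ones policy_1.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "policy_0" if int(agent_id.split("_")[-1]) % 2 == 0 else "policy_1"
        ),
    )
)
```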

:width: 600

**Multi-agent setup:** ``N`` agents live in the environment and take actions computed by ``M`` policy networks.
The mapping from agent to policy is flexible and determined by a user-provided mapping function. Here, `agent_1`
Suggested change
The mapping from agent to policy is flexible and determined by a user-provided mapping function. Here, `agent_1`
The mapping from agent to policy is flexible and determined by a user-provided mapping function. In this diagram, `agent_1`


.. hint::

This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
Suggested change
This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
This paragraph describes RLlib's :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the

Just a suggestion to remove a second instance of "own".
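
Since several comments in this review touch on the `MultiAgentEnv` API itself, here's a minimal sketch of a simultaneous-acting two-agent env (illustrative only: the agent IDs, spaces, and 10-step horizon are assumptions, and the per-agent `observation_spaces`/`action_spaces` dicts assume the new API stack):

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoAgentEnv(MultiAgentEnv):
    """Sketch: two agents act simultaneously at every timestep."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_0", "agent_1"]
        self.observation_spaces = {a: gym.spaces.Discrete(4) for a in self.agents}
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        # Both agents receive an initial observation -> both act on the next step.
        return {a: 0 for a in self.agents}, {}

    def step(self, action_dict):
        self._t += 1
        obs = {a: self._t % 4 for a in action_dict}
        rewards = {a: 1.0 for a in action_dict}
        terminateds = {"__all__": self._t >= 10}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}
```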

.. seealso::
1. **Vectorization within a single process:** Many environments achieve high
frame rates per core but are limited by policy inference latency. To address
this, create multiple environments per process and thus batch the policy forward pass
Suggested change
this, create multiple environments per process and thus batch the policy forward pass
this limitation, create multiple environments per process to batch the policy forward pass

across these vectorized environments. Set ``config.env_runners(num_envs_per_env_runner=..)``
to create more than one environment copy per :py:class:`~ray.rllib.envs.env_runner.EnvRunner`
actor. Additionally, you can make the individual sub-environments within a vector
independent processes (through python's multiprocessing used by gymnasium).
Suggested change
independent processes (through python's multiprocessing used by gymnasium).
independent processes through Python's multiprocessing used by gymnasium.
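
As a quick illustration of the vectorization knobs discussed above (the algorithm, env, and counts are arbitrary example values):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        num_env_runners=2,          # two EnvRunner actors
        num_envs_per_env_runner=4,  # four sub-envs per actor; the policy
                                    # forward pass is batched across them
    )
)
```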


External Application Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
Suggested change
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
Multi-agent setups aren't vectorizable yet. The Ray team is working on a solution for

External Application Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
this restriction by utilizing `gymnasium >= 1.x` custom vectorization feature.
Suggested change
this restriction by utilizing `gymnasium >= 1.x` custom vectorization feature.
this restriction by using the `gymnasium >= 1.x` custom vectorization feature.

This low-level API models multiple agents executing asynchronously in multiple environments.
A call to ``BaseEnv:poll()`` returns observations from ready agents keyed by 1) their environment, then 2) agent ids.
Actions for those agents are sent back via ``BaseEnv:send_actions()``. BaseEnv is used to implement all the other env types in RLlib, so it offers a superset of their functionality.
Some environments may require substantial resources to initialize and run. Should your environments require
Suggested change
Some environments may require substantial resources to initialize and run. Should your environments require
Some environments may require substantial resources to initialize and run. If your environments require
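
A schematic version of the poll/send_actions cycle described above (a sketch, assuming a Ray version where `BaseEnv.poll()` returns six dicts; `compute_actions` is a hypothetical stand-in for your policy inference):

```python
def run_loop(base_env, compute_actions, num_iterations=100):
    """Drive a ray.rllib.env.base_env.BaseEnv asynchronously (sketch).

    compute_actions: hypothetical user function mapping
    {env_id: {agent_id: obs}} -> {env_id: {agent_id: action}}.
    """
    for _ in range(num_iterations):
        # Observations from ready agents, keyed by 1) env ID, then 2) agent ID.
        obs, rewards, terminateds, truncateds, infos, off_policy_actions = (
            base_env.poll()
        )
        # Send actions back only for the envs/agents that are ready.
        base_env.send_actions(compute_actions(obs))
```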

sven1977 and others added 8 commits December 18, 2024 18:31
@simonsays1980 simonsays1980 (Collaborator) left a comment

LGTM. Great overview of environments and a detailed description of the delicate multi-agent environments and episodes that users have to take care of.

.. figure:: images/envs/external_env_setup_client_inference.svg
:width: 600
**External application with client-side inference**: An external simulator (for example a game engine)
connects to RLlib, which runs as a server through a tcp-capable, custom EnvRunner.
Do we want to use inline code formatting for all classes like EnvRunner?

.. scale: 75 %
.. A Unity3D soccer game being learnt by RLlib via the ExternalEnv API.

RLlib provides an `external messaging protocol <https://github.com/ray-project/ray/blob/master/rllib/env/utils/external_env_protocol.py>`__
This is so cool!

@sven1977 (Contributor, Author) replied:

Yeah, let's make this a widely adopted standard! :)

The RLlink Protocol
-------------------

RLlink is a simple, stateful protocol designed for communication between a reinforcement learning (RL) server (ex., RLlib) and an
Dumb question: why not use plain HTTP/2? It's standard and provides security and serialization via Protobuf.

@sven1977 (Contributor, Author) replied:

Definitely in the next iteration! Trying to keep it as simple as possible for this very first iteration. For now, this is just about the message types (what to say when and what to expect back from server?), not really the actual implementation of the messages.
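
To make the "message types" idea tangible, here's a purely hypothetical sketch of a length-prefixed, JSON-over-TCP client exchange of the kind such a protocol could use. The framing and the message names (`GET_ACTION`, `ACTION`) are invented for illustration and are not taken from the actual RLlink spec linked above:

```python
import json
import socket
import struct


def _recv_exact(sock, n):
    # Read exactly n bytes from the socket (TCP may deliver partial chunks).
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf


def send_msg(sock, msg):
    # Hypothetical framing: 4-byte big-endian length prefix + UTF-8 JSON body.
    body = json.dumps(msg).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_msg(sock):
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


# Invented exchange: the external simulator asks the RL server for an action.
with socket.create_connection(("localhost", 5555)) as sock:
    send_msg(sock, {"type": "GET_ACTION", "obs": [0.0, 1.0, 0.0, -1.0]})
    reply = recv_msg(sock)  # e.g. {"type": "ACTION", "action": 1}
```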

top-level: action_0 -------------------------------------> action_1 ->
low-level: action_0 -> action_1 -> action_2 -> action_3 -> action_4 ->

Alternatively, you could implement an environment, in which the two agent types don't act at the same time (overlappingly),
Awesome explanation!
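
A minimal sketch of the non-overlapping (turn-based) variant mentioned in this thread, following the diagram above: the top-level agent acts once, then the low-level agent takes a fixed number of steps before control returns. All names, spaces, and the 4-step sub-sequence are illustrative assumptions:

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class HierarchicalTurnBasedEnv(MultiAgentEnv):
    """Sketch: top-level agent acts, then the low-level agent runs 4 steps."""

    LOW_LEVEL_STEPS = 4

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["top_level", "low_level"]
        self.observation_spaces = {a: gym.spaces.Discrete(8) for a in self.agents}
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}
        self._low_steps_left = 0
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._low_steps_left = 0
        self._t = 0
        # Only the top-level agent observes (and thus acts) first.
        return {"top_level": 0}, {}

    def step(self, action_dict):
        self._t += 1
        if "top_level" in action_dict:
            # A top-level action (e.g., picking a goal) starts a low-level
            # sub-sequence.
            self._low_steps_left = self.LOW_LEVEL_STEPS
        else:
            self._low_steps_left -= 1
        # Hand control to the low-level agent until its sub-sequence ends.
        next_agent = "low_level" if self._low_steps_left > 0 else "top_level"
        obs = {next_agent: self._t % 8}
        rewards = {a: 0.0 for a in action_dict}
        terminateds = {"__all__": self._t >= 20}
        return obs, rewards, terminateds, {"__all__": False}, {}
```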

As I mentioned a while ago: in the long run it might be cool, IMO, if we get some design support so that it gets a more professional look and feel.


This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
recommended way of defining your own multi-agent environment logic. However, if you are already using a
third-party multi-agent API, RLlib offers wrappers for :ref:`Farama's PettingZoo API <farama-pettingzoo-api>` as well
We might want to leave a comment about the game form we implement and PettingZoo (I think it's an extensive-form game)/OpenSpiel.

@sven1977 sven1977 enabled auto-merge (squash) December 19, 2024 12:52
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Dec 19, 2024
@github-actions github-actions bot disabled auto-merge December 19, 2024 15:26
@sven1977 sven1977 enabled auto-merge (squash) December 19, 2024 15:40
@github-actions github-actions bot disabled auto-merge December 19, 2024 15:49
@sven1977 sven1977 enabled auto-merge (squash) December 19, 2024 17:17
@sven1977 sven1977 merged commit 1b07eaf into ray-project:master Dec 19, 2024
6 checks passed
@sven1977 sven1977 deleted the docs_redo_cleanup_old_api_stack_01 branch December 20, 2024 06:12