[RLlib; docs] Docs do-over (new API stack): Env pages vol 02. #48542

Merged

Conversation

@sven1977 sven1977 (Contributor) commented Nov 4, 2024

Do-over of RLlib docs (new API stack):

  • Redo the existing rllib-env.rst page.
  • Add a new multi-agent-envs.rst page (moving all multi-agent-related docs here and rewriting them).
  • Add a new hierarchical-envs.rst page (moving the paragraph on hierarchical envs here).
  • Add a new external-envs.rst page (moving the paragraphs describing these envs here).
  • Add new example classes and scripts (also adding them to CI), highlighting the different multi-agent acting patterns: sequential vs. simultaneous.
  • Add new figures to the docs.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added rllib RLlib related issues docs An issue or change related to documentation rllib-env rllib env related issues rllib-docs-or-examples Issues related to RLlib documentation or rllib/examples rllib-newstack rllib-oldstack-cleanup Issues related to cleaning up classes, utilities on the old API stack labels Nov 4, 2024
@sven1977 sven1977 changed the title [RLlib; docs] Do-over of RLlib docs; RL env pages. [RLlib; docs] Do-over of RLlib docs (new API stack): Environments pages. Dec 7, 2024
@sven1977 sven1977 requested a review from a team as a code owner December 8, 2024 16:39
@angelinalg angelinalg (Contributor) left a comment

Not done yet, but releasing some comments.

In many situations, it does not make sense for an RL environment to be "stepped" by RLlib.
For example, if we train one or more policies inside a complex simulator, for example, a game engine
or a robotics simulation, it would be more natural and user friendly to flip this setup around
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
Suggested change
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control

"traffic light" agents interacting simultaneously, whereas in a board game,
two or more agents may act in a turn-based sequence.

Several different policy networks may be used to control the various agents.
@angelinalg angelinalg (Contributor) Dec 12, 2024

Suggested change
Several different policy networks may be used to control the various agents.
You can use several different policy networks to control the various agents.

@angelinalg (Contributor) commented:

I only got as far as doc/source/rllib/multi-agent-envs.rst but I will pick up again tomorrow.

@angelinalg angelinalg (Contributor) left a comment

Good work on all this writing. If this is a good time to change titles to sentence case, that would be more consistent with our style guide. Otherwise, don't worry about it. It's definitely not a blocker. Hope the suggestions are helpful. Will approve to not block you. Sorry for the delay.

two or more agents may act in a turn-based sequence.

Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
Suggested change
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
Each agent in the environment maps to exactly one particular policy. Define this mapping


Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
determined by a user-provided function, called the "mapping function". Note that if there
Suggested change
determined by a user-provided function, called the "mapping function". Note that if there
with a user-provided function, called the "mapping function". Note that if there

Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
determined by a user-provided function, called the "mapping function". Note that if there
are ``N`` agents mapping to ``M`` policies, ``N`` is always larger or equal to ``M``,
Suggested change
are ``N`` agents mapping to ``M`` policies, ``N`` is always larger or equal to ``M``,
are ``N`` agents mapping to ``M`` policies, ``N`` must be equal to or greater than ``M``,

Not sure if the rewrite is too strong, but I'm guessing you're trying to say that.
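
To make the mapping-function idea concrete, here's a minimal sketch in an `AlgorithmConfig` (a sketch only: the registered env name and the `policy_0`/`policy_1` naming scheme are illustrative assumptions, not taken from the docs under review):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# N=4 agents ("agent_0" .. "agent_3") map onto M=2 policies, so N >= M holds.
config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # assumed to be a registered MultiAgentEnv
    .multi_agent(
        policies={"policy_0", "policy_1"},
        # The user-provided "mapping function": agent ID -> policy ID.
        # Even-numbered agents share policy_0, odd-numbered ones policy_1.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "policy_0" if int(agent_id.split("_")[-1]) % 2 == 0 else "policy_1"
        ),
    )
)
```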

:width: 600

**Multi-agent setup:** ``N`` agents live in the environment and take actions computed by ``M`` policy networks.
The mapping from agent to policy is flexible and determined by a user-provided mapping function. Here, `agent_1`
Suggested change
The mapping from agent to policy is flexible and determined by a user-provided mapping function. Here, `agent_1`
The mapping from agent to policy is flexible and determined by a user-provided mapping function. In this diagram, `agent_1`


.. hint::

This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
Suggested change
This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
This paragraph describes RLlib's :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the

Just a suggestion to remove a second instance of "own".
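
Since several comments in this review touch on the `MultiAgentEnv` API itself, here's a minimal sketch of a simultaneous-acting two-agent env (illustrative only: the agent IDs, spaces, and 10-step horizon are assumptions, and the per-agent `observation_spaces`/`action_spaces` dicts assume the new API stack):

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoAgentEnv(MultiAgentEnv):
    """Sketch: two agents act simultaneously at every timestep."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_0", "agent_1"]
        self.observation_spaces = {a: gym.spaces.Discrete(4) for a in self.agents}
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        # Both agents receive an initial observation -> both act on the next step.
        return {a: 0 for a in self.agents}, {}

    def step(self, action_dict):
        self._t += 1
        obs = {a: self._t % 4 for a in action_dict}
        rewards = {a: 1.0 for a in action_dict}
        terminateds = {"__all__": self._t >= 10}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}
```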

.. seealso::
1. **Vectorization within a single process:** Many environments achieve high
frame rates per core but are limited by policy inference latency. To address
this, create multiple environments per process and thus batch the policy forward pass
Suggested change
this, create multiple environments per process and thus batch the policy forward pass
this limitation, create multiple environments per process to batch the policy forward pass

across these vectorized environments. Set ``config.env_runners(num_envs_per_env_runner=..)``
to create more than one environment copy per :py:class:`~ray.rllib.envs.env_runner.EnvRunner`
actor. Additionally, you can make the individual sub-environments within a vector
independent processes (through python's multiprocessing used by gymnasium).
Suggested change
independent processes (through python's multiprocessing used by gymnasium).
independent processes through Python's multiprocessing used by gymnasium.
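
As a quick illustration of the vectorization knobs discussed above (the algorithm, env, and counts are arbitrary example values):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        num_env_runners=2,          # two EnvRunner actors
        num_envs_per_env_runner=4,  # four sub-envs per actor; the policy
                                    # forward pass is batched across them
    )
)
```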


External Application Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
Suggested change
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
Multi-agent setups aren't vectorizable yet. The Ray team is working on a solution for

External Application Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
this restriction by utilizing `gymnasium >= 1.x` custom vectorization feature.
Suggested change
this restriction by utilizing `gymnasium >= 1.x` custom vectorization feature.
this restriction by using the `gymnasium >= 1.x` custom vectorization feature.

This low-level API models multiple agents executing asynchronously in multiple environments.
A call to ``BaseEnv:poll()`` returns observations from ready agents keyed by 1) their environment, then 2) agent ids.
Actions for those agents are sent back via ``BaseEnv:send_actions()``. BaseEnv is used to implement all the other env types in RLlib, so it offers a superset of their functionality.
Some environments may require substantial resources to initialize and run. Should your environments require
Suggested change
Some environments may require substantial resources to initialize and run. Should your environments require
Some environments may require substantial resources to initialize and run. If your environments require
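
A schematic version of the poll/send_actions cycle described above (a sketch, assuming a Ray version where `BaseEnv.poll()` returns six dicts; `compute_actions` is a hypothetical stand-in for your policy inference):

```python
def run_loop(base_env, compute_actions, num_iterations=100):
    """Drive a ray.rllib.env.base_env.BaseEnv asynchronously (sketch).

    compute_actions: hypothetical user function mapping
    {env_id: {agent_id: obs}} -> {env_id: {agent_id: action}}.
    """
    for _ in range(num_iterations):
        # Observations from ready agents, keyed by 1) env ID, then 2) agent ID.
        obs, rewards, terminateds, truncateds, infos, off_policy_actions = (
            base_env.poll()
        )
        # Send actions back only for the envs/agents that are ready.
        base_env.send_actions(compute_actions(obs))
```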

sven1977 and others added 8 commits December 18, 2024 18:31
@simonsays1980 simonsays1980 (Collaborator) left a comment

LGTM. Great overview of environments and a detailed description of the delicate multi-agent environments and episodes that users have to take care of.

.. figure:: images/envs/external_env_setup_client_inference.svg
:width: 600
**External application with client-side inference**: An external simulator (for example a game engine)
connects to RLlib, which runs as a server through a tcp-capable, custom EnvRunner.
Do we want to use inline code formatting for all classes like EnvRunner?

.. scale: 75 %
.. A Unity3D soccer game being learnt by RLlib via the ExternalEnv API.

RLlib provides an `external messaging protocol <https://github.com/ray-project/ray/blob/master/rllib/env/utils/external_env_protocol.py>`__
This is so cool!

@sven1977 (Contributor, Author) replied:

Yeah, let's make this a widely adopted standard! :)

The RLlink Protocol
-------------------

RLlink is a simple, stateful protocol designed for communication between a reinforcement learning (RL) server (ex., RLlib) and an
Dumb question: why not use plain HTTP/2? It's standard and provides security and serialization via Protobuf.

@sven1977 (Contributor, Author) replied:

Definitely in the next iteration! Trying to keep it as simple as possible for this very first iteration. For now, this is just about the message types (what to say when and what to expect back from server?), not really the actual implementation of the messages.
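
To make the "message types" idea tangible, here's a purely hypothetical sketch of a length-prefixed, JSON-over-TCP client exchange of the kind such a protocol could use. The framing and the message names (`GET_ACTION`, `ACTION`) are invented for illustration and are not taken from the actual RLlink spec linked above:

```python
import json
import socket
import struct


def _recv_exact(sock, n):
    # Read exactly n bytes from the socket (TCP may deliver partial chunks).
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf


def send_msg(sock, msg):
    # Hypothetical framing: 4-byte big-endian length prefix + UTF-8 JSON body.
    body = json.dumps(msg).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_msg(sock):
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


# Invented exchange: the external simulator asks the RL server for an action.
with socket.create_connection(("localhost", 5555)) as sock:
    send_msg(sock, {"type": "GET_ACTION", "obs": [0.0, 1.0, 0.0, -1.0]})
    reply = recv_msg(sock)  # e.g. {"type": "ACTION", "action": 1}
```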

top-level: action_0 -------------------------------------> action_1 ->
low-level: action_0 -> action_1 -> action_2 -> action_3 -> action_4 ->

Alternatively, you could implement an environment, in which the two agent types don't act at the same time (overlappingly),
Awesome explanation!
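
A minimal sketch of the non-overlapping (turn-based) variant mentioned in this thread, following the diagram above: the top-level agent acts once, then the low-level agent takes a fixed number of steps before control returns. All names, spaces, and the 4-step sub-sequence are illustrative assumptions:

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class HierarchicalTurnBasedEnv(MultiAgentEnv):
    """Sketch: top-level agent acts, then the low-level agent runs 4 steps."""

    LOW_LEVEL_STEPS = 4

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["top_level", "low_level"]
        self.observation_spaces = {a: gym.spaces.Discrete(8) for a in self.agents}
        self.action_spaces = {a: gym.spaces.Discrete(2) for a in self.agents}
        self._low_steps_left = 0
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._low_steps_left = 0
        self._t = 0
        # Only the top-level agent observes (and thus acts) first.
        return {"top_level": 0}, {}

    def step(self, action_dict):
        self._t += 1
        if "top_level" in action_dict:
            # A top-level action (e.g., picking a goal) starts a low-level
            # sub-sequence.
            self._low_steps_left = self.LOW_LEVEL_STEPS
        else:
            self._low_steps_left -= 1
        # Hand control to the low-level agent until its sub-sequence ends.
        next_agent = "low_level" if self._low_steps_left > 0 else "top_level"
        obs = {next_agent: self._t % 8}
        rewards = {a: 0.0 for a in action_dict}
        terminateds = {"__all__": self._t >= 20}
        return obs, rewards, terminateds, {"__all__": False}, {}
```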

As I mentioned a while ago: in the long run it might be cool, IMO, if we get some design support so that it gets a more professional look and feel.


This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
recommended way of defining your own multi-agent environment logic. However, if you are already using a
third-party multi-agent API, RLlib offers wrappers for :ref:`Farama's PettingZoo API <farama-pettingzoo-api>` as well
We might want to leave a comment about the game form we implement and PettingZoo (I think it's an extensive-form game)/OpenSpiel.

@sven1977 sven1977 enabled auto-merge (squash) December 19, 2024 12:52
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Dec 19, 2024
@github-actions github-actions bot disabled auto-merge December 19, 2024 15:26
@sven1977 sven1977 enabled auto-merge (squash) December 19, 2024 15:40
@github-actions github-actions bot disabled auto-merge December 19, 2024 15:49
@sven1977 sven1977 enabled auto-merge (squash) December 19, 2024 17:17
@sven1977 sven1977 merged commit 1b07eaf into ray-project:master Dec 19, 2024
6 checks passed
@sven1977 sven1977 deleted the docs_redo_cleanup_old_api_stack_01 branch December 20, 2024 06:12