[RLlib; docs] Docs do-over (new API stack): Env pages vol 02. #48542
Conversation
…_redo_cleanup_old_api_stack_01
…_redo_cleanup_old_api_stack_01
…_redo_cleanup_old_api_stack_01
Signed-off-by: sven1977 <svenmika1977@gmail.com>
# Conflicts:
#	doc/source/rllib/images/rllib-envs.svg
…_redo_cleanup_old_api_stack_01
…_redo_cleanup_old_api_stack_01
Not done yet, but releasing some comments.
doc/source/rllib/external-envs.rst
Outdated
In many situations, it does not make sense for an RL environment to be "stepped" by RLlib.
For example, if we train one or more policies inside a complex simulator, for example, a game engine
or a robotics simulation, it would be more natural and user friendly to flip this setup around
and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
Suggested change:
- and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
+ and - instead of RLlib "stepping" the env - allow the simulations and the agents to fully control
"traffic light" agents interacting simultaneously, whereas in a board game, | ||
two or more agents may act in a turn-based sequence. | ||
|
||
Several different policy networks may be used to control the various agents. |
Suggested change:
- Several different policy networks may be used to control the various agents.
+ You can use several different policy networks to control the various agents.
I only got as far as doc/source/rllib/multi-agent-envs.rst but I will pick up again tomorrow.
Good work on all this writing. If this is a good time to change titles to sentence case, that would be more consistent with our style guide. Otherwise, don't worry about it. It's definitely not a blocker. Hope the suggestions are helpful. Will approve to not block you. Sorry for the delay.
two or more agents may act in a turn-based sequence.

Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
Suggested change:
- Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
+ Each agent in the environment maps to exactly one particular policy. Define this mapping

Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
determined by a user-provided function, called the "mapping function". Note that if there
Suggested change:
- determined by a user-provided function, called the "mapping function". Note that if there
+ with a user-provided function, called the "mapping function". Note that if there
Several different policy networks may be used to control the various agents.
Thereby, each of the agents in the environment maps to exactly one particular policy. This mapping is
determined by a user-provided function, called the "mapping function". Note that if there
are ``N`` agents mapping to ``M`` policies, ``N`` is always larger or equal to ``M``,
Suggested change:
- are ``N`` agents mapping to ``M`` policies, ``N`` is always larger or equal to ``M``,
+ are ``N`` agents mapping to ``M`` policies, ``N`` must be greater than or equal to ``M``,
Not sure if the rewrite is too strong, but I'm guessing you're trying to say that.
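For context, a minimal sketch of how a mapping function like this is typically configured on the new API stack. The env name, policy IDs, and the even/odd split are hypothetical; only the `multi_agent()` call and its `policy_mapping_fn` argument are RLlib API:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Hypothetical example: N=4 agents ("agent_0" .. "agent_3") map to M=2 policies.
config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # assumed to be a registered MultiAgentEnv
    .multi_agent(
        policies={"policy_even", "policy_odd"},
        # The "mapping function": given an agent ID, return the policy ID controlling it.
        policy_mapping_fn=lambda agent_id, episode, **kwargs: (
            "policy_even" if int(agent_id.split("_")[-1]) % 2 == 0 else "policy_odd"
        ),
    )
)
```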
:width: 600

**Multi-agent setup:** ``N`` agents live in the environment and take actions computed by ``M`` policy networks.
The mapping from agent to policy is flexible and determined by a user-provided mapping function. Here, `agent_1`
Suggested change:
- The mapping from agent to policy is flexible and determined by a user-provided mapping function. Here, `agent_1`
+ The mapping from agent to policy is flexible and determined by a user-provided mapping function. In this diagram, `agent_1`

.. hint::

This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
Suggested change:
- This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
+ This paragraph describes RLlib's :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
Just a suggestion to remove a second instance of "own".
doc/source/rllib/rllib-env.rst
Outdated
.. seealso::
1. **Vectorization within a single process:** Many environments achieve high
frame rates per core but are limited by policy inference latency. To address
this, create multiple environments per process and thus batch the policy forward pass
Suggested change:
- this, create multiple environments per process and thus batch the policy forward pass
+ this limitation, create multiple environments per process to batch the policy forward pass
doc/source/rllib/rllib-env.rst
Outdated
across these vectorized environments. Set ``config.env_runners(num_envs_per_env_runner=..)``
to create more than one environment copy per :py:class:`~ray.rllib.envs.env_runner.EnvRunner`
actor. Additionally, you can make the individual sub-environments within a vector
independent processes (through python's multiprocessing used by gymnasium).
Suggested change:
- independent processes (through python's multiprocessing used by gymnasium).
+ independent processes through Python's multiprocessing used by gymnasium.
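For context, a rough sketch of the vectorization setting quoted above. The algorithm and numbers are arbitrary; `num_envs_per_env_runner` is the option named in the doc text, and the flag for running sub-environments as separate processes is omitted here since it may differ by version:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        num_env_runners=2,          # number of EnvRunner actors
        num_envs_per_env_runner=8,  # 8 vectorized env copies per EnvRunner
    )
)
```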
doc/source/rllib/rllib-env.rst
Outdated

External Application Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
Suggested change:
- Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
+ Multi-agent setups aren't vectorizable yet. The Ray team is working on a solution for
doc/source/rllib/rllib-env.rst
Outdated
External Application Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-agent setups are not vectorizable yet. The Ray team is working on a solution for
this restriction by utilizing `gymnasium >= 1.x` custom vectorization feature.
Suggested change:
- this restriction by utilizing `gymnasium >= 1.x` custom vectorization feature.
+ this restriction by using the `gymnasium >= 1.x` custom vectorization feature.
doc/source/rllib/rllib-env.rst
Outdated
This low-level API models multiple agents executing asynchronously in multiple environments.
A call to ``BaseEnv:poll()`` returns observations from ready agents keyed by 1) their environment, then 2) agent ids.
Actions for those agents are sent back via ``BaseEnv:send_actions()``. BaseEnv is used to implement all the other env types in RLlib, so it offers a superset of their functionality.
Some environments may require substantial resources to initialize and run. Should your environments require
Suggested change:
- Some environments may require substantial resources to initialize and run. Should your environments require
+ Some environments may require substantial resources to initialize and run. If your environments require
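For context, a sketch of the poll/send_actions keying described above. `base_env` and `compute_action` are placeholders; `BaseEnv` belongs to the old API stack and the exact tuple `poll()` returns differs across Ray versions, so only the env-ID/agent-ID structure is shown:

```python
def step_external_agents(base_env, compute_action):
    """Sketch only: base_env is an RLlib BaseEnv, compute_action any callable."""
    # poll() returns data keyed first by env ID, then by agent ID (the exact tuple
    # contents differ across Ray versions, hence the `*_`).
    observations, *_ = base_env.poll()
    # e.g. observations == {0: {"agent_1": obs_a, "agent_2": obs_b}, 1: {"agent_1": obs_c}}
    actions = {
        env_id: {agent_id: compute_action(obs) for agent_id, obs in agent_obs.items()}
        for env_id, agent_obs in observations.items()
    }
    # Send actions back using the same env-ID -> agent-ID keying.
    base_env.send_actions(actions)
```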
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Sven Mika <sven@anyscale.io>
…_redo_cleanup_old_api_stack_01
Signed-off-by: Sven Mika <sven@anyscale.io>
LGTM. Great overview of environments and a detailed description of the delicate multi-agent environments and episodes users have to take care of.
.. figure:: images/envs/external_env_setup_client_inference.svg
:width: 600
**External application with client-side inference**: An external simulator (for example a game engine)
connects to RLlib, which runs as a server through a tcp-capable, custom EnvRunner.
Do we want to use inline code formatting for all classes like EnvRunner?
.. scale: 75 %
.. A Unity3D soccer game being learnt by RLlib via the ExternalEnv API.

RLlib provides an `external messaging protocol <https://github.com/ray-project/ray/blob/master/rllib/env/utils/external_env_protocol.py>`__
This is so cool!
Yeah, let's make this a widely adopted standard! :)
The RLlink Protocol
-------------------

RLlink is a simple, stateful protocol designed for communication between a reinforcement learning (RL) server (ex., RLlib) and an
Dumb question: why not use plain HTTP/2? It's standard and provides security, plus serialization via Protobuf.
Definitely in the next iteration! Trying to keep it as simple as possible for this very first iteration. For now, this is just about the message types (what to say when, and what to expect back from the server), not the actual implementation of the messages.
top-level: action_0 -------------------------------------> action_1 ->
low-level: action_0 -> action_1 -> action_2 -> action_3 -> action_4 ->

Alternatively, you could implement an environment, in which the two agent types don't act at the same time (overlappingly),
Awesome explanation!
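For context, the usual way to express this kind of turn-based or different-frequency acting in a multi-agent env is to include only the agents whose turn it is in the returned observation dict; RLlib then expects actions only from those agents on the next `step()` call. A rough, hypothetical fragment (the agent IDs, the every-5-steps cadence, and the `_low_level_obs`/`_high_level_obs` helpers are invented here):

```python
# Hypothetical step() fragment: "high_level" acts only every 5 timesteps,
# while "low_level" acts on every timestep.
def step(self, action_dict):
    self.t += 1
    obs = {"low_level": self._low_level_obs()}
    if self.t % 5 == 0:
        # Only every 5th step does the high-level agent receive an observation
        # and therefore get asked for an action on the next step.
        obs["high_level"] = self._high_level_obs()
    rewards = {aid: 0.0 for aid in action_dict}
    return obs, rewards, {"__all__": False}, {"__all__": False}, {}
```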
As I mentioned a while ago: in the long run it might be cool, IMO, to get some design support so that it gets a more professional look and feel.

This paragraph describes RLlib's own :py:class`~ray.rllib.env.multi_agent_env.MultiAgentEnv` API, which is the
recommended way of defining your own multi-agent environment logic. However, if you are already using a
third-party multi-agent API, RLlib offers wrappers for :ref:`Farama's PettingZoo API <farama-pettingzoo-api>` as well
We might want to leave a comment about the game form we implement versus PettingZoo (I think it's an extensive-form game) / OpenSpiel.
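For context, a minimal, hypothetical `MultiAgentEnv` subclass sketch (class name, agent IDs, spaces, and reward logic invented here; the attribute names assume a recent new-API-stack RLlib version):

```python
import gymnasium as gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoPlayerCountEnv(MultiAgentEnv):
    """Hypothetical 2-agent env: both agents act simultaneously for 10 steps."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_1", "agent_2"]
        # Per-agent spaces, keyed by agent ID.
        self.observation_spaces = {
            aid: gym.spaces.Box(-1.0, 1.0, (1,), np.float32) for aid in self.agents
        }
        self.action_spaces = {aid: gym.spaces.Discrete(2) for aid in self.agents}
        self.t = 0

    def reset(self, *, seed=None, options=None):
        self.t = 0
        obs = {aid: np.zeros(1, np.float32) for aid in self.agents}
        return obs, {}  # per-agent observations and (empty) infos

    def step(self, action_dict):
        self.t += 1
        obs = {aid: np.zeros(1, np.float32) for aid in self.agents}
        rewards = {aid: float(a) for aid, a in action_dict.items()}
        terminateds = {"__all__": self.t >= 10}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}
```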
…_redo_cleanup_old_api_stack_01
Do-over of RLlib docs (new API stack):
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (using `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.