[RLlib] Cleanup examples folder #01. #44067

Merged · 34 commits · Apr 2, 2024
Changes shown are from 18 of the 34 commits.

Commits
8a740d2
wip
sven1977 Mar 6, 2024
d043044
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 16, 2024
79ee728
Merge branch 'master' of https://github.com/ray-project/ray into appo…
sven1977 Mar 16, 2024
91b6eab
wip
sven1977 Mar 17, 2024
dc34656
Rock-paper-scissors example working. NOT for non-shared vf net, b/c o…
sven1977 Mar 17, 2024
e924eeb
wip
sven1977 Mar 18, 2024
0d0f8c2
wip
sven1977 Mar 18, 2024
69200be
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 18, 2024
7eeffed
wip
sven1977 Mar 18, 2024
5711bba
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 18, 2024
204a4fd
wip
sven1977 Mar 18, 2024
92de3dd
wip
sven1977 Mar 18, 2024
7a7a360
wip
sven1977 Mar 19, 2024
32fa511
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Mar 19, 2024
651b1f1
wip
sven1977 Mar 19, 2024
67f51c4
wip
sven1977 Mar 19, 2024
661d046
wip
sven1977 Mar 19, 2024
b80a93f
wip
sven1977 Mar 19, 2024
15bbdbf
Apply suggestions from code review
sven1977 Mar 20, 2024
01cdb80
Merge branch 'master' into cleanup_examples_folder
sven1977 Mar 20, 2024
b7be627
Merge branch 'master' of https://github.com/ray-project/ray into clea…
sven1977 Apr 2, 2024
14d7cd2
wip
sven1977 Apr 2, 2024
48d66c1
wip
sven1977 Apr 2, 2024
2b7e640
Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…
sven1977 Apr 2, 2024
84158c0
wip
sven1977 Apr 2, 2024
42fe35f
Apply suggestions from code review
sven1977 Apr 2, 2024
8c6971a
wip
sven1977 Apr 2, 2024
4243bed
Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…
sven1977 Apr 2, 2024
3d188f4
wip
sven1977 Apr 2, 2024
49d68af
wip
sven1977 Apr 2, 2024
79ec502
wip
sven1977 Apr 2, 2024
767c7ff
Merge branch 'master' into cleanup_examples_folder
sven1977 Apr 2, 2024
73bff05
wip
sven1977 Apr 2, 2024
aace3d5
Merge remote-tracking branch 'origin/cleanup_examples_folder' into cl…
sven1977 Apr 2, 2024
11 changes: 0 additions & 11 deletions .buildkite/rllib.rayci.yml
@@ -106,17 +106,6 @@ steps:
--test-env=RLLIB_NUM_GPUS=1
depends_on: rllibgpubuild

- label: ":brain: rllib: rlmodule tests"
tags: rllib_directly
instance_type: large
commands:
- bazel run //ci/ray_ci:test_in_docker -- //rllib/... rllib
--parallelism-per-worker 3
--only-tags rlm
--test-env RLLIB_ENABLE_RL_MODULE=1
--test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
depends_on: rllibbuild

- label: ":brain: rllib: data tests"
Collaborator:
What's actually the meaning of brain here? Learning tests?

Contributor (author):
:) It defines the little icon shown in buildkite.
[screenshot of the Buildkite step icon]

if: build.branch != "master"
tags: data
2 changes: 1 addition & 1 deletion doc/source/ray-overview/getting-started.md
@@ -303,7 +303,7 @@ pip install -U "ray[rllib]" tensorflow  # or torch
```
````

```{literalinclude} ../../../rllib/examples/documentation/rllib_on_ray_readme.py
```{literalinclude} ../rllib/doc_code/rllib_on_ray_readme.py
:end-before: __quick_start_end__
:language: python
:start-after: __quick_start_begin__
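For orientation, the `:start-after:`/`:end-before:` options match sentinel comments inside the referenced file. A minimal sketch of what `doc_code/rllib_on_ray_readme.py` is assumed to look like (the file itself is not part of this diff):

```python
# Sketch only; the real doc_code file may differ.
# __quick_start_begin__
from ray.rllib.algorithms.ppo import PPOConfig

# Configure PPO to learn CartPole, then run a few training iterations.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(train_batch_size=4000)
)
algo = config.build()

for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])
# __quick_start_end__
```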
6 changes: 3 additions & 3 deletions doc/source/rllib/package_ref/env.rst
@@ -23,11 +23,11 @@ For example, if you provide a custom `gym.Env <https://github.com/openai/gym>`_

Here is a simple example:

.. literalinclude:: ../../../../rllib/examples/documentation/custom_gym_env.py
.. literalinclude:: ../doc_code/custom_gym_env.py
:language: python

.. start-after: __sphinx_doc_model_construct_1_begin__
.. end-before: __sphinx_doc_model_construct_1_end__
.. start-after: __rllib-custom-gym-env-begin__
.. end-before: __rllib-custom-gym-env-end__
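For context, the sentinel markers above are assumed to wrap something along these lines in `doc_code/custom_gym_env.py` (a sketch, not the actual file contents):

```python
# __rllib-custom-gym-env-begin__
import gymnasium as gym
import numpy as np

from ray.rllib.algorithms.ppo import PPOConfig


class SimpleCorridor(gym.Env):
    """Toy env: walk right along a 1D corridor until reaching the goal."""

    def __init__(self, config=None):
        config = config or {}
        self.end_pos = config.get("corridor_length", 10)
        self.cur_pos = 0
        self.action_space = gym.spaces.Discrete(2)  # 0: left, 1: right
        self.observation_space = gym.spaces.Box(0.0, float(self.end_pos), (1,), np.float32)

    def reset(self, *, seed=None, options=None):
        self.cur_pos = 0
        return np.array([self.cur_pos], np.float32), {}

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        terminated = self.cur_pos >= self.end_pos
        reward = 1.0 if terminated else -0.1
        return np.array([self.cur_pos], np.float32), reward, terminated, False, {}


# Pass the custom env class (plus its env_config) directly to RLlib.
config = PPOConfig().environment(SimpleCorridor, env_config={"corridor_length": 10})
algo = config.build()
print(algo.train())
# __rllib-custom-gym-env-end__
```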

However, you may also conveniently sub-class any of the other supported RLlib-specific
environment types. The automated paths from those env types (or callables returning instances of those types) to
4 changes: 2 additions & 2 deletions doc/source/rllib/rllib-connector.rst
@@ -234,7 +234,7 @@ With connectors essentially checkpointing all the transformations used during training,
policies can be easily restored without the original algorithm for local inference,
as demonstrated by the following Cartpole example:

.. literalinclude:: ../../../rllib/examples/connectors/v1/run_connector_policy.py
.. literalinclude:: ../../../rllib/examples/_old_api_stack/connectors/run_connector_policy.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
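Roughly, the pattern that `run_connector_policy.py` demonstrates is restoring a single policy, together with its saved connectors, from a checkpoint and using it for inference without re-creating the Algorithm. A sketch under assumed paths (not the exact file contents):

```python
import gymnasium as gym

from ray.rllib.policy.policy import Policy

# Restore a policy, including its agent/action connectors, from an RLlib
# checkpoint directory (the path is illustrative).
policy = Policy.from_checkpoint("/tmp/cartpole_ckpt/policies/default_policy")

env = gym.make("CartPole-v1")
obs, _ = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _, _ = policy.compute_single_action(obs)
    obs, reward, terminated, truncated, _ = env.step(action)
```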
@@ -253,7 +253,7 @@ different environments to work together at the same time.
Here is an example demonstrating how to adapt a policy trained on the standard Cartpole environment
to a new mock Cartpole environment that returns additional features and requires extra action inputs.

.. literalinclude:: ../../../rllib/examples/connectors/v1/adapt_connector_policy.py
.. literalinclude:: ../../../rllib/examples/_old_api_stack/connectors/adapt_connector_policy.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
56 changes: 28 additions & 28 deletions doc/source/rllib/rllib-examples.rst
@@ -14,23 +14,8 @@ Tuned Examples
--------------

- `Tuned examples <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples>`__:
Collection of tuned hyperparameters by algorithm.
- `MuJoCo and Atari benchmarks <https://github.com/ray-project/rl-experiments>`__:
Collection of reasonably optimized Atari and MuJoCo results.
Collection of tuned hyperparameters sorted by algorithm.

Blog Posts
----------

- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
This blog describes RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
A how-to on connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
Discussion on how we ported 12 of RLlib's algorithms from TensorFlow to PyTorch and what we learnt on the way.
- `Scaling Multi-Agent Reinforcement Learning <http://bair.berkeley.edu/blog/2018/12/12/rllib>`__:
This blog post is a brief tutorial on multi-agent RL and its design in RLlib.
- `Functional RL with Keras and TensorFlow Eager <https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345>`__:
Exploration of a functional paradigm for implementing reinforcement learning (RL) algorithms.

Environments and Adapters
-------------------------
@@ -47,7 +32,7 @@ Environments and Adapters
Custom- and Complex Models
--------------------------

- `Custom Keras model <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_keras_model.py>`__:
- `Custom Keras model <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/custom_keras_model.py>`__:
Example of using a custom Keras model.
- `Registering a custom model with supervised loss <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_model_loss_and_metrics.py>`__:
Example of defining and registering a custom model with a supervised loss.
@@ -83,9 +68,9 @@ Training Workflows

Evaluation:
-----------
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_eval.py>`__:
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/evaluation/custom_evaluation.py>`__:
Example of how to write a custom evaluation function that is called instead of the default behavior, which runs n episodes using the evaluation worker set.
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/parallel_evaluation_and_training.py>`__:
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/evaluation/evaluation_parallel_to_training.py>`__:
Example showing how the evaluation workers and the "normal" rollout workers can run (to some extent) in parallel to speed up training.
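As a rough orientation for both items above, the relevant configuration surface looks about like this (assumed `AlgorithmConfig` options; the values are illustrative):

```python
from ray.rllib.algorithms.ppo import PPOConfig


def my_custom_eval(algorithm, eval_workers):
    # Run your own evaluation episodes here and return a metrics dict.
    return {"custom_eval_metric": 0.0}


config = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(
        evaluation_interval=1,                  # evaluate every training iteration
        evaluation_num_workers=2,               # dedicated evaluation workers
        evaluation_parallel_to_training=True,   # overlap evaluation with training
        custom_evaluation_function=my_custom_eval,  # replaces the default eval loop
    )
)
```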


@@ -113,23 +98,23 @@ Serving and Offline
Multi-Agent and Hierarchical
----------------------------

- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py>`__:
- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/independent_learning.py>`__:
Set up RLlib to run any algorithm in (independent) multi-agent mode against a multi-agent environment (see the configuration sketch at the end of this section).
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py>`__:
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/parameter_sharing.py>`__:
Set up RLlib to run any algorithm in (shared-parameter) multi-agent mode against a multi-agent environment.
- `Rock-paper-scissors <https://github.com/ray-project/ray/blob/master/rllib/examples/rock_paper_scissors_multiagent.py>`__:
- `Rock-paper-scissors <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/rock_paper_scissors.py>`__:
Example of different heuristic and learned policies competing against each other in rock-paper-scissors.
- `Two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/two_step_game.py>`__:
- `Two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/two_step_game.py>`__:
Example of the two-step game from the `QMIX paper <https://arxiv.org/pdf/1803.11485.pdf>`__.
- `PettingZoo multi-agent example <https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Ray/rllib_pistonball.py>`__:
Example of how to use RLlib to learn in `PettingZoo <https://www.pettingzoo.ml>`__ multi-agent environments.
- `PPO with centralized critic on two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic.py>`__:
Example of customizing PPO to leverage a centralized value function.
- `Centralized critic in the env <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic_2.py>`__:
A simpler method of implementing a centralized critic by augmenting agent observations with global information.
- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_custom_policy.py>`__:
- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/custom_heuristic_rl_module.py>`__:
Example of running a custom hand-coded policy alongside trainable policies.
- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_cartpole.py>`__:
- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/multi_agent_cartpole.py>`__:
Example of how to define weight-sharing layers between two different policies.
- `Multiple algorithms <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_two_trainers.py>`__:
Example of alternating training between DQN and PPO.
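For orientation, the independent-learning and parameter-sharing setups linked above boil down to a `.multi_agent()` configuration like the following (a sketch; the PettingZoo env is illustrative):

```python
from pettingzoo.classic import rps_v2
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from ray.tune.registry import register_env

# Register the PettingZoo env under a name RLlib can look up.
register_env("rock_paper_scissors", lambda cfg: PettingZooEnv(rps_v2.env()))

config = (
    PPOConfig()
    .environment("rock_paper_scissors")
    .multi_agent(
        # One policy per agent ID -> fully independent learning. Mapping both
        # agents to a single shared policy ID would give parameter sharing.
        policies={"player_0", "player_1"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
)
```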
@@ -140,11 +125,11 @@ Multi-Agent and Hierarchical
Special Action- and Observation Spaces
--------------------------------------

- `Nested action spaces <https://github.com/ray-project/ray/blob/master/rllib/examples/nested_action_spaces.py>`__:
- `Nested action spaces <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/connector_v2_nested_action_spaces.py>`__:
Learning in arbitrarily nested action spaces.
- `Parametric actions <https://github.com/ray-project/ray/blob/master/rllib/examples/parametric_actions_cartpole.py>`__:
- `Parametric actions <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/parametric_actions_cartpole.py>`__:
Example of how to handle variable-length or parametric action spaces.
- `Using the "Repeated" space of RLlib for variable lengths observations <https://github.com/ray-project/ray/blob/master/rllib/examples/complex_struct_space.py>`__:
- `Using the "Repeated" space of RLlib for variable lengths observations <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/complex_struct_space.py>`__:
How to use RLlib's `Repeated` space to handle variable length observations.
- `Autoregressive action distribution example <https://github.com/ray-project/ray/blob/master/rllib/examples/autoregressive_action_dist.py>`__:
Learning with auto-regressive action dependencies (e.g., two action components, where the distribution of the second component depends on the first component's sampled value).
@@ -185,3 +170,18 @@ Community Examples
Example of training in StarCraft2 maps with RLlib / multi-agent.
- `Traffic Flow <https://berkeleyflow.readthedocs.io/en/latest/flow_setup.html>`__:
Example of optimizing mixed-autonomy traffic simulations with RLlib / multi-agent.


Blog Posts
----------

- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
This blog describes RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
A how-to on connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
Discussion on how we ported 12 of RLlib's algorithms from TensorFlow to PyTorch and what we learned along the way.
- `Scaling Multi-Agent Reinforcement Learning <http://bair.berkeley.edu/blog/2018/12/12/rllib>`__:
This blog post is a brief tutorial on multi-agent RL and its design in RLlib.
- `Functional RL with Keras and TensorFlow Eager <https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345>`__:
Exploration of a functional paradigm for implementing reinforcement learning (RL) algorithms.
10 changes: 5 additions & 5 deletions doc/source/rllib/rllib-replay-buffers.rst
@@ -67,7 +67,7 @@ Here are three ways of specifying a type:
.. dropdown:: **Changing a replay buffer configuration**
:animate: fade-in-slide-down

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_type_specification__begin__
:end-before: __sphinx_doc_replay_buffer_type_specification__end__
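For context, the three ways referenced in the hunk header above roughly correspond to the following (assumed config API; a sketch only):

```python
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.utils.replay_buffers import MultiAgentPrioritizedReplayBuffer

# 1) By registered class name ...
config = DQNConfig().training(
    replay_buffer_config={"type": "MultiAgentPrioritizedReplayBuffer"}
)
# 2) ... by the class object itself ...
config = DQNConfig().training(
    replay_buffer_config={"type": MultiAgentPrioritizedReplayBuffer}
)
# 3) ... or by a fully qualified module path string.
config = DQNConfig().training(
    replay_buffer_config={
        "type": "ray.rllib.utils.replay_buffers.MultiAgentPrioritizedReplayBuffer"
    }
)
```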
@@ -98,7 +98,7 @@ Advanced buffer types add functionality while trying to retain compatibility through
The following is an example of the most basic scheme of interaction with a :py:class:`~ray.rllib.utils.replay_buffers.replay_buffer.ReplayBuffer`.


.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_basic_interaction__begin__
:end-before: __sphinx_doc_replay_buffer_basic_interaction__end__
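In case the linked demo is not at hand, the most basic interaction pattern is assumed to look roughly like this:

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer

buffer = ReplayBuffer(capacity=1000)
# Add a (tiny) batch of experiences, then sample from the buffer.
buffer.add(SampleBatch({"obs": [0.0], "actions": [1], "rewards": [0.5]}))
batch = buffer.sample(num_items=1)
print(batch)
```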

Here is an example of how to implement your own toy example of a ReplayBuffer class and make SimpleQ use it:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_own_buffer__begin__
:end-before: __sphinx_doc_replay_buffer_own_buffer__end__
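A toy sketch of the idea (using DQN in place of SimpleQ, and assuming the base class stores added batches in `self._storage`; the real demo file may differ):

```python
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.policy.sample_batch import concat_samples
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer


class OldestFirstReplayBuffer(ReplayBuffer):
    """Toy buffer: always returns the oldest stored items instead of sampling randomly."""

    def sample(self, num_items, **kwargs):
        # `self._storage` is assumed to be the list of added batches.
        return concat_samples(self._storage[:num_items])


config = DQNConfig().training(
    replay_buffer_config={"type": OldestFirstReplayBuffer, "capacity": 50_000}
)
```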

Here is a full example of how to modify the storage_unit and interact with a custom buffer:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_advanced_usage_storage_unit__begin__
:end-before: __sphinx_doc_replay_buffer_advanced_usage_storage_unit__end__
Here is an example of how to create an :py:class:`~ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer` with an alternative underlying :py:class:`~ray.rllib.utils.replay_buffers.replay_buffer.ReplayBuffer`.
The :py:class:`~ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer` can stay the same. We only need to specify our own buffer along with a default call argument:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_advanced_usage_underlying_buffers__begin__
:end-before: __sphinx_doc_replay_buffer_advanced_usage_underlying_buffers__end__
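A sketch of that idea (the `underlying_buffer_config` key name is an assumption here, not confirmed by this diff):

```python
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.utils.replay_buffers import ReplayBuffer

config = DQNConfig().training(
    replay_buffer_config={
        "type": "MultiAgentReplayBuffer",
        # Default call argument forwarded to each per-policy underlying buffer:
        "underlying_buffer_config": {
            "type": ReplayBuffer,
            "capacity": 10_000,
        },
    }
)
```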