[RLlib] Cleanup examples folder #1. (#44067)
sven1977 authored Apr 2, 2024
1 parent 1dbcacb commit cb05540
Showing 104 changed files with 3,973 additions and 6,516 deletions.
11 changes: 0 additions & 11 deletions .buildkite/rllib.rayci.yml
@@ -106,17 +106,6 @@ steps:
--test-env=RLLIB_NUM_GPUS=1
depends_on: rllibgpubuild

- label: ":brain: rllib: rlmodule tests"
tags: rllib_directly
instance_type: large
commands:
- bazel run //ci/ray_ci:test_in_docker -- //rllib/... rllib
--parallelism-per-worker 3
--only-tags rlm
--test-env RLLIB_ENABLE_RL_MODULE=1
--test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
depends_on: rllibbuild

- label: ":brain: rllib: data tests"
if: build.branch != "master"
tags: data
2 changes: 1 addition & 1 deletion doc/source/ray-overview/getting-started.md
@@ -303,7 +303,7 @@ pip install -U "ray[rllib]" tensorflow # or torch
```
````
```{literalinclude} ../../../rllib/examples/documentation/rllib_on_ray_readme.py
```{literalinclude} ../rllib/doc_code/rllib_on_ray_readme.py
:end-before: __quick_start_end__
:language: python
:start-after: __quick_start_begin__
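The `:start-after:` and `:end-before:` options pull the region between two comment markers out of the referenced file. That file's contents aren't part of this diff; the snippet below is only a rough sketch of the marker convention, assuming `rllib_on_ray_readme.py` follows the usual RLlib quick-start pattern (the PPO code shown is illustrative, not the file's actual code):

```python
# __quick_start_begin__
from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO algorithm on CartPole and train it for a few iterations
# (illustrative stand-in for the real quick-start code).
config = PPOConfig().environment("CartPole-v1").training(train_batch_size=4000)
algo = config.build()

for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])
# __quick_start_end__
```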
File renamed without changes.
6 changes: 3 additions & 3 deletions doc/source/rllib/package_ref/env.rst
@@ -29,11 +29,11 @@ For example, if you provide a custom `gym.Env <https://github.com/openai/gym>`_

Here is a simple example:

.. literalinclude:: ../../../../rllib/examples/documentation/custom_gym_env.py
.. literalinclude:: ../doc_code/custom_gym_env.py
:language: python

.. start-after: __sphinx_doc_model_construct_1_begin__
.. end-before: __sphinx_doc_model_construct_1_end__
.. start-after: __rllib-custom-gym-env-begin__
.. end-before: __rllib-custom-gym-env-end__
However, you may also conveniently sub-class any of the other supported RLlib-specific
environment types. The automated paths from those env types (or callables returning instances of those types) to
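The custom env example referenced above only changes location in this commit (from `examples/documentation/` to `doc_code/`). For orientation, a minimal custom `gym.Env` handed to RLlib looks roughly like the sketch below; the corridor environment is made up for illustration and is not the contents of `custom_gym_env.py`:

```python
import gymnasium as gym
from gymnasium.spaces import Discrete
from ray.rllib.algorithms.ppo import PPOConfig


class CorridorEnv(gym.Env):
    """Toy corridor: walk right until reaching the end (illustrative only)."""

    def __init__(self, config=None):
        self.end_pos = (config or {}).get("corridor_length", 10)
        self.observation_space = Discrete(self.end_pos + 1)
        self.action_space = Discrete(2)  # 0 = left, 1 = right
        self.pos = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos >= self.end_pos
        reward = 1.0 if terminated else -0.1
        return self.pos, reward, terminated, False, {}


# Passing the class directly lets RLlib register and instantiate the env itself.
config = PPOConfig().environment(CorridorEnv, env_config={"corridor_length": 10})
algo = config.build()
print(algo.train()["episode_reward_mean"])
```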
4 changes: 2 additions & 2 deletions doc/source/rllib/rllib-connector.rst
@@ -236,7 +236,7 @@ With connectors essentially checkpointing all the transformations used during tr
policies can be easily restored without the original algorithm for local inference,
as demonstrated by the following Cartpole example:

.. literalinclude:: ../../../rllib/examples/connectors/v1/run_connector_policy.py
.. literalinclude:: ../../../rllib/examples/_old_api_stack/connectors/run_connector_policy.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
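This commit only moves `run_connector_policy.py` under `_old_api_stack/connectors/`. As a very rough sketch of the underlying idea (restore a trained policy from a checkpoint and query it locally), without the connector-specific helpers the actual example uses, and with a hypothetical checkpoint path:

```python
from ray.rllib.policy.policy import Policy

# Hypothetical path to a single-policy checkpoint inside an algorithm checkpoint.
policy = Policy.from_checkpoint("/tmp/cartpole_ckpt/policies/default_policy")

# Plain local inference on one CartPole observation; the checked-in example
# additionally runs the policy's stored connectors to pre- and post-process data.
obs = [0.0, 0.0, 0.0, 0.0]
action, state_out, extra = policy.compute_single_action(obs)
print(action)
```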
@@ -255,7 +255,7 @@ different environments to work together at the same time.
Here is an example demonstrating adaptation of a policy trained for the standard Cartpole environment
for a new mock Cartpole environment that returns additional features and requires extra action inputs.

.. literalinclude:: ../../../rllib/examples/connectors/v1/adapt_connector_policy.py
.. literalinclude:: ../../../rllib/examples/_old_api_stack/connectors/adapt_connector_policy.py
:language: python
:start-after: __sphinx_doc_begin__
:end-before: __sphinx_doc_end__
56 changes: 28 additions & 28 deletions doc/source/rllib/rllib-examples.rst
@@ -14,23 +14,8 @@ Tuned Examples
--------------

- `Tuned examples <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples>`__:
Collection of tuned hyperparameters by algorithm.
- `MuJoCo and Atari benchmarks <https://github.com/ray-project/rl-experiments>`__:
Collection of reasonably optimized Atari and MuJoCo results.
Collection of tuned hyperparameters sorted by algorithm.

Blog Posts
----------

- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
This blog describes RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
A how-to on connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
Discussion on how we ported 12 of RLlib's algorithms from TensorFlow to PyTorch and what we learnt on the way.
- `Scaling Multi-Agent Reinforcement Learning <http://bair.berkeley.edu/blog/2018/12/12/rllib>`__:
This blog post is a brief tutorial on multi-agent RL and its design in RLlib.
- `Functional RL with Keras and TensorFlow Eager <https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345>`__:
Exploration of a functional paradigm for implementing reinforcement learning (RL) algorithms.

Environments and Adapters
-------------------------
@@ -47,7 +32,7 @@ Environments and Adapters
Custom- and Complex Models
--------------------------

- `Custom Keras model <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_keras_model.py>`__:
- `Custom Keras model <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/custom_keras_model.py>`__:
Example of using a custom Keras model.
- `Registering a custom model with supervised loss <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_model_loss_and_metrics.py>`__:
Example of defining and registering a custom model with a supervised loss.
@@ -83,9 +68,9 @@ Training Workflows

Evaluation:
-----------
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/custom_eval.py>`__:
- `Custom evaluation function <https://github.com/ray-project/ray/blob/master/rllib/examples/evaluation/custom_evaluation.py>`__:
Example of how to write a custom evaluation function that is called instead of the default behavior, which is running with the evaluation worker set through n episodes.
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/parallel_evaluation_and_training.py>`__:
- `Parallel evaluation and training <https://github.com/ray-project/ray/blob/master/rllib/examples/evaluation/evaluation_parallel_to_training.py>`__:
Example showing how the evaluation workers and the "normal" rollout workers can run (to some extend) in parallel to speed up training.
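The two evaluation examples above are referenced by path only. A minimal sketch of evaluation running in parallel to training, using the old API stack option names assumed from that era's `AlgorithmConfig.evaluation()` settings:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(
        evaluation_interval=1,                 # evaluate every training iteration
        evaluation_num_workers=1,              # dedicated evaluation worker(s)
        evaluation_duration=10,                # 10 episodes per evaluation round
        evaluation_parallel_to_training=True,  # overlap evaluation with training
    )
)
algo = config.build()
print(algo.train()["evaluation"]["episode_reward_mean"])
```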


@@ -113,23 +98,23 @@ Serving and Offline
Multi-Agent and Hierarchical
----------------------------

- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py>`__:
- `Simple independent multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/independent_learning.py>`__:
Setup RLlib to run any algorithm in (independent) multi-agent mode against a multi-agent environment.
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py>`__:
- `More complex (shared-parameter) multi-agent setup vs a PettingZoo env <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/parameter_sharing.py>`__:
Setup RLlib to run any algorithm in (shared-parameter) multi-agent mode against a multi-agent environment.
- `Rock-paper-scissors <https://github.com/ray-project/ray/blob/master/rllib/examples/rock_paper_scissors_multiagent.py>`__:
- `Rock-paper-scissors <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/rock_paper_scissors.py>`__:
Example of different heuristic and learned policies competing against each other in rock-paper-scissors.
- `Two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/two_step_game.py>`__:
- `Two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/two_step_game.py>`__:
Example of the two-step game from the `QMIX paper <https://arxiv.org/pdf/1803.11485.pdf>`__.
- `PettingZoo multi-agent example <https://github.com/Farama-Foundation/PettingZoo/blob/master/tutorials/Ray/rllib_pistonball.py>`__:
Example on how to use RLlib to learn in `PettingZoo <https://www.pettingzoo.ml>`__ multi-agent environments.
- `PPO with centralized critic on two-step game <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic.py>`__:
Example of customizing PPO to leverage a centralized value function.
- `Centralized critic in the env <https://github.com/ray-project/ray/blob/master/rllib/examples/centralized_critic_2.py>`__:
A simpler method of implementing a centralized critic by augmentating agent observations with global information.
- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_custom_policy.py>`__:
- `Hand-coded policy <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/custom_heuristic_rl_module.py>`__:
Example of running a custom hand-coded policy alongside trainable policies.
- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_cartpole.py>`__:
- `Weight sharing between policies <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_and_self_play/multi_agent_cartpole.py>`__:
Example of how to define weight-sharing layers between two different policies.
- `Multiple algorithms <https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_two_trainers.py>`__:
Example of alternating training between DQN and PPO.
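Most of the multi-agent examples above boil down to a `multi_agent()` block in the algorithm config. A hedged sketch of independent learning with one policy per agent; the env name and agent IDs are placeholders, not taken from any of the listed files:

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Placeholder: any registered multi-agent env (e.g. a wrapped PettingZoo env).
    .environment("my_multi_agent_env")
    .multi_agent(
        # One policy per agent -> independent learning.
        policies={"agent_0", "agent_1"},
        # Map each agent ID onto the policy of the same name.
        policy_mapping_fn=lambda agent_id, episode, worker=None, **kwargs: agent_id,
    )
)
```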
@@ -140,11 +125,11 @@
Special Action- and Observation Spaces
--------------------------------------

- `Nested action spaces <https://github.com/ray-project/ray/blob/master/rllib/examples/nested_action_spaces.py>`__:
- `Nested action spaces <https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/connector_v2_nested_action_spaces.py>`__:
Learning in arbitrarily nested action spaces.
- `Parametric actions <https://github.com/ray-project/ray/blob/master/rllib/examples/parametric_actions_cartpole.py>`__:
- `Parametric actions <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/parametric_actions_cartpole.py>`__:
Example of how to handle variable-length or parametric action spaces.
- `Using the "Repeated" space of RLlib for variable lengths observations <https://github.com/ray-project/ray/blob/master/rllib/examples/complex_struct_space.py>`__:
- `Using the "Repeated" space of RLlib for variable lengths observations <https://github.com/ray-project/ray/blob/master/rllib/examples/_old_api_stack/complex_struct_space.py>`__:
How to use RLlib's `Repeated` space to handle variable length observations.
- `Autoregressive action distribution example <https://github.com/ray-project/ray/blob/master/rllib/examples/autoregressive_action_dist.py>`__:
Learning with auto-regressive action dependencies (e.g. 2 action components; distribution for 2nd component depends on the 1st component's actually sampled value).
@@ -185,3 +170,18 @@ Community Examples
Example of training in StarCraft2 maps with RLlib / multi-agent.
- `Traffic Flow <https://berkeleyflow.readthedocs.io/en/latest/flow_setup.html>`__:
Example of optimizing mixed-autonomy traffic simulations with RLlib / multi-agent.


Blog Posts
----------

- `Attention Nets and More with RLlib’s Trajectory View API <https://medium.com/distributed-computing-with-ray/attention-nets-and-more-with-rllibs-trajectory-view-api-d326339a6e65>`__:
Blog describing RLlib's new "trajectory view API" and how it enables implementations of GTrXL (attention net) architectures.
- `Reinforcement Learning with RLlib in the Unity Game Engine <https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d>`__:
How-To guide about connecting RLlib with the Unity3D game engine for running visual- and physics-based RL experiments.
- `Lessons from Implementing 12 Deep RL Algorithms in TF and PyTorch <https://medium.com/distributed-computing-with-ray/lessons-from-implementing-12-deep-rl-algorithms-in-tf-and-pytorch-1b412009297d>`__:
Discussion on how the Ray Team ported 12 of RLlib's algorithms from TensorFlow to PyTorch and the lessons learned.
- `Scaling Multi-Agent Reinforcement Learning <http://bair.berkeley.edu/blog/2018/12/12/rllib>`__:
Blog post of a brief tutorial on multi-agent RL and its design in RLlib.
- `Functional RL with Keras and TensorFlow Eager <https://medium.com/riselab/functional-rl-with-keras-and-tensorflow-eager-7973f81d6345>`__:
Exploration of a functional paradigm for implementing reinforcement learning (RL) algorithms.
10 changes: 5 additions & 5 deletions doc/source/rllib/rllib-replay-buffers.rst
@@ -71,7 +71,7 @@ Here are three ways of specifying a type:
.. dropdown:: **Changing a replay buffer configuration**
:animate: fade-in-slide-down

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_type_specification__begin__
:end-before: __sphinx_doc_replay_buffer_type_specification__end__
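The literalinclude above pulls the type-specification region out of `replay_buffer_demo.py`, whose path is all that changes here. The gist (hedged, not the demo file's exact code) is that `replay_buffer_config` accepts the buffer type as a class, a class name, or a full module path:

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .training(
        replay_buffer_config={
            # Equivalently: the class object, "MultiAgentReplayBuffer", or the full
            # path "ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer".
            "type": "MultiAgentReplayBuffer",
            "capacity": 50_000,
        }
    )
)
```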
@@ -102,7 +102,7 @@ Advanced buffer types add functionality while trying to retain compatibility thr
The following is an example of the most basic scheme of interaction with a :py:class:`~ray.rllib.utils.replay_buffers.replay_buffer.ReplayBuffer`.


.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_basic_interaction__begin__
:end-before: __sphinx_doc_replay_buffer_basic_interaction__end__
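For reference, the most basic interaction pattern that region demonstrates is add-then-sample. A small standalone sketch (not the demo file's exact code):

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer, StorageUnit

# Small timestep-based buffer.
buffer = ReplayBuffer(capacity=1000, storage_unit=StorageUnit.TIMESTEPS)

# Add a tiny batch of experiences, then sample a couple of timesteps back out.
buffer.add(SampleBatch({"obs": [0, 1], "actions": [0, 1], "rewards": [0.1, 1.0]}))
print(buffer.sample(2))
```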
@@ -113,7 +113,7 @@ Building your own ReplayBuffer

Here is an example of how to implement your own toy example of a ReplayBuffer class and make SimpleQ use it:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_own_buffer__begin__
:end-before: __sphinx_doc_replay_buffer_own_buffer__end__
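The custom-buffer example itself only moves in this commit. Below is a toy sketch of the subclassing idea; note it peeks at the buffer's internal `self._storage` list, which is an implementation detail and therefore an assumption:

```python
from ray.rllib.policy.sample_batch import concat_samples
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer


class MostRecentReplayBuffer(ReplayBuffer):
    """Toy buffer that returns the newest items instead of sampling randomly."""

    def sample(self, num_items, **kwargs):
        # Assumption: added items live in the parent class's `self._storage` list.
        return concat_samples(self._storage[-num_items:])


# Plugging it into an algorithm then works through
# `config.training(replay_buffer_config={"type": MostRecentReplayBuffer, ...})`.
```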
@@ -132,7 +132,7 @@ When later calling the ``sample()`` method, num_items will relate to said storag

Here is a full example of how to modify the storage_unit and interact with a custom buffer:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_advanced_usage_storage_unit__begin__
:end-before: __sphinx_doc_replay_buffer_advanced_usage_storage_unit__end__
@@ -145,7 +145,7 @@ the same way as the parent's config.
Here is an example of how to create an :py:class:`~ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer` with an alternative underlying :py:class:`~ray.rllib.utils.replay_buffers.replay_buffer.ReplayBuffer`.
The :py:class:`~ray.rllib.utils.replay_buffers.multi_agent_replay_buffer.MultiAgentReplayBuffer` can stay the same. We only need to specify our own buffer along with a default call argument:

.. literalinclude:: ../../../rllib/examples/documentation/replay_buffer_demo.py
.. literalinclude:: doc_code/replay_buffer_demo.py
:language: python
:start-after: __sphinx_doc_replay_buffer_advanced_usage_underlying_buffers__begin__
:end-before: __sphinx_doc_replay_buffer_advanced_usage_underlying_buffers__end__
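A hedged sketch of the idea in that last region: construct a `MultiAgentReplayBuffer` whose per-policy buffers are prioritized, forwarding a default call argument through `underlying_buffer_config` (the parameter names are assumed from the replay-buffer API of that release):

```python
from ray.rllib.utils.replay_buffers.multi_agent_replay_buffer import (
    MultiAgentReplayBuffer,
)
from ray.rllib.utils.replay_buffers.prioritized_replay_buffer import (
    PrioritizedReplayBuffer,
)

buffer = MultiAgentReplayBuffer(
    capacity=10_000,
    underlying_buffer_config={
        "type": PrioritizedReplayBuffer,
        "alpha": 0.6,  # default call argument forwarded to each underlying buffer
    },
)
```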