[RLlib; docs] Redo `rllib-algorithms.rst` page. #46916

sven1977 · 2024-08-01T14:39:50Z

Redo rllib-algorithms.rst page.

- New algo architecture diagrams.
- New algo overview table (updated and more suited to new API stack).
- Do-over of algo descriptions.

Why are these changes needed?

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

…_redo_algorithms_page

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980

LGTM. Some nits here and there. The descriptions are actually really good: short, precise, and straight to the point.

simonsays1980 · 2024-08-01T16:33:32Z

doc/source/rllib/images/algos/ppo-architecture.svg

The `gym.vector` label might lead users to believe that this holds for both single-agent (SARL) and multi-agent (MARL) setups, while the latter does not support vector envs yet.

Ah, good catch. I think we'll have to fix this limitation in MAEnvRunner soon :|

simonsays1980 · 2024-08-01T16:36:54Z

rllib/algorithms/ppo/ppo.py

+        config.training(
+            gamma=0.9, lr=0.01, kl_coeff=0.3, train_batch_size_per_learner=256
+        )
+        config.resources(num_gpus=0)


This is still ambiguous. We have `num_gpus` in `resources()`, `num_gpus_per_learner` in `learners()`, and `num_gpus_per_env_runner` in `env_runners()`. What is this GPU for? Is it a GPU for the driver? Or is it the total number of GPUs available to EnvRunners and Learners?

simonsays1980 · 2024-08-01T16:38:47Z

doc/source/rllib/rllib-algorithms.rst

+-----------------------------------------------------------------------------+------------------------------+------------------------------------+--------------------------------+
+| **On-Policy**                                                                                                                                                                    |
+-----------------------------------------------------------------------------+------------------------------+------------------------------------+--------------------------------+
+| :ref:`PPO (Proximal Policy Optimization) <ppo>`                             | |single_agent| |multi_agent| | |multi_gpu| |multi_node_multi_gpu| | |cont_actions| |discr_actions| |


Make it clearer what the difference between multi-GPU and multi-node-multi-GPU is.

Good point!
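One way to illustrate the distinction asked about above: the same multi-Learner setup counts as "multi-GPU" when all Learner GPUs fit on one machine, and as "multi-node multi-GPU" when the Learners must be spread across machines. The helper below is hypothetical: it assumes whole GPUs per Learner and packs Learners onto as few nodes as possible, ignoring Ray's actual placement-group scheduling.

```python
import math


def learner_gpu_layout(num_learners, gpus_per_learner, gpus_per_node):
    """Return (total_learner_gpus, min_nodes_needed).

    Hypothetical helper: assumes each Learner uses a whole number of GPUs
    and that Learners are packed onto as few nodes as possible.
    """
    if gpus_per_learner < 1 or gpus_per_node < gpus_per_learner:
        raise ValueError("each Learner needs >= 1 GPU and must fit on a node")
    total_gpus = num_learners * gpus_per_learner
    learners_per_node = gpus_per_node // gpus_per_learner
    nodes = math.ceil(num_learners / learners_per_node)
    return total_gpus, nodes


# 4 single-GPU Learners on 8-GPU machines: one node suffices (multi-GPU).
print(learner_gpu_layout(4, 1, 8))  # (4, 1)
# Same Learners on 2-GPU machines: they must span nodes (multi-node multi-GPU).
print(learner_gpu_layout(4, 1, 2))  # (4, 2)
```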

simonsays1980 · 2024-08-01T16:39:46Z

doc/source/rllib/rllib-algorithms.rst

+-----------------------------------------------------------------------------+------------------------------+------------------------------------+--------------------------------+
+| :ref:`DQN/Rainbow (Deep Q Networks) <dqn>`                                  | |single_agent| |multi_agent| | |multi_gpu| |multi_node_multi_gpu| |                |discr_actions| |
+-----------------------------------------------------------------------------+------------------------------+------------------------------------+--------------------------------+
+| :ref:`SAC (Soft Actor Critic) <sac>`                                        | |single_agent| |multi_agent| | |multi_gpu| |multi_node_multi_gpu| | |cont_actions|                 |


That's actually not correct. Multi-Learner settings are not available for SAC, due to the multiple optimizers per Learner.

simonsays1980 · 2024-08-01T16:42:06Z

doc/source/rllib/rllib-algorithms.rst

+    **PPO architecture:** In a training iteration, PPO performs three major steps: sampling a set of episodes or episode fragments (1),
+    converting these into a train batch and updating the model(s) using a clipped objective and multiple SGD passes over this batch (2),
+    and synching the weights from the Learners back to the EnvRunners (3).
+    PPO scales out on both axes, supporting multiple EnvRunners for sample collection and multiple GPU- or CPU-based Learner


Maybe "Learners" (plural)?
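Step (2) of the PPO architecture caption (a clipped objective plus multiple SGD passes over one train batch) can be sketched in plain Python. This is a textbook-style illustration of PPO's clipped surrogate objective, not RLlib's Learner code. `clip_param` mirrors the name of PPO's clipping setting in RLlib; `minibatch_indices`, `num_epochs`, and `minibatch_size` are illustrative names.

```python
import math
import random


def clipped_surrogate(logp_new, logp_old, advantages, clip_param=0.2):
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)
        # Pessimistic bound: keep the smaller of the unclipped/clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def minibatch_indices(batch_size, minibatch_size, num_epochs, seed=0):
    """Yield index lists for several shuffled passes over one train batch."""
    rng = random.Random(seed)
    indices = list(range(batch_size))
    for _ in range(num_epochs):
        rng.shuffle(indices)
        for start in range(0, batch_size, minibatch_size):
            yield indices[start:start + minibatch_size]


# Ratio 2 with a positive advantage gets clipped to 1 + clip_param.
print(round(clipped_surrogate([math.log(2.0)], [0.0], [1.0]), 6))  # 1.2
# With a negative advantage the unclipped (smaller) term is kept.
print(round(clipped_surrogate([math.log(2.0)], [0.0], [-1.0]), 6))  # -2.0
# 256 samples, minibatches of 64, 3 passes -> 12 update steps.
print(sum(1 for _ in minibatch_indices(256, 64, num_epochs=3)))  # 12
```

The `min` over the clipped and unclipped terms is what keeps a single batch from pushing the policy too far, which is why several SGD passes over the same batch are safe.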

simonsays1980 · 2024-08-01T16:44:54Z

doc/source/rllib/rllib-algorithms.rst

+`Rainbow configuration <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/dqn/pong-rainbow.yaml>`__,
+`{BeamRider,Breakout,Qbert,SpaceInvaders}NoFrameskip-v4 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/dqn/atari-dqn.yaml>`__,
+`with Dueling and Double-Q <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/dqn/atari-duel-ddqn.yaml>`__,
+`with Distributional DQN <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/dqn/atari-dist-dqn.yaml>`__.


Maybe we should mention the Rainbow architecture earlier in the section.
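To make the linked "Dueling" and "Double-Q" variants concrete, here is a sketch of those two textbook modifications (Rainbow combines them with further extensions such as distributional Q-learning). It works on plain lists of per-action values; the function names are hypothetical and this is not RLlib's DQN implementation.

```python
def dueling_q_values(state_value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_q_target(reward, terminated, gamma, q_online_next, q_target_next):
    """Double-Q target: the online net picks a', the target net evaluates it."""
    if terminated:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


print(dueling_q_values(1.0, [1.0, -1.0]))  # [2.0, 0.0]
# Online net prefers action 1, so the target uses 4.0 rather than max = 6.0.
print(double_q_target(1.0, False, 0.9, [0.0, 5.0], [6.0, 4.0]))
```

Decoupling action selection from evaluation is what reduces the overestimation bias of the plain `max` target.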

simonsays1980 · 2024-08-01T16:46:54Z

doc/source/rllib/rllib-algorithms.rst

+Tuned examples (continuous actions):
+`Pendulum-v1 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/pendulum-sac.yaml>`__,
+`HalfCheetah-v3 <https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/sac/halfcheetah-sac.yaml>`__,
+Tuned examples (discrete actions):


Discrete actions are not implemented in the new stack.

True! Good catch.

The table has this correctly.

Ah, alright. Must have overlooked it there.

Signed-off-by: sven1977 <svenmika1977@gmail.com>

…_redo_algorithms_page Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # rllib/examples/curiosity/inverse_dynamics_model_based_curiosity.py # rllib/examples/learners/classes/curiosity_ppo_torch_learner.py

Signed-off-by: sven1977 <svenmika1977@gmail.com>

…ped the experiment). - maybe try to speed up things by increasing batch size and lrs. Signed-off-by: sven1977 <svenmika1977@gmail.com>


sven1977 added 4 commits July 20, 2024 17:51

wip

ca5ce76

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into docs…

20273ba

…_redo_algorithms_page

wip

6dff892

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

a27972c

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 requested review from ArturNiederfahrenhorst, maxpumperla, simonsays1980 and a team as code owners August 1, 2024 14:39

wip

606e9c0

Signed-off-by: sven1977 <svenmika1977@gmail.com>

simonsays1980 approved these changes Aug 1, 2024


sven1977 added 14 commits August 1, 2024 20:55

wip

b4c585e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

e2cbd94

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

f6e7094

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

066ed01

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

0d78e3e

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

b49b0b4

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

ddb2a1d

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

adde73d

Signed-off-by: sven1977 <svenmika1977@gmail.com>

learns to 0.3 reward within 800k ts (and continues learning, but stop…

8b481b8

…ped the experiment). - maybe try to speed up things by increasing batch size and lrs. Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into docs…

7e165cb

…_redo_algorithms_page

learns to 0.3 reward within 600k ts (and continues learning, but stop…

e25b843

…ped the experiment). - maybe try to speed up things by increasing batch size and lrs. Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

4d92b26

Signed-off-by: sven1977 <svenmika1977@gmail.com>

Merge branch 'master' of https://github.com/ray-project/ray into docs…

5cefdd6

…_redo_algorithms_page

sven1977 enabled auto-merge (squash) August 4, 2024 07:09

github-actions bot added the `go` label (add ONLY when ready to merge, run all tests) Aug 4, 2024

wip

d1255bf

Signed-off-by: sven1977 <svenmika1977@gmail.com>

github-actions bot disabled auto-merge August 4, 2024 07:50

sven1977 added 2 commits August 4, 2024 10:04

wip

dd7096b

Signed-off-by: sven1977 <svenmika1977@gmail.com>

wip

6920d7f

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 enabled auto-merge (squash) August 4, 2024 09:10

sven1977 disabled auto-merge August 4, 2024 09:11

wip

00bf762

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 enabled auto-merge (squash) August 4, 2024 09:12

wip

f35c2b8

Signed-off-by: sven1977 <svenmika1977@gmail.com>

github-actions bot disabled auto-merge August 5, 2024 10:41

sven1977 enabled auto-merge (squash) August 5, 2024 12:08

sven1977 merged commit ccd7be4 into ray-project:master Aug 5, 2024
6 checks passed

sven1977 deleted the docs_redo_algorithms_page branch August 5, 2024 13:35
