[RLlib; docs] RLlib documentation do-over (new API stack): Main index page. #48285
LGTM.
@@ -147,7 +150,7 @@
 )
 )

-# Create the env to do inference in.
+# Create a env to do inference in.
We should also load the pipelines from the checkpoint here and use them.
We have another example, where we do that (the LSTM one, which requires the connector pipeline for a more sophisticated inference loop w/ state in/outs).
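For readers following the thread, here is a rough sketch of what "state in/outs" means for an inference loop: the policy returns a new recurrent state on every step, and the loop has to feed that state back in on the next call. This is a pure-Python toy under stated assumptions, not RLlib's connector pipeline or checkpoint API; `CountingEnv`, `recurrent_policy`, and `run_episode` are hypothetical names invented for illustration.

```python
class CountingEnv:
    """Hypothetical toy env: the observation is the step index; ends after 5 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= 5
        return self.t, reward, done


def recurrent_policy(obs, state):
    """Hypothetical stateful policy: the state is a running sum of observations."""
    new_state = state + obs
    action = new_state % 2
    return action, new_state


def run_episode(env, policy, initial_state=0):
    """Run one episode, threading the policy state from step to step."""
    obs = env.reset()
    state = initial_state
    total_reward = 0.0
    steps = 0
    done = False
    while not done:
        # State in: pass the previous state. State out: keep the returned one.
        action, state = policy(obs, state)
        obs, reward, done = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps, state


total_reward, steps, final_state = run_episode(CountingEnv(), recurrent_policy)
```

A stateless loop would drop the second return value of the policy; the only structural difference here is that `state` survives across iterations and is reset once per episode.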
Signed-off-by: Sven Mika <sven@anyscale.io>
Some style nits. Please correct if the rewrites are inaccurate, esp the ones changing passive voice to active voice.
@@ -147,7 +150,7 @@
 )
 )

-# Create the env to do inference in.
+# Create a env to do inference in.
Suggested change:
-# Create a env to do inference in.
+# Create an env to do inference in.
doc/source/rllib/index.rst
Outdated
RLlib is used in production by industry leaders in many different verticals, such as
`gaming <https://www.anyscale.com/events/2021/06/22/using-reinforcement-learning-to-optimize-iap-offer-recommendations-in-mobile-games>`_,
`robotics <https://www.anyscale.com/events/2021/06/23/introducing-amazon-sagemaker-kubeflow-reinforcement-learning-pipelines-for>`_,
`finance <https://www.anyscale.com/events/2021/06/22/a-24x-speedup-for-reinforcement-learning-with-rllib-+-ray>`_,
`climate control <https://www.anyscale.com/events/2021/06/23/applying-ray-and-rllib-to-real-life-industrial-use-cases>`_,
The `climate control` and `industrial control` links point to the same URL. Is that intentional?
fixed by merging them ...
<div class="termynal" data-termynal>
<span data-ty="input">pip install "ray[rllib]" tensorflow torch</span>
</div>
For installation on computers running Apple Silicon (such as M1),
Suggested change:
-For installation on computers running Apple Silicon (such as M1),
+For installation on computers running Apple Silicon such as M1,
<span data-ty="input">pip install "ray[rllib]" tensorflow torch</span>
</div>
For installation on computers running Apple Silicon (such as M1),
`follow instructions here. <https://docs.ray.io/en/latest/ray-overview/installation.html#m1-mac-apple-silicon-support>`_
Suggested change:
-`follow instructions here. <https://docs.ray.io/en/latest/ray-overview/installation.html#m1-mac-apple-silicon-support>`_
+see `M1 Mac Support. <https://docs.ray.io/en/latest/ray-overview/installation.html#m1-mac-apple-silicon-support>`_
`here. <https://docs.ray.io/en/latest/ray-overview/installation.html#m1-mac-apple-silicon-support>`_
To be able to run our Atari examples, you should also install
`pip install "gym[atari]" "gym[accept-rom-license]" atari_py`.
To be able to run our Atari or MuJoCo examples, you also need to run:
Suggested change:
-To be able to run our Atari or MuJoCo examples, you also need to run:
+To run the Atari or MuJoCo examples, you also need to run:
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>
…' into documentation_do_over_index_page # Conflicts: # doc/source/rllib/rllib-algorithms.rst
Signed-off-by: Sven Mika <sven@anyscale.io>
…' into documentation_do_over_index_page
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>
Thanks for addressing the comments. I must've missed the "off-policy'ness" in the first review. It'll be good to fix that if you can.
@@ -217,13 +217,13 @@ Asynchronous Proximal Policy Optimization (APPO)
**APPO architecture:** APPO is an asynchronous variant of :ref:`Proximal Policy Optimization (PPO) <ppo>` based on the IMPALA architecture,
but using a surrogate policy loss with clipping, allowing for multiple SGD passes per collected train batch.
In a training iteration, APPO requests samples from all EnvRunners asynchronously and the collected episode
"RLlib" was my guess and the point was just to clarify who's doing the returning, if needed. If it's obvious to the reader, just ignore my suggestion.
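For context on the "surrogate policy loss with clipping" mentioned in the APPO description quoted above, here is a minimal sketch of the generic PPO-style clipped surrogate objective. It is an illustration only, not RLlib's APPO loss (which builds on the IMPALA architecture and has additional terms); the function name `clipped_surrogate_loss` and the default `clip_param=0.2` are assumptions made for this example.

```python
import math


def clipped_surrogate_loss(new_logps, old_logps, advantages, clip_param=0.2):
    """Mean negative clipped surrogate objective over a batch (plain Python lists)."""
    losses = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_param, 1 + clip_param].
        clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
        # Take the more pessimistic of the two objectives, negated to form a loss.
        losses.append(-min(ratio * adv, clipped_ratio * adv))
    return sum(losses) / len(losses)


# Identical policies: the ratio is 1, so the loss is minus the advantage.
on_policy_loss = clipped_surrogate_loss([0.0], [0.0], [2.0])
# Ratio e^1 with a positive advantage: the ratio is clipped to 1.2.
clipped_loss = clipped_surrogate_loss([1.0], [0.0], [1.0])
```

Because the clipping bounds how far a single update can move the policy away from the one that collected the data, the same train batch can be reused for several SGD passes, which is the property the doc text refers to.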
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>
… page. (ray-project#48285) Signed-off-by: JP-sDEV <jon.pablo80@gmail.com>
… page. (ray-project#48285) Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
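Update, refactor, fix the main RLlib `index.html` page (for the new API stack).

Why are these changes needed?

Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.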