-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] RLModule API change: If "actions" key returned from forward_inference|exploration, use actions as-is. #36067
[RLlib] RLModule API change: If "actions" key returned from forward_inference|exploration, use actions as-is. #36067
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks good. We need to clean up the docs a little bit. Left some comments on my concerns.
Also I think these details should be now reflected as part of the RLModule API references under the three forward methods. That's probably where the advanced users will look for information on implementation details the most.
We need to be more clear on what the output will be treated in the doc-string of those functions (under Returns section)
doc/source/rllib/rllib-rlmodule.rst
Outdated
return a dictionary that either contains the key "actions" and/or the key "action_dist_inputs". | ||
|
||
If you return the "actions" key: | ||
* RLlib will use the actions provided thereunder directly and as-is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting is broken (these are not rendered as bullet points)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
doc/source/rllib/rllib-rlmodule.rst
Outdated
* RLlib will create a ``ray.rllib.models.distributions.Distribution`` object from the distribution parameters under that key and sample actions from the thus generated distribution. | ||
* In the case of ``forward_exploration()``, RLlib will also compute action probs and logp values from the sampled actions automatically. | ||
|
||
Note that in the case of ``forward_inference()``, the generated distributions (from returned key "action_dist_inputs") will always be made deterministic via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a Note box via .. note::
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
doc/source/rllib/rllib-rlmodule.rst
Outdated
|
||
If you do not return the "actions" key: | ||
* You must return the "action_dist_inputs" key instead from your ``forward_inference()`` and ``forward_exploration()`` methods. | ||
* RLlib will create a ``ray.rllib.models.distributions.Distribution`` object from the distribution parameters under that key and sample actions from the thus generated distribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these classes we mention here have to be linked to their class definition API reference. You can use something like :py:class:~ray.rllib.models.distributions.Distribution
to achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed all of these via :py:meth:..
and :py:class:..
links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This added documentation is not specific to those who are migrating from policy API. We should put it under the right section. I think adding it somewhere close to Writing Custom Single Agent RL Modules would be the way to go (it requires getting rid of those policy specific numenclature)
Maybe we can consolidate your paragraph with the description of what needs to be implemented for each forward method shown here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I moved this up into the suggested section and created a new table to explain the difference between returning the "actions" key and NOT returning the "actions" key.
Thanks for your review @kouroshHakha ! All very valid concerns and suggestions. I'll make these changes :) |
…odule_api_change_use_returned_actions_if_any
…odule_api_change_use_returned_actions_if_any
…nference|exploration, use actions as-is. (ray-project#36067) Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
RLModule API change:
action_dist_inputs
are returned, use these to construct a distribution and compute action probs/logs from it.Changed the docs to reflect this.
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.