[RLlib][Docs] Restructure Policy's API page #33344

Merged
merged 6 commits on Mar 17, 2023
257 changes: 247 additions & 10 deletions doc/source/rllib/package_ref/policy.rst
@@ -1,13 +1,13 @@
.. _policy-reference-docs:

Policy API
==========

The :py:class:`~ray.rllib.policy.policy.Policy` class contains the functionality to compute
actions for decision making in an environment, to compute losses and gradients and update
a neural network model, and to postprocess a collected environment trajectory.
One or more :py:class:`~ray.rllib.policy.policy.Policy` objects sit inside a
:py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`'s :py:class:`~ray.rllib.policy.policy_map.PolicyMap` and
are - if more than one - selected based on a multi-agent ``policy_mapping_fn``,
which maps agent IDs to a policy ID.
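
The multi-agent mapping described above can be sketched as a plain function. The agent and policy IDs below are hypothetical, and the keyword arguments mirror the signature that recent RLlib versions are assumed to pass in:

```python
# Toy sketch of a multi-agent policy_mapping_fn: it routes each agent ID
# to the ID of the policy that should compute that agent's actions.
# The agent/policy names here are made up for illustration.

def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Map all "opponent_*" agents to a shared policy, everyone else
    # to the main learning policy.
    if str(agent_id).startswith("opponent_"):
        return "opponent_policy"
    return "main_policy"

print(policy_mapping_fn("opponent_1"))  # opponent_policy
print(policy_mapping_fn("agent_0"))     # main_policy
```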

@@ -21,15 +21,252 @@ which maps agent IDs to a policy ID.
by sub-classing either of the available, built-in classes, depending on your
needs.

.. include:: policy/custom_policies.rst

.. currentmodule:: ray.rllib

Base Policy classes
-------------------

.. autosummary::
:toctree: doc/
:template: autosummary/class_with_autosummary.rst

~policy.policy.Policy

~policy.eager_tf_policy_v2.EagerTFPolicyV2

~policy.torch_policy_v2.TorchPolicyV2


.. --------------------------------------------

Making models
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.make_rl_module


Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.make_model
~policy.torch_policy_v2.TorchPolicyV2.make_model_and_action_dist


TensorFlow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.make_model

.. --------------------------------------------

Inference
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.policy.Policy.compute_actions
~policy.policy.Policy.compute_actions_from_input_dict
~policy.policy.Policy.compute_single_action
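
The relationship between the batched and the single-observation entry points can be sketched with a toy stand-in. This is not RLlib's actual implementation, only an illustration of the calling convention (batch of one in, unbatched result out):

```python
# Toy stand-in for a Policy's inference API. The "policy" rule itself
# (act 1 if the first observation feature is positive) is made up.

class ToyPolicy:
    def compute_actions(self, obs_batch):
        # Batched inference: one action per observation in the batch.
        actions = [1 if obs[0] > 0 else 0 for obs in obs_batch]
        return actions, [], {}  # actions, state_outs, extra_fetches

    def compute_single_action(self, obs):
        # Convenience wrapper: batch of one in, unbatched action out.
        actions, state_outs, info = self.compute_actions([obs])
        return actions[0], state_outs, info

policy = ToyPolicy()
action, _, _ = policy.compute_single_action([0.5, -1.0])
print(action)  # 1
```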

Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.action_sampler_fn
~policy.torch_policy_v2.TorchPolicyV2.action_distribution_fn
~policy.torch_policy_v2.TorchPolicyV2.extra_action_out

TensorFlow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.action_sampler_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.action_distribution_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.extra_action_out_fn

.. --------------------------------------------

Computing, processing, and applying gradients
---------------------------------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.compute_gradients
~policy.Policy.apply_gradients

Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.extra_compute_grad_fetches
~policy.torch_policy_v2.TorchPolicyV2.extra_grad_process


TensorFlow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.grad_stats_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.compute_gradients_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.apply_gradients_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.extra_learn_fetches_fn



.. --------------------------------------------

Updating the Policy's model
----------------------------


Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.learn_on_batch
~policy.Policy.load_batch_into_buffer
~policy.Policy.learn_on_loaded_batch
~policy.Policy.learn_on_batch_from_replay_buffer
~policy.Policy.get_num_samples_loaded_into_buffer
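
The update loop implied by ``learn_on_batch()`` can be sketched as follows: feed a batch of experience in, get a stats dict back. The batch layout and "loss" below are invented for illustration and are not RLlib's actual update logic:

```python
# Toy sketch of the learn_on_batch() contract.

class ToyPolicy:
    def __init__(self):
        self.num_updates = 0

    def learn_on_batch(self, batch):
        # A real policy would compute the loss and apply gradients here.
        self.num_updates += 1
        fake_loss = sum(batch["rewards"]) / len(batch["rewards"])
        return {"learner_stats": {"loss": fake_loss,
                                  "updates": self.num_updates}}

policy = ToyPolicy()
stats = policy.learn_on_batch({"rewards": [1.0, 0.0, 1.0, 0.0]})
print(stats["learner_stats"]["loss"])  # 0.5
```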


.. --------------------------------------------

Loss, logging, optimizers, and trajectory processing
----------------------------------------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.loss
~policy.Policy.compute_log_likelihoods
~policy.Policy.on_global_var_update
~policy.Policy.postprocess_trajectory



Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.optimizer
~policy.torch_policy_v2.TorchPolicyV2.get_tower_stats


TensorFlow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.optimizer
~policy.eager_tf_policy_v2.EagerTFPolicyV2.stats_fn


.. --------------------------------------------

Saving and restoring
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.from_checkpoint
~policy.Policy.export_checkpoint
~policy.Policy.export_model
~policy.Policy.from_state
~policy.Policy.get_weights
~policy.Policy.set_weights
~policy.Policy.get_state
~policy.Policy.set_state
~policy.Policy.import_model_from_h5
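
The ``get_state()``/``set_state()`` contract can be sketched with a toy policy: everything needed to reconstruct the policy is returned as a plain, serializable dict. The state layout below is invented and is not RLlib's real checkpoint format:

```python
import copy

# Toy sketch of the get_state()/set_state() round trip.

class ToyPolicy:
    def __init__(self, weights=None):
        self.weights = weights or {"w": [0.0, 0.0]}
        self.global_timestep = 0

    def get_state(self):
        # Return a deep copy so the caller can serialize it safely.
        return {"weights": copy.deepcopy(self.weights),
                "global_timestep": self.global_timestep}

    def set_state(self, state):
        self.weights = copy.deepcopy(state["weights"])
        self.global_timestep = state["global_timestep"]

src = ToyPolicy({"w": [1.5, -2.0]})
src.global_timestep = 100

dst = ToyPolicy()
dst.set_state(src.get_state())
print(dst.weights["w"], dst.global_timestep)  # [1.5, -2.0] 100
```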

.. --------------------------------------------

Connectors
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.reset_connectors
~policy.Policy.restore_connectors
~policy.Policy.get_connector_metrics

.. --------------------------------------------

Recurrent Policies
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.get_initial_state
~policy.Policy.num_state_tensors
~policy.Policy.is_recurrent
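
The recurrent calling convention can be sketched as follows: the initial state seeds the loop, and each inference call returns the state to feed back into the next call. The "RNN" below is just a step counter, not a real model:

```python
# Toy sketch of the recurrent-policy state loop.

class ToyRecurrentPolicy:
    def is_recurrent(self):
        return True

    def num_state_tensors(self):
        return 1  # number of state entries returned by get_initial_state()

    def get_initial_state(self):
        return [0]  # a single state entry: a step counter

    def compute_single_action(self, obs, state):
        action = obs % 2             # dummy action rule
        next_state = [state[0] + 1]  # advance the recurrent state
        return action, next_state

policy = ToyRecurrentPolicy()
state = policy.get_initial_state()
for obs in [3, 4, 7]:
    action, state = policy.compute_single_action(obs, state)
print(action, state)  # 1 [3]
```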


.. --------------------------------------------

Miscellaneous
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.apply
~policy.Policy.get_session
~policy.Policy.init_view_requirements
~policy.Policy.get_host
~policy.Policy.get_exploration_state


Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.get_batch_divisibility_req


TensorFlow Policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.variables
~policy.eager_tf_policy_v2.EagerTFPolicyV2.get_batch_divisibility_req

21 changes: 11 additions & 10 deletions doc/source/rllib/package_ref/policy/custom_policies.rst
@@ -1,26 +1,27 @@
.. _custom-policies-reference-docs:

Building Custom Policy Classes
------------------------------

.. currentmodule:: ray.rllib

.. warning::
As of Ray >= 1.9, it is no longer recommended to use the ``build_policy_class()`` or
``build_tf_policy()`` utility functions for creating custom Policy sub-classes.
Instead, follow the simple guidelines here for directly sub-classing from
either one of the built-in types:
:py:class:`~policy.eager_tf_policy_v2.EagerTFPolicyV2`
or
:py:class:`~policy.torch_policy_v2.TorchPolicyV2`

In order to create a custom Policy, sub-class :py:class:`~policy.policy.Policy` (for a generic,
framework-agnostic policy),
:py:class:`~policy.torch_policy_v2.TorchPolicyV2`
(for a PyTorch specific policy), or
:py:class:`~policy.eager_tf_policy_v2.EagerTFPolicyV2`
(for a TensorFlow specific policy) and override one or more of their methods, in particular:

* :py:meth:`~policy.policy.Policy.compute_actions_from_input_dict`
* :py:meth:`~policy.policy.Policy.postprocess_trajectory`
* :py:meth:`~policy.policy.Policy.loss`

`See here for an example on how to override TorchPolicy <https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo_torch_policy.py>`_.
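
The override pattern above can be sketched framework-free. The base class below is a stand-in for :py:class:`~ray.rllib.policy.policy.Policy` (not the real class), and the loss and reward-to-go logic are purely illustrative:

```python
# Framework-free sketch of subclassing a Policy-like base class and
# overriding loss() and postprocess_trajectory(). All names and logic
# below are hypothetical.

class BasePolicy:
    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError

    def postprocess_trajectory(self, sample_batch, other_agent_batches=None,
                               episode=None):
        return sample_batch  # default: pass the batch through unchanged

class MyCustomPolicy(BasePolicy):
    def loss(self, model, dist_class, train_batch):
        # Toy "loss": negative mean reward (i.e. maximize reward).
        rewards = train_batch["rewards"]
        return -sum(rewards) / len(rewards)

    def postprocess_trajectory(self, sample_batch, other_agent_batches=None,
                               episode=None):
        # Example postprocessing: add an undiscounted reward-to-go column.
        rtg, total = [], 0.0
        for r in reversed(sample_batch["rewards"]):
            total += r
            rtg.append(total)
        sample_batch["rewards_to_go"] = list(reversed(rtg))
        return sample_batch

p = MyCustomPolicy()
batch = p.postprocess_trajectory({"rewards": [1.0, 0.0, 2.0]})
print(batch["rewards_to_go"])  # [3.0, 2.0, 2.0]
```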
10 changes: 0 additions & 10 deletions doc/source/rllib/package_ref/policy/policy.rst

This file was deleted.

20 changes: 0 additions & 20 deletions doc/source/rllib/package_ref/policy/tf_policies.rst

This file was deleted.

8 changes: 0 additions & 8 deletions doc/source/rllib/package_ref/policy/torch_policy.rst

This file was deleted.
