
[RLlib][Docs] Restructure Policy's API page (ray-project#33344)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
kouroshHakha authored and ProjectsByJackHe committed Mar 21, 2023
1 parent 4419777 commit 863c0ce
Showing 6 changed files with 332 additions and 132 deletions.
257 changes: 247 additions & 10 deletions doc/source/rllib/package_ref/policy.rst
@@ -1,13 +1,13 @@
.. _policy-reference-docs:

Policy API
==========

The :py:class:`~ray.rllib.policy.policy.Policy` class contains functionality to compute
actions for decision making in an environment, as well as computing loss(es) and gradients,
updating a neural network model as well as postprocessing a collected environment trajectory.
One or more :py:class:`~ray.rllib.policy.policy.Policy` objects sit inside a
:py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`'s :py:class:`~ray.rllib.policy.policy_map.PolicyMap` and
are selected - if more than one exists - based on a multi-agent ``policy_mapping_fn``,
which maps agent IDs to a policy ID.
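
For illustration only (the agent-ID scheme and policy IDs below are hypothetical, not part of this commit), such a mapping function might look like:

```python
# A minimal policy_mapping_fn sketch: RLlib calls a function with roughly
# this signature to pick a policy ID for each agent ID in a multi-agent
# episode. The agent-ID format and policy IDs here are illustrative.
def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Route even-numbered agents to one policy, odd-numbered to another.
    agent_index = int(agent_id.split("_")[-1])
    return "policy_even" if agent_index % 2 == 0 else "policy_odd"
```

The function is pure and stateless, so the same agent always maps to the same policy within an episode.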

@@ -21,15 +21,252 @@ which maps agent IDs to a policy ID.
by sub-classing either of the available, built-in classes, depending on your
needs.

.. include::
policy/custom_policies.rst

.. currentmodule:: ray.rllib

Base Policy classes
-------------------

.. autosummary::
:toctree: doc/
:template: autosummary/class_with_autosummary.rst

~policy.policy.Policy

~policy.eager_tf_policy_v2.EagerTFPolicyV2

~policy.torch_policy_v2.TorchPolicyV2


.. --------------------------------------------
Making models
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.make_rl_module


Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.make_model
~policy.torch_policy_v2.TorchPolicyV2.make_model_and_action_dist


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.make_model

.. --------------------------------------------
Inference
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.policy.Policy.compute_actions
~policy.policy.Policy.compute_actions_from_input_dict
~policy.policy.Policy.compute_single_action

Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.action_sampler_fn
~policy.torch_policy_v2.TorchPolicyV2.action_distribution_fn
~policy.torch_policy_v2.TorchPolicyV2.extra_action_out

Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.action_sampler_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.action_distribution_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.extra_action_out_fn

.. --------------------------------------------
Computing, processing, and applying gradients
---------------------------------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.compute_gradients
~policy.Policy.apply_gradients

Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.extra_compute_grad_fetches
~policy.torch_policy_v2.TorchPolicyV2.extra_grad_process


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.grad_stats_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.compute_gradients_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.apply_gradients_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.extra_learn_fetches_fn



.. --------------------------------------------
Updating the Policy's model
----------------------------


Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.learn_on_batch
~policy.Policy.load_batch_into_buffer
~policy.Policy.learn_on_loaded_batch
~policy.Policy.learn_on_batch_from_replay_buffer
~policy.Policy.get_num_samples_loaded_into_buffer


.. --------------------------------------------
Loss, logging, optimizers, and trajectory processing
----------------------------------------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.loss
~policy.Policy.compute_log_likelihoods
~policy.Policy.on_global_var_update
~policy.Policy.postprocess_trajectory



Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.optimizer
~policy.torch_policy_v2.TorchPolicyV2.get_tower_stats


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.optimizer
~policy.eager_tf_policy_v2.EagerTFPolicyV2.stats_fn


.. --------------------------------------------
Saving and restoring
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.from_checkpoint
~policy.Policy.export_checkpoint
~policy.Policy.export_model
~policy.Policy.from_state
~policy.Policy.get_weights
~policy.Policy.set_weights
~policy.Policy.get_state
~policy.Policy.set_state
~policy.Policy.import_model_from_h5

.. --------------------------------------------
Connectors
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.reset_connectors
~policy.Policy.restore_connectors
~policy.Policy.get_connector_metrics

.. --------------------------------------------
Recurrent Policies
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.get_initial_state
~policy.Policy.num_state_tensors
~policy.Policy.is_recurrent


.. --------------------------------------------
Miscellaneous
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.apply
~policy.Policy.get_session
~policy.Policy.init_view_requirements
~policy.Policy.get_host
~policy.Policy.get_exploration_state


Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.get_batch_divisibility_req


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.variables
~policy.eager_tf_policy_v2.EagerTFPolicyV2.get_batch_divisibility_req

21 changes: 11 additions & 10 deletions doc/source/rllib/package_ref/policy/custom_policies.rst
Original file line number Diff line number Diff line change
@@ -1,26 +1,27 @@
.. _custom-policies-reference-docs:

Building Custom Policy Classes
------------------------------

.. currentmodule:: ray.rllib

.. warning::
As of Ray >= 1.9, it is no longer recommended to use the ``build_policy_class()`` or
``build_tf_policy()`` utility functions for creating custom Policy sub-classes.
Instead, follow the simple guidelines here for directly sub-classing from
either one of the built-in types:
:py:class:`~policy.eager_tf_policy_v2.EagerTFPolicyV2`
or
:py:class:`~policy.torch_policy_v2.TorchPolicyV2`

In order to create a custom Policy, sub-class :py:class:`~policy.policy.Policy` (for a generic,
framework-agnostic policy),
:py:class:`~policy.torch_policy_v2.TorchPolicyV2`
(for a PyTorch specific policy), or
:py:class:`~policy.eager_tf_policy_v2.EagerTFPolicyV2`
(for a TensorFlow specific policy) and override one or more of their methods. Those are in particular:

* :py:meth:`~policy.policy.Policy.compute_actions_from_input_dict`
* :py:meth:`~policy.policy.Policy.postprocess_trajectory`
* :py:meth:`~policy.policy.Policy.loss`

`See here for an example on how to override TorchPolicy <https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo_torch_policy.py>`_.
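
As a rough sketch of that override pattern (using a minimal stand-in for the ``Policy`` base class so the shape is visible without Ray installed; the toy action rule and loss below are hypothetical, not RLlib's actual implementation):

```python
# Schematic only: `Policy` here is a dummy stand-in so the override
# pattern is visible without installing Ray. The real base class is
# ray.rllib.policy.policy.Policy, whose methods have richer signatures.
class Policy:
    def compute_actions_from_input_dict(self, input_dict, **kwargs):
        raise NotImplementedError

    def postprocess_trajectory(self, sample_batch, other_agent_batches=None,
                               episode=None):
        # Default: return the collected trajectory unchanged.
        return sample_batch

    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError


class MyCustomPolicy(Policy):
    def compute_actions_from_input_dict(self, input_dict, **kwargs):
        # Toy deterministic rule in place of a real model forward pass:
        # always pick action 0 for every observation in the batch.
        actions = [0 for _ in input_dict["obs"]]
        return actions, [], {}

    def loss(self, model, dist_class, train_batch):
        # Toy scalar loss in place of a real policy-gradient loss:
        # the negative sum of rewards in the training batch.
        return -sum(train_batch["rewards"])
```

A real subclass would compute actions from a model's forward pass and return a differentiable loss tensor, but the three methods overridden here are the same ones listed above.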
10 changes: 0 additions & 10 deletions doc/source/rllib/package_ref/policy/policy.rst

This file was deleted.

20 changes: 0 additions & 20 deletions doc/source/rllib/package_ref/policy/tf_policies.rst

This file was deleted.

8 changes: 0 additions & 8 deletions doc/source/rllib/package_ref/policy/torch_policy.rst

This file was deleted.

