
[RLlib][Docs] Restructure Policy's API page (ray-project#33344)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
kouroshHakha authored and ProjectsByJackHe committed Mar 21, 2023
1 parent 4419777 commit 863c0ce
Showing 6 changed files with 332 additions and 132 deletions.
257 changes: 247 additions & 10 deletions doc/source/rllib/package_ref/policy.rst
@@ -1,13 +1,13 @@
.. _policy-reference-docs:

Policy API
==========

The :py:class:`~ray.rllib.policy.policy.Policy` class contains functionality to compute
actions for decision making in an environment, as well as computing loss(es) and gradients,
updating a neural network model as well as postprocessing a collected environment trajectory.
One or more :py:class:`~ray.rllib.policy.policy.Policy` objects sit inside a
:py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`'s :py:class:`~ray.rllib.policy.policy_map.PolicyMap` and
are selected - if more than one exists - based on a multi-agent ``policy_mapping_fn``,
which maps agent IDs to a policy ID.
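
For illustration only (the agent-ID scheme and policy IDs below are hypothetical, not part of this commit), such a mapping function might look like:

```python
# A minimal policy_mapping_fn sketch: RLlib calls a function with roughly
# this signature to pick a policy ID for each agent ID in a multi-agent
# episode. The agent-ID format and policy IDs here are illustrative.
def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Route even-numbered agents to one policy, odd-numbered to another.
    agent_index = int(agent_id.split("_")[-1])
    return "policy_even" if agent_index % 2 == 0 else "policy_odd"
```

The function is pure and stateless, so the same agent always maps to the same policy within an episode.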

@@ -21,15 +21,252 @@ which maps agent IDs to a policy ID.
by sub-classing either of the available, built-in classes, depending on your
needs.

.. include::
policy/custom_policies.rst

.. currentmodule:: ray.rllib

Base Policy classes
-------------------

.. autosummary::
:toctree: doc/
:template: autosummary/class_with_autosummary.rst

~policy.policy.Policy

~policy.eager_tf_policy_v2.EagerTFPolicyV2

~policy.torch_policy_v2.TorchPolicyV2


.. --------------------------------------------
Making models
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.make_rl_module


Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.make_model
~policy.torch_policy_v2.TorchPolicyV2.make_model_and_action_dist


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.make_model

.. --------------------------------------------
Inference
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.policy.Policy.compute_actions
~policy.policy.Policy.compute_actions_from_input_dict
~policy.policy.Policy.compute_single_action

Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.action_sampler_fn
~policy.torch_policy_v2.TorchPolicyV2.action_distribution_fn
~policy.torch_policy_v2.TorchPolicyV2.extra_action_out

Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.action_sampler_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.action_distribution_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.extra_action_out_fn

.. --------------------------------------------
Computing, processing, and applying gradients
---------------------------------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.compute_gradients
~policy.Policy.apply_gradients

Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.extra_compute_grad_fetches
~policy.torch_policy_v2.TorchPolicyV2.extra_grad_process


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.grad_stats_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.compute_gradients_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.apply_gradients_fn
~policy.eager_tf_policy_v2.EagerTFPolicyV2.extra_learn_fetches_fn



.. --------------------------------------------
Updating the Policy's model
----------------------------


Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.learn_on_batch
~policy.Policy.load_batch_into_buffer
~policy.Policy.learn_on_loaded_batch
~policy.Policy.learn_on_batch_from_replay_buffer
~policy.Policy.get_num_samples_loaded_into_buffer


.. --------------------------------------------
Loss, logging, optimizers, and trajectory processing
----------------------------------------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.loss
~policy.Policy.compute_log_likelihoods
~policy.Policy.on_global_var_update
~policy.Policy.postprocess_trajectory



Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.optimizer
~policy.torch_policy_v2.TorchPolicyV2.get_tower_stats


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.optimizer
~policy.eager_tf_policy_v2.EagerTFPolicyV2.stats_fn


.. --------------------------------------------
Saving and restoring
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.from_checkpoint
~policy.Policy.export_checkpoint
~policy.Policy.export_model
~policy.Policy.from_state
~policy.Policy.get_weights
~policy.Policy.set_weights
~policy.Policy.get_state
~policy.Policy.set_state
~policy.Policy.import_model_from_h5

.. --------------------------------------------
Connectors
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.reset_connectors
~policy.Policy.restore_connectors
~policy.Policy.get_connector_metrics

.. --------------------------------------------
Recurrent Policies
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.get_initial_state
~policy.Policy.num_state_tensors
~policy.Policy.is_recurrent


.. --------------------------------------------
Miscellaneous
--------------------

Base Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.Policy.apply
~policy.Policy.get_session
~policy.Policy.init_view_requirements
~policy.Policy.get_host
~policy.Policy.get_exploration_state


Torch Policy
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.torch_policy_v2.TorchPolicyV2.get_batch_divisibility_req


Tensorflow Policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: doc/

~policy.eager_tf_policy_v2.EagerTFPolicyV2.variables
~policy.eager_tf_policy_v2.EagerTFPolicyV2.get_batch_divisibility_req

21 changes: 11 additions & 10 deletions doc/source/rllib/package_ref/policy/custom_policies.rst
Original file line number Diff line number Diff line change
@@ -1,26 +1,27 @@
.. _custom-policies-reference-docs:

Building Custom Policy Classes
------------------------------

.. currentmodule:: ray.rllib

.. warning::
As of Ray >= 1.9, it is no longer recommended to use the ``build_policy_class()`` or
``build_tf_policy()`` utility functions for creating custom Policy sub-classes.
Instead, follow the simple guidelines here for directly sub-classing from
either one of the built-in types:
:py:class:`~policy.eager_tf_policy_v2.EagerTFPolicyV2`
or
:py:class:`~policy.torch_policy_v2.TorchPolicyV2`

In order to create a custom Policy, sub-class :py:class:`~policy.policy.Policy` (for a generic,
framework-agnostic policy),
:py:class:`~policy.torch_policy_v2.TorchPolicyV2`
(for a PyTorch specific policy), or
:py:class:`~policy.eager_tf_policy_v2.EagerTFPolicyV2`
(for a TensorFlow specific policy) and override one or more of their methods. Those are in particular:

* :py:meth:`~policy.policy.Policy.compute_actions_from_input_dict`
* :py:meth:`~policy.policy.Policy.postprocess_trajectory`
* :py:meth:`~policy.policy.Policy.loss`

`See here for an example on how to override TorchPolicy <https://github.com/ray-project/ray/blob/master/rllib/algorithms/ppo/ppo_torch_policy.py>`_.
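
As a rough sketch of that override pattern (using a minimal stand-in for the ``Policy`` base class so the shape is visible without Ray installed; the toy action rule and loss below are hypothetical, not RLlib's actual implementation):

```python
# Schematic only: `Policy` here is a dummy stand-in so the override
# pattern is visible without installing Ray. The real base class is
# ray.rllib.policy.policy.Policy, whose methods have richer signatures.
class Policy:
    def compute_actions_from_input_dict(self, input_dict, **kwargs):
        raise NotImplementedError

    def postprocess_trajectory(self, sample_batch, other_agent_batches=None,
                               episode=None):
        # Default: return the collected trajectory unchanged.
        return sample_batch

    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError


class MyCustomPolicy(Policy):
    def compute_actions_from_input_dict(self, input_dict, **kwargs):
        # Toy deterministic rule in place of a real model forward pass:
        # always pick action 0 for every observation in the batch.
        actions = [0 for _ in input_dict["obs"]]
        return actions, [], {}

    def loss(self, model, dist_class, train_batch):
        # Toy scalar loss in place of a real policy-gradient loss:
        # the negative sum of rewards in the training batch.
        return -sum(train_batch["rewards"])
```

A real subclass would compute actions from a model's forward pass and return a differentiable loss tensor, but the three methods overridden here are the same ones listed above.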
10 changes: 0 additions & 10 deletions doc/source/rllib/package_ref/policy/policy.rst

This file was deleted.

20 changes: 0 additions & 20 deletions doc/source/rllib/package_ref/policy/tf_policies.rst

This file was deleted.

8 changes: 0 additions & 8 deletions doc/source/rllib/package_ref/policy/torch_policy.rst

This file was deleted.

