diff --git a/doc/source/rllib/new-api-stack-migration-guide.rst b/doc/source/rllib/new-api-stack-migration-guide.rst
index 5f5d2c8e7c36..525f6a984e83 100644
--- a/doc/source/rllib/new-api-stack-migration-guide.rst
+++ b/doc/source/rllib/new-api-stack-migration-guide.rst
@@ -6,26 +6,26 @@
 .. _rllib-new-api-stack-migration-guide:
 
-New API Stack Migration Guide
+New API stack migration guide
 =============================
 
-This page explains - step by step - how to convert and translate your existing old API stack
+This page explains, step by step, how to convert and translate your existing old API stack
 RLlib classes and code to RLlib's new API stack.
 
-:ref:`Read here on what the new API stack is and why you should migrate to it soon `.
+:ref:`Why you should migrate to the new API stack `.
 
 .. note::
-    Even though the new API stack still rudimentary supports `TensorFlow `__,
-    RLlib is moving to `PyTorch `__ as the only supported deep learning
-    framework (dropping TensorFlow support entirely).
-    Note, though, that RLlib continues to be designed in a framework-agnostic fashion.
+    Even though the new API stack still provides rudimentary support for `TensorFlow `__,
+    RLlib supports a single deep learning framework, the `PyTorch `__
+    framework, dropping TensorFlow support entirely.
+    Note, though, that the Ray team continues to design RLlib to be framework-agnostic.
 
 
 Change your AlgorithmConfig
 ---------------------------
 
-The new API stack is turned off by default for all RLlib algorithms. To activate it, use the `api_stack()` method
+RLlib turns off the new API stack by default for all RLlib algorithms. To activate it, use the `api_stack()` method
 in your `AlgorithmConfig` object like so:
 
 .. testcode::
 
@@ -35,10 +35,10 @@ in your `AlgorithmConfig` object like so:
 
     config = (
         PPOConfig()
         # Switch both the new API stack flags to True (both False by default).
-        # This enables the use of
-        # a) RLModule (replaces ModelV2) and Learner (replaces Policy)
-        # b) the correct EnvRunner (replaces RolloutWorker) and
-        # ConnectorV2 pipelines (replaces old stack Connectors).
+        # This setting enables the use of
+        # a) RLModule (replaces ModelV2) and Learner (replaces Policy).
+        # b) the correct EnvRunner, which replaces RolloutWorker, and
+        #    ConnectorV2 pipelines, which replace the old stack Connectors.
         .api_stack(
             enable_rl_module_and_learner=True,
             enable_env_runner_and_connector_v2=True,
@@ -48,22 +48,21 @@ in your `AlgorithmConfig` object like so:
 
 Note that there are a few other differences between configuring an old API stack
 algorithm and its new stack counterpart.
 
-Go through the following sections and make sure you are either translating the respective
-settings. Settings no longer supported or needed in the new API stack should be
-dropped altogether.
+Go through the following sections and make sure you're translating the respective
+settings. Remove settings that the new stack doesn't support or need.
 
 
 AlgorithmConfig.framework()
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Even though the new API stack still rudimentary supports `TensorFlow `__,
-RLlib is moving to `PyTorch `__ as the only supported deep learning framework.
+Even though the new API stack still provides rudimentary support for `TensorFlow `__,
+RLlib supports a single deep learning framework, the `PyTorch `__ framework.
 
-The following framework-related settings are thus being deprecated when using the new API stack:
+The new API stack deprecates the following framework-related settings:
 
 .. testcode::
 
-    # Make sure your framework is always set to "torch" ...
+    # Make sure you always set the framework to "torch"...
     config.framework("torch")
 
     # ... and drop all tf-specific settings.
@@ -78,7 +77,7 @@ The following framework-related settings are thus being deprecated when using th
 AlgorithmConfig.resources()
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The `num_gpus` and `_fake_gpus` settings have been deprecated. In order to place your
+The `num_gpus` and `_fake_gpus` settings have been deprecated. To place your
 RLModule on one or more GPUs on the Learner side, do the following:
 
 .. testcode::
 
@@ -98,10 +97,10 @@ RLModule on one or more GPUs on the Learner side, do the following:
 
 For asynchronous algorithms like IMPALA or APPO, this setting should therefore always be >0.
 `See here for an example on how to train with fractional GPUs `__.
 
-Also note that for fractional GPUs, you should always set `num_learners` to 0 or 1
+Also note that for fractional GPUs, you should always set `num_learners` to 0 or 1.
 
-If you don't have GPUs available, but want to learn with more than one
-:py:class:`~ray.rllib.core.learner.learner.Learner` in a multi-**CPU** fashion, you can do:
+If GPUs aren't available, but you want to learn with more than one
+:py:class:`~ray.rllib.core.learner.learner.Learner` in a multi-**CPU** fashion, you can do the following:
 
 .. testcode::
 
@@ -121,14 +120,14 @@ The setting `num_cpus_for_local_worker` has been renamed to `num_cpus_for_main_p
 
 AlgorithmConfig.training()
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Train Batch Size
+Train batch size
 ................
 
 Due to the new API stack's :py:class:`~ray.rllib.core.learner.learner.Learner` worker
-architecture (training may be distributed over n
-:py:class:`~ray.rllib.core.learner.learner.Learner` workers), the train batch size is
-now provided per individual :py:class:`~ray.rllib.core.learner.learner.Learner`.
-The `train_batch_size` setting should no longer be used:
+architecture, training may be distributed over n
+:py:class:`~ray.rllib.core.learner.learner.Learner` workers, so RLlib provides the train batch size
+per individual :py:class:`~ray.rllib.core.learner.learner.Learner`.
+You should no longer use the `train_batch_size` setting:
 
 .. testcode::
 
@@ -137,8 +136,8 @@ The `train_batch_size` setting should no longer be used:
         train_batch_size_per_learner=512,
     )
 
-This way, you won't need to change this setting, even when increasing the number of
-:py:class:`~ray.rllib.core.learner.learner.Learner` (through `config.learners(num_learners=...)`).
+You don't need to change this setting, even when increasing the number of
+:py:class:`~ray.rllib.core.learner.learner.Learner`, through `config.learners(num_learners=...)`.
 
 Note that a good rule of thumb for scaling on the learner axis is to keep the
 `train_batch_size_per_learner` value constant with a growing number of Learners and
@@ -147,34 +146,34 @@ to increase the learning rate as follows:
 
 `lr = [original_lr] * ([num_learners] ** 0.5)`
 
 
-Neural Network Configuration
+Neural network configuration
 ............................
 
 The old stack's `config.training(model=...)` is no longer supported on the new API stack.
 Instead, use the new :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.rl_module`
 method to configure RLlib's default :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`
-or specify (and configure) a custom :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
+or specify and configure a custom :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
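+
+As a quick orientation, the following is a minimal sketch of both approaches. The
+`model_config_dict` and `rl_module_spec` arguments reflect the API at the time of
+writing and may change; `MyRLModule` is a hypothetical custom subclass:
+
+.. testcode::
+
+    from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
+
+    # Tweak RLlib's default RLModule through `model_config_dict` ...
+    config.rl_module(model_config_dict={"fcnet_hiddens": [256, 256]})
+
+    # ... or plug in your own RLModule subclass through an RLModule spec.
+    # config.rl_module(rl_module_spec=SingleAgentRLModuleSpec(module_class=MyRLModule))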
 
-:ref:`See here for more a general guide on the RLModule API `, also explaining
+See :ref:`RLModules API `, a general guide that also explains
 the use of the `config.rl_module()` method.
 
-If you have an old stack `ModelV2` and would like to migrate the whole NN logic over to the
-new stack, :ref:`see here for more details on how to do so `.
+If you have an old stack `ModelV2` and want to migrate the entire NN logic to the
+new stack, see :ref:`ModelV2 to RLModule ` for migration instructions.
 
 
-Learning Rate- and Coefficient Schedules
+Learning rate and coefficient schedules
 ........................................
 
-If you are using schedules for your learning rate or other coefficients (for example the
-`entropy_coeff` setting in PPO), provide scheduling information directly in the respective setting.
-There is no specific, separate setting anymore for scheduling behavior.
+If you're using schedules for your learning rate or other coefficients, for example, the
+`entropy_coeff` setting in PPO, provide scheduling information directly in the respective setting.
+Scheduling behavior doesn't require a specific, separate setting anymore.
 
 When defining a schedule, provide a list of 2-tuples, where the first item is the global timestep
-(*num_env_steps_sampled_lifetime* in the reported metrics) and the second item is the value that should
-be reached at that timestep. Always start the first 2-tuple with timestep 0. Note that values between
-two provided timesteps are linearly interpolated.
+(*num_env_steps_sampled_lifetime* in the reported metrics) and the second item is the value that the setting should reach at that timestep.
+Always start the first 2-tuple with timestep 0. Note that RLlib linearly interpolates values between
+two provided timesteps.
 
-For example, to create a learning rate schedule that starts with a value of 1e-5, then increases over 1M timesteps to 1e-4 and stays constant after that, do:
+For example, to create a learning rate schedule that starts with a value of 1e-5, then increases over 1M timesteps to 1e-4 and stays constant after that, do the following:
 
 .. testcode::
 
@@ -186,10 +185,10 @@ For example, to create a learning rate schedule that starts with a value of 1e-5
     )
 
 
-In the preceding example, the value after 500k timesteps is roughly `5e-5` (linear interpolation).
+In the preceding example, the value after 500k timesteps is roughly `5e-5` from linear interpolation.
 
-Another example: To create a entropy coefficient schedule that starts with a value of 0.05, then increases over 1M timesteps to 0.1 and
-then suddenly drops to 0 (after the 1Mth timestep), do:
+As another example, to create an entropy coefficient schedule that starts with a value of 0.05, then increases over 1M timesteps to 0.1, and
+then suddenly drops to 0 after the 1Mth timestep, do the following:
 
 .. testcode::
 
@@ -202,17 +201,17 @@ then suddenly drops to 0 (after the 1Mth timestep), do:
     )
 
 In case you need to configure a more complex learning rate scheduling behavior or chain different schedulers
-into a pipeline, you can use the still experimental `_torch_lr_schedule_classes` config property.
-`See this example script here for how to do this `__.
-Note that this only covers learning rate schedules, but not any other coefficients.
+into a pipeline, you can use the experimental `_torch_lr_schedule_classes` config property.
+See `this example script `__ for more details.
+Note that this example only covers learning rate schedules, not any other coefficients.
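+
+The following is a minimal sketch of chaining two such schedulers, assuming only what the
+`experimental()` docstring states: the property takes a list of PyTorch scheduler classes,
+stepped in the given order and partially initialized with `functools.partial`:
+
+.. testcode::
+
+    from functools import partial
+
+    import torch
+
+    config.experimental(
+        _torch_lr_schedule_classes=[
+            # Ramp the learning rate up from 10% to 100% over the first 50 steps ...
+            partial(torch.optim.lr_scheduler.LinearLR, start_factor=0.1, total_iters=50),
+            # ... then decay it by 1% per step.
+            partial(torch.optim.lr_scheduler.ExponentialLR, gamma=0.99),
+        ]
+    )
+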
 
 
 AlgorithmConfig.learners()
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-This method isn't used on the old API stack (b/c the old stack doesn't use Learner workers).
+This method isn't used on the old API stack because the old stack doesn't use Learner workers.
 
-It allows you to specify ..
+It allows you to specify:
 
 1) the number of `Learner` workers through `.learners(num_learners=...)`.
-1) the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training
+2) the resources per learner; use `.learners(num_gpus_per_learner=1)` for GPU training
    and `.learners(num_gpus_per_learner=0)` for CPU training.
@@ -256,9 +255,9 @@ In case you were using the `observation_filter` setting, perform the following t
 
 .. testcode::
 
-    # For `observation_filter="NoFilter"`, do not set anything in particular. This is the default.
+    # For `observation_filter="NoFilter"`, don't set anything in particular. This is the default.
 
-    # For `observation_filter="MeanStdFilter"`, do:
+    # For `observation_filter="MeanStdFilter"`, do the following:
    from ray.rllib.connectors.env_to_module import MeanStdFilter
 
     config.env_runners(
@@ -269,10 +268,10 @@ In case you were using the `observation_filter` setting, perform the following t
 
 AlgorithmConfig.exploration()
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Only the `explore` setting remains supported on the new stack.
-It determines, whether the :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_exploration` (in case `explore=True`)
-or the :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_inference` (in case `explore=False`) method
-is called on your :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`
+The new stack only supports the `explore` setting.
+It determines whether the :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_exploration`, in the case `explore=True`,
+or the :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_inference`, in the case `explore=False`, is the method
+RLlib calls on your :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`
 inside the :py:class:`~ray.rllib.env.env_runner.EnvRunner`.
 
 .. testcode::
 
@@ -281,26 +280,26 @@ inside the :py:class:`~ray.rllib.env.env_runner.EnvRunner`.
 
 The `exploration_config` setting is deprecated and no longer used. Instead, determine the exact exploratory
-behavior (for example, sample an action from a distribution) inside the overridden
+behavior, for example, sampling an action from a distribution, inside the overridden
 :py:meth:`~ray.rllib.core.rl_module.rl_module.RLModule._forward_exploration` method of your
 :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`.
 
 
-Custom Callbacks
+Custom callbacks
 ----------------
 
-If you are using custom callbacks on the old API stack, you are subclassing the :py:class`~ray.rllib.algorithms.callbacks.DefaultCallbacks` class.
-You can continue doing so on the new API stack and also pass your custom subclass to your config like so:
+If you're using custom callbacks on the old API stack, you're subclassing the :py:class:`~ray.rllib.algorithms.callbacks.DefaultCallbacks` class.
+You can continue this approach with the new API stack and also pass your custom subclass to your config like the following:
 
 .. testcode::
 
     # config.callbacks(YourCallbacksClass)
 
-However, if you are overriding those methods triggered on the EnvRunner side (`on_episode_start/stop/step/etc..`),
+However, if you're overriding the methods that RLlib triggers on the EnvRunner side, for example, `on_episode_start/stop/step/etc.`,
 you might have to do a small amount of translation, because the
+arguments that RLlib passes to many of these methods may have changed.
 
-Here is a 1:1 translation guide for those types of Callbacks methods:
+The following is a one-to-one translation guide for these types of Callbacks methods:
 
 .. testcode::
 
@@ -318,34 +317,34 @@ Here is a 1:1 translation guide for those types of Callbacks methods:
             env_index,
             rl_module,
-            # Old API stack args; you should no longer use/access these inside your method code.
+            # Old API stack args; don't use or access these inside your method code.
             worker=None,
             base_env=None,
             policies=None,
             **kwargs,
         ):
-            # The `SingleAgentEpisode` or `MultiAgentEpisode` that has just been started.
-            # See here for more details: https://docs.ray.io/en/latest/rllib/single-agent-episode.html
+            # The `SingleAgentEpisode` or `MultiAgentEpisode` that RLlib has just started.
+            # See https://docs.ray.io/en/latest/rllib/single-agent-episode.html for more details.
            print(episode)
 
-            # The `EnvRunner` class where the episode in question is being collected.
-            # This used to be a `RolloutWorker`, now on the new stack, this is either a
+            # The `EnvRunner` class that collects the episode in question.
+            # This class used to be a `RolloutWorker`. On the new stack, this class is either a
             # `SingleAgentEnvRunner` or a `MultiAgentEnvRunner` holding the gymnasium Env,
-            # the RLModule, and the 2 connector pipelines (env-to-module and module-to-env).
+            # the RLModule, and the 2 connector pipelines, env-to-module and module-to-env.
             print(env_runner)
 
-            # The MetricsLogger object on the EnvRunner (documentation is wip).
+            # The MetricsLogger object on the EnvRunner (documentation is a WIP).
             print(metrics_logger.peek("episode_return_mean", default=0.0))
 
-            # The gymnasium env used for sample collection. Note that this may be a
+            # The gymnasium env that sample collection uses. Note that this env may be a
             # gymnasium.vector.VectorEnv.
             print(env)
 
-            # The env index (in case of a vector env) that handles the `episode`.
+            # The env index, in case of a vector env, that handles the `episode`.
             print(env_index)
 
-            # The RLModule used on this EnvRunner. Note that this may be a "plain" (single-agent)
-            # `RLModule` or a `MultiRLModule` (in the multi-agent case).
+            # The RL Module that this EnvRunner uses. Note that this module may be a "plain"
+            # single-agent `RLModule` or a `MultiRLModule` in the multi-agent case.
             print(rl_module)
 
         # Change similarly:
@@ -359,13 +358,11 @@ The following callback methods are no longer available on the new API stack:
 
 **`on_sub_environment_created()`**: The new API stack uses `Farama's gymnasium `__ vector Envs,
 leaving no control for RLlib to call a callback on each individual env-index's creation.
 
-**`on_create_policy()`**: This method is only called on :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker`
-and thus no longer available on the new API stack.
+**`on_create_policy()`**: This method is no longer available on the new API stack because only :py:class:`~ray.rllib.evaluation.rollout_worker.RolloutWorker` calls it.
 
-**`on_postprocess_trajectory()`**: This method is no longer triggered and called on the new API stack,
-since trajectory processing is handled entirely through :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2`
-pipelines now.
-The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is work in progress and linked from here shortly.
+**`on_postprocess_trajectory()`**: The new API stack no longer triggers and calls this method,
+because :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` pipelines handle trajectory processing entirely.
+The documentation for :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` is under development.
 
 
 .. _rllib-modelv2-to-rlmodule:
 
 ModelV2 to RLModule
 -------------------
 
-In case you are using a custom :py:class:`~ray.rllib.models.modelv2.ModelV2` class and would like to translate
-the entire NN architecture and possibly action distribution logic to the new API stack, take a look at
-the :ref:`RLModule documentation ` first, then come back to this location here.
+If you're using a custom :py:class:`~ray.rllib.models.modelv2.ModelV2` class and want to translate
+the entire NN architecture and possibly action distribution logic to the new API stack, see
+:ref:`RL Modules ` in addition to this section.
 
-Here are also helpful example scripts on `how to write a custom CNN-containing RLModule `__
-and `how to write a custom LSTM-containing RLModule `__.
+See these example scripts on `how to write a custom CNN-containing RL Module `__
+and `how to write a custom LSTM-containing RL Module `__.
 
-Also, there are different options for translating an existing, custom :py:class:`~ray.rllib.models.modelv2.ModelV2` (old API stack)
+There are various options for translating an existing, custom :py:class:`~ray.rllib.models.modelv2.ModelV2` from the old API stack
 to the new API stack's :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`:
 
-1) You lift your ModelV2 code and drop it into a new, custom RLModule class (see the :ref:`RLModule documentation ` for details).
-1) You use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this with the `new stack RLModule convenience wrapper `__.
-1) You have an :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object from an old API stack training run and use this with the `new stack RLModule convenience wrapper `__.
+1) Move your ModelV2 code to a new, custom `RLModule` class. See :ref:`RL Modules ` for details.
+2) Use an Algorithm checkpoint or a Policy checkpoint that you have from an old API stack training run and use this checkpoint with the `new stack RL Module convenience wrapper `__.
+3) Use an existing :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` object from an old API stack training run, with the `new stack RL Module convenience wrapper `__.
 
 
-Custom Loss Functions and/or Policies
+Custom loss functions and policies
 -------------------------------------
 
-In case you are using one or more custom loss functions and/or custom (PyTorch) optimizers to train your models, instead of doing these
-customizations inside the old stack's Policy class, you need to move these logic into the new API stack's
+If you're using one or more custom loss functions or custom (PyTorch) optimizers to train your models, instead of doing these
+customizations inside the old stack's Policy class, you need to move the logic into the new API stack's
 :py:class:`~ray.rllib.core.learner.learner.Learner` class.
-:ref:`See here for more details on how to write a custom Learner `.
+See :ref:`Learner ` for details on how to write a custom Learner.
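+
+To give a sense of where this logic lives, the following is a minimal sketch of a custom
+`TorchLearner` subclass; the exact `compute_loss_for_module` signature and the
+`action_dist_inputs` key are assumptions that may differ across Ray versions:
+
+.. testcode::
+
+    import torch
+
+    from ray.rllib.core.learner.torch.torch_learner import TorchLearner
+
+    class MyTorchLearner(TorchLearner):
+        def compute_loss_for_module(self, *, module_id, config=None, batch, fwd_out):
+            # Toy loss: regularize the module's action-distribution inputs toward 0.
+            # A real loss combines the train `batch` (actions, rewards, and so on)
+            # with `fwd_out`, the output of the RLModule's `forward_train()` pass.
+            return torch.mean(fwd_out["action_dist_inputs"] ** 2)
+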
 
-Here are also helpful example scripts on `how to write a simple custom loss function `__
-and `how to write a custom Learner with 2 optimizers and different learning rates for each `__.
+The following example scripts show how to write:
+
+- `a simple custom loss function `__
+- `a custom Learner with 2 optimizers and different learning rates for each `__.
 
-Note that the Policy class is no longer supported in the new API stack. This class used to hold a
-neural network (now moved into :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule`),
-a (old stack) connector (now moved into :py:class:`~ray.rllib.connector.connector_v2.ConnectorV2`),
-and one or more optimizers and losses (now moved into :py:class:`~ray.rllib.core.learner.learner.Learner`).
+Note that the new API stack doesn't support the Policy class. In the old stack, this class holds a
+neural network, which moves into the :py:class:`~ray.rllib.core.rl_module.rl_module.RLModule` in the new API stack,
+an old stack connector, which moves into the :py:class:`~ray.rllib.connector.connector_v2.ConnectorV2`,
+and one or more optimizers and losses, which move into the :py:class:`~ray.rllib.core.learner.learner.Learner` class.
 
-The RLModule API is much more flexible than the old stack's Policy API and
-provides a cleaner separation-of-concerns experience (things related to action
-inference run on the EnvRunners, things related to updating run on the Learner workers)
-as well as superior scalability allowing training in a multi-GPU setup in any Ray cluster
-and multi-node + multi-GPU training on the `Anyscale `__ platform.
+The RL Module API is much more flexible than the old stack's Policy API and
+provides a cleaner separation-of-concerns experience. Things related to action
+inference run on the EnvRunners, and things related to updating run on the Learner workers.
+It also provides superior scalability, allowing training in a multi-GPU setup in any Ray cluster
+and multi-node, multi-GPU training on the `Anyscale `__ platform.
 
 
-Custom (old-stack) Connectors
+Custom connectors (old stack)
 -----------------------------
 
-If you are using custom (old API stack) connectors, you have to lift your logic into the
+If you're using custom connectors from the old API stack, move your logic into the
 new :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` API. Translate your agent connectors
 into env-to-module ConnectorV2 pieces and your action connectors into module-to-env ConnectorV2
 pieces.
-The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is work in progress and linked from here shortly.
+The :py:class:`~ray.rllib.connectors.connector_v2.ConnectorV2` documentation is under development.
 
-In the meantime, take a look at some examples on how to write ConnectorV2 pieces for the
-different pipelines:
+The following are some examples of how to write ConnectorV2 pieces for the
+different pipelines, with a minimal sketch of one such piece after the list:
 
-1) `Example on how to perform observation frame-stacking `__.
-1) `Example on how to add the most recent action and reward to the RLModule's input `__.
-1) `Example on how to do mean-std filtering on all observations `__.
-1) `Example on how to flatten any complex observation space to a 1D space `__.
+1) `Observation frame-stacking `__.
+2) `Add the most recent action and reward to the RL Module's input `__.
+3) `Mean-std filtering on all observations `__.
+4) `Flatten any complex observation space to a 1D space `__.
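+
+The following is a minimal sketch of one such env-to-module piece that clips all
+observations. Treat the exact `__call__` signature, the episode getters and setters, and
+the `env_to_module_connector` wiring as assumptions that may vary across Ray versions:
+
+.. testcode::
+
+    import numpy as np
+
+    from ray.rllib.connectors.connector_v2 import ConnectorV2
+
+    class ClipObservations(ConnectorV2):
+        """Clips the most recent observation of each episode to [-1.0, 1.0]."""
+
+        def __call__(self, *, rl_module, data, episodes, explore=None, shared_data=None, **kwargs):
+            for sa_episode in self.single_agent_episode_iterator(episodes):
+                # Read the latest observation, clip it, and write it back in place.
+                obs = sa_episode.get_observations(-1)
+                sa_episode.set_observations(new_data=np.clip(obs, -1.0, 1.0), at_indices=-1)
+            return data
+
+    # Plug the piece into the env-to-module pipeline of each EnvRunner.
+    config.env_runners(env_to_module_connector=lambda env: ClipObservations())
+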
diff --git a/doc/source/rllib/rllib-new-api-stack.rst b/doc/source/rllib/rllib-new-api-stack.rst
index 39bf56650e0c..37094aba2881 100644
--- a/doc/source/rllib/rllib-new-api-stack.rst
+++ b/doc/source/rllib/rllib-new-api-stack.rst
@@ -4,19 +4,19 @@
 .. _rllib-new-api-stack-guide:
 
-RLlib's New API Stack
+RLlib's new API stack
 =====================
 
 .. hint::
 
-    This section describes in detail what the new API stack is and why you should migrate to it
-    (in case you have old API stack custom code). :ref:`See here for a comprehensive migration guide `.
+    This section describes the new API stack and why you should migrate to it
+    if you have old API stack custom code. See the :ref:`migration guide ` for details.
 
 
 Overview
 --------
 
-Starting in Ray 2.10, you can opt-in to the alpha version of a "new API stack", a fundamental overhaul from the ground
+Starting in Ray 2.10, you can opt in to the alpha version of the "new API stack", a fundamental overhaul from the ground
 up with respect to architecture, design principles, code base, and user facing APIs.
 The following select algorithms and setups are available.
 
@@ -64,14 +64,14 @@ The following select algorithms and setups are available.
 
 Over the next few months, the RLlib Team continues to document, test, benchmark,
-bug-fix, and further polish these new APIs as well as rollout more algorithms
-that you can run in the new stack (with a focus on offline RL).
+bug-fix, and further polish these new APIs as well as roll out more algorithms
+that you can run in the new stack, with a focus on offline RL.
 
-Also know that you are able to continue using your custom classes and setups
-on the old API stack for the foreseeable future (beyond Ray 3.0), however, we do recommend you
-to migrate to the new stack :ref:`using the migration guide here `
+You can continue using custom classes and setups
+on the old API stack for the foreseeable future, beyond Ray 3.0. However, you should
+migrate to the new stack with the :ref:`migration guide `.
 
 
-What is the New API Stack?
+New API stack
 --------------------------
 
 The new API stack is the result of re-writing RLlib's core APIs from scratch and reducing
@@ -98,17 +98,16 @@ The :py:class:`~ray.rllib.algorithm.algorithm_config.AlgorithmConfig` and :py:cl
 
 Who should use the new API stack?
 ---------------------------------
 
-We recommend to cut over your code from old- to new API stack as soon as possible, as the
-classes and APIs have reached sufficient stability at this point and we only expect very minor changes
-from here on.
+Migrate your code from the old to the new API stack as soon as possible.
+The classes and APIs are sufficiently stable, and the Ray team expects only minor changes from here on.
 
-See here for a comprehensive migration guide allowing you to step-by-step translate your code from
-old- to new API stack in no time.
+See the :ref:`New API stack migration guide ` for step-by-step instructions on translating your code from the
+old to the new API stack.
 
-Also, the following section, lists more compelling reasons to migrate to the new stack.
+A comparison of the old to new API stack provides additional motivation for migrating to the new stack.
 
 
-Comparison to the Old API Stack
+Comparison to the old API stack
 -------------------------------
 
 This table compares features and design choices between the new and old API stack:
 
@@ -122,7 +121,7 @@ This table compares features and design choices between the new and old API stac
       - **Old API Stack**
     * - Multi-GPU and multi-node/multi-GPU
       - Yes
-      - Yes & No
+      - Yes and No
    * - Support for shared (multi-agent) model components (e.g., communication channels, shared value functions, etc.)
       - Yes
       - No
@@ -149,6 +148,6 @@ This table compares features and design choices between the new and old API stac
 
-How to Use the New API Stack?
+How to use the new API stack?
 -----------------------------
 
-:ref:`See here for our complete and comprehensive migration guide `
-describing in detail which steps you have to take and which changes you have to apply to your
-custom RLlib classes in order to migrate from the old- to the new stack.
+See :ref:`New API stack migration guide ` for a comprehensive migration guide
+with detailed steps and the changes to apply to your
+custom RLlib classes to migrate from the old to the new stack.
diff --git a/rllib/algorithms/algorithm_config.py b/rllib/algorithms/algorithm_config.py
index 85ba7af7d872..df94b2a2177b 100644
--- a/rllib/algorithms/algorithm_config.py
+++ b/rllib/algorithms/algorithm_config.py
@@ -1770,7 +1770,7 @@ def env_runners(
             sample_timeout_s: The timeout in seconds for calling `sample()` on remote
                 EnvRunner workers. Results (episode list) from workers that take
                 longer than this time are discarded. Only used by algorithms that sample
-                synchronously in turn with their update step (e.g. PPO or DQN). Not
+                synchronously in turn with their update step (e.g., PPO or DQN). Not
                 relevant for any algos that sample asynchronously, such as APPO or
                 IMPALA.
             max_requests_in_flight_per_env_runner: Max number of inflight requests
@@ -1780,8 +1780,8 @@
                 large sample batches, where there is the risk that the object store
                 may fill up, causing spilling of objects to disk. This can cause any
                 asynchronous requests to become very slow, making your experiment run
-                slow as well. You can inspect the object store during your experiment
-                via a call to ray memory on your headnode, and by using the ray
+                slowly as well. You can inspect the object store during your experiment
+                via a call to `ray memory` on your head node, and by using the Ray
                 dashboard. If you're seeing that the object store is filling up, turn
                 down the number of remote requests in flight or enable compression.
             sample_collector: For the old API stack only. The SampleCollector class to
@@ -3408,11 +3408,11 @@ def experimental(
                 classes or a dictionary mapping module IDs to such a list of respective
                 scheduler classes. Multiple scheduler classes can be applied in sequence
                 and will be stepped in the same sequence as defined here. Note, most
-                learning rate schedulers need arguments to be configured, i.e. you might
+                learning rate schedulers need arguments to be configured, that is, you might
                 have to partially initialize the schedulers in the list(s) using
                 `functools.partial`.
             _tf_policy_handles_more_than_one_loss: Experimental flag.
-                If True, TFPolicy will handle more than one loss/optimizer.
+                If True, TFPolicy handles more than one loss or optimizer.
                 Set this to True, if you would like to return more than one loss term
                 from your `loss_fn` and an equal number of optimizers from your
                 `optimizer_fn`.
                 In the future, the default for this will be
diff --git a/rllib/examples/learners/ppo_with_torch_lr_schedulers.py b/rllib/examples/learners/ppo_with_torch_lr_schedulers.py
index 91e87288971c..f40b4d0d23b8 100644
--- a/rllib/examples/learners/ppo_with_torch_lr_schedulers.py
+++ b/rllib/examples/learners/ppo_with_torch_lr_schedulers.py
@@ -5,7 +5,7 @@
 optimizer. In this way even more complex learning rate schedules can be assembled.
 
 This example shows:
-    - how to configure multiple learning rate schedulers (as a chained pipeline) in
+    - how to configure multiple learning rate schedulers, as a chained pipeline, in
       PyTorch using partial initialization with `functools.partial`.