[RLlib] Add log-std clipping to 'MLPHead's. #47827
Conversation
… that can use continuous action distributions. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
rllib/core/models/configs.py
Outdated
@@ -181,6 +181,9 @@ class _MLPConfig(ModelConfig):
    output_layer_bias_initializer: Optional[Union[str, Callable]] = None
    output_layer_bias_initializer_config: Optional[Dict] = None

    # Optional clip parameter for the log standard deviation.
    log_std_clip_param: float = float("inf")
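For context, here is a minimal sketch of how such a clip parameter could be applied in a continuous-action head's forward pass. This is illustrative only, not RLlib's actual implementation; the class name and the mean/log-std output layout are assumptions.

```python
import torch
from torch import nn


class DiagGaussianMLPHead(nn.Module):
    """Illustrative policy head that emits means and clipped log-stds."""

    def __init__(self, in_dim: int, action_dim: int,
                 log_std_clip_param: float = float("inf")) -> None:
        super().__init__()
        self.linear = nn.Linear(in_dim, 2 * action_dim)
        # Register the clip constant as a buffer so it lives on the same
        # device as the network outputs.
        self.register_buffer("log_std_clip_param",
                             torch.tensor(log_std_clip_param))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.linear(x)
        mean, log_std = torch.chunk(out, 2, dim=-1)
        # Clip the log standard deviation to avoid exp() over-/underflow.
        log_std = torch.clamp(log_std, -self.log_std_clip_param,
                              self.log_std_clip_param)
        return torch.cat([mean, log_std], dim=-1)
```

With the default of float("inf"), the clamp is a no-op, so existing behavior is preserved unless a finite clip value is configured.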
Could we set this to 20 by default?
Computer says no ... Yes, we can :)
rllib/algorithms/bc/bc_catalog.py
Outdated
@@ -95,6 +95,7 @@ def build_pi_head(self, framework: str) -> Model:
    hidden_layer_activation=self.pi_head_activation,
    output_layer_dim=required_output_dim,
    output_layer_activation="linear",
    log_std_clip_param=self._model_config_dict["log_std_clip_param"],
What if the user doesn't define this in model_config_dict?
Then the default kicks in, doesn't it? (rllib/models/catalog.py)
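For illustration, a minimal sketch of how such a catalog-level default falls back when the user's model config dict omits the key. The dict contents and variable names below are assumptions, not RLlib's actual defaults.

```python
# Illustrative only: the catalog's defaults are merged with the user's
# model_config_dict, so a missing key falls back to the catalog value.
catalog_defaults = {
    "log_std_clip_param": float("inf"),  # i.e. no clipping unless overridden
}
user_model_config = {"fcnet_hiddens": [256, 256]}  # user did not set the clip param

model_config_dict = {**catalog_defaults, **user_model_config}
assert model_config_dict["log_std_clip_param"] == float("inf")
```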
rllib/models/catalog.py
Outdated
# very small or large log standard deviations leading to numerical instabilities
# which can turn network outputs to `nan`. The default is infinity, i.e. no
# clipping.
"log_std_clip_param": float("inf"),
Ah, ok, here it is. But still, as we'll soon get rid of the old stack model config dict, we should be defensive against users bringing their own model_config_dict to BC or other algos.
@sven1977 you are right. I am a bit reluctant to bring in an intermediate solution that does not hold in general for all other attributes of the model_config_dict. I thought that with AlgorithmConfig._model_config_auto_includes we had solved the problem - it still uses the old rllib/models/catalog.py defaults, but it can be replaced by another logic in the near future (imo we will not be able to get around some default model config, to ensure that (a) users do not need to provide all inputs, and (b) we keep generality so that we do not have to add every config to every algorithm anew).
We could add log_std_clip_param to the overridden model_config_auto_includes. We also have a default value in the MLPHeadConfig.
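A hypothetical sketch of that idea follows. Only _model_config_auto_includes and log_std_clip_param are taken from the discussion above; the class and method names are made up for illustration and do not reflect RLlib's actual API.

```python
from typing import Any, Dict


class MyAlgorithmConfig:  # hypothetical stand-in for an AlgorithmConfig subclass
    @property
    def _model_config_auto_includes(self) -> Dict[str, Any]:
        # Defaults that are always merged into the user's model config dict.
        return {"log_std_clip_param": 20.0}

    def _complete_model_config(self, user_dict: Dict[str, Any]) -> Dict[str, Any]:
        # User-provided keys win; missing keys fall back to the auto-includes.
        return {**self._model_config_auto_includes, **user_dict}


config = MyAlgorithmConfig()
print(config._complete_model_config({}))  # {'log_std_clip_param': 20.0}
```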
Looks good to me. Just 1-2 nits/change requests (default should be 20, not inf).
…'clip_log_std' to enable no clipping for value heads. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…e that we have no categorical distribution when applying log std clipping. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…tant is on the same device as the network output. Furthermore, fixed a bug where the newly registered buffer was not used. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…MODEL_DEFAULT'. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?
Many implementations of continuous control algorithms suffer from instabilities in training, where the log standard deviation takes on extreme values (most often very small ones), leading to numerical overflow in backward calculations (see this discussion).
These instabilities can be partially controlled by using a free-floating log standard deviation that acts like a bias (i.e. it is not produced inside the neural network, but is still optimized during training). Nevertheless, clipping the log standard deviation is still an often-used technique to stabilize training further.
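To make the instability concrete, here is a small, illustrative numeric sketch (the values are assumptions chosen for demonstration): a very small log standard deviation makes terms such as 1/std**2 in the Gaussian log-prob gradient explode, while clipping keeps them representable.

```python
import math

# A very small log-std, as a policy head might emit during unstable training.
log_std = -60.0
std = math.exp(log_std)            # ~8.7e-27
print(1.0 / std**2)                # ~1.3e52 -- overflows float32 -> inf/nan grads

# Clipping the log-std (e.g. to [-20, 20]) keeps such terms finite in float32.
clip = 20.0
clipped_std = math.exp(max(-clip, min(clip, log_std)))
print(1.0 / clipped_std**2)        # ~2.4e17 -- large, but representable
```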
This PR proposes a clip parameter for the log standard deviation and applies it in all MLPHeads, and therefore in all algorithms that use continuous actions. More specifically, it:
- adds log_std_clip_param to the default model config in rllib/models/catalog.py, defaulting to inf, i.e. effectively no clipping;
- applies the clipping in PPO, APPO, IMPALA, SAC, BC, and MARWIL (note: DreamerV3 already uses log-std clipping).

Related issue number
Closes #46442
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.