
[RLlib] Add log-std clipping to 'MLPHead's. #47827

Merged

Conversation

@simonsays1980 simonsays1980 commented Sep 26, 2024

Why are these changes needed?

Many implementations of continuous control algorithms suffer from training instabilities in which the log standard deviation takes on extreme values (most often very small ones), leading to numerical overflow in backward-pass calculations (see this discussion).

These instabilities can be partially mitigated by using a freely moving log standard deviation that acts like a bias (i.e. it is not an output of the neural network, but is still optimized during training). Nevertheless, clipping the log standard deviation is still an often-used technique to stabilize training further.
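
For illustration, here is a minimal, hypothetical PyTorch sketch of that bias-like variant (not RLlib's actual head code; the class and names are made up):

```python
import torch
import torch.nn as nn


class GaussianPolicyHead(nn.Module):
    """Hypothetical head with a free, state-independent log-std."""

    def __init__(self, in_dim: int, action_dim: int):
        super().__init__()
        # The mean is a function of the network input ...
        self.mean = nn.Linear(in_dim, action_dim)
        # ... while the log-std is a bias-like parameter: it is not produced
        # by the network, but the optimizer still updates it.
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = self.mean(x)
        # Broadcast the shared log-std over the batch dimension.
        return torch.cat([mean, self.log_std.expand_as(mean)], dim=-1)
```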

This PR proposes a clip parameter for the log standard deviation and applies it in all 'MLPHead's, and therefore in all algorithms that use continuous actions. More specifically (see the sketch after this list):

  • It introduces a new parameter log_std_clip_param in rllib/models/catalog.py.
  • It sets this parameter to inf by default, i.e. effectively no clipping.
  • It passes this model config parameter into every continuous-action policy head (i.e. in PPO, APPO, IMPALA, SAC, BC, and MARWIL; note that DreamerV3 already uses log-std clipping).
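
To make the mechanism concrete, here is a minimal sketch of what clipping the log-std half of a head output could look like, assuming the common [mean, log_std] output layout (an illustration, not the PR's exact code):

```python
import torch


def clip_log_std(head_out: torch.Tensor, log_std_clip_param: float) -> torch.Tensor:
    """Clamp the log-std half of a [mean, log_std] head output."""
    mean, log_std = torch.chunk(head_out, 2, dim=-1)
    # Symmetric clipping keeps exp(log_std) within [exp(-clip), exp(clip)],
    # i.e. away from 0 and inf, which is what produces NaNs in the
    # backward pass.
    log_std = torch.clamp(log_std, -log_std_clip_param, log_std_clip_param)
    return torch.cat([mean, log_std], dim=-1)
```

With the default of float("inf"), the clamp is a no-op, so existing behavior is unchanged.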

Related issue number

Closes #46442

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

… that can use continuous action distributions.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 added enhancement Request for new feature and/or capability rllib RLlib related issues rllib-algorithms An RLlib algorithm/Trainer is not learning. rllib-models An issue related to RLlib (default or custom) Models. rllib-catalog issues related to RLlib's model catalog labels Sep 26, 2024
@simonsays1980 simonsays1980 marked this pull request as ready for review September 26, 2024 10:15
@simonsays1980 simonsays1980 changed the title [RLlib] - Added log-std clipping to 'MLPHead's. [RLlib] - Add log-std clipping to 'MLPHead's. Sep 26, 2024
@@ -181,6 +181,9 @@ class _MLPConfig(ModelConfig):
output_layer_bias_initializer: Optional[Union[str, Callable]] = None
output_layer_bias_initializer_config: Optional[Dict] = None

# Optional clip parameter for the log standard deviation.
log_std_clip_param: float = float("inf")
Contributor:
Could we set this to 20 by default?

Collaborator (author):
Computer says no ... Yes, we can :)

@@ -95,6 +95,7 @@ def build_pi_head(self, framework: str) -> Model:
hidden_layer_activation=self.pi_head_activation,
output_layer_dim=required_output_dim,
output_layer_activation="linear",
log_std_clip_param=self._model_config_dict["log_std_clip_param"],
Contributor:
What if the user doesn't define this in model_config_dict?

Collaborator (author):
Then the default kicks in, doesn't it? (rllib/models/catalog.py)

rllib/models/catalog.py (outdated):
# very small or large log standard deviations leading to numerical instabilities
# which can turn network outputs to `nan`. The default is infinity, i.e. no
# clipping.
"log_std_clip_param": float("inf"),
Contributor:
Ah, ok, here it is. But still, as we'll soon get rid of the old stack model config dict, we should be defensive against users bringing their own model_config_dict to BC or other algos.

Collaborator (author):
@sven1977 you are right. I am a bit reluctant to introduce an intermediate solution that does not hold in general for all other attributes of the model_config_dict. I thought we solved this problem with AlgorithmConfig._model_config_auto_includes: it still uses the old rllib/models/catalog.py defaults, but can be replaced by another logic in the near future. (IMO we will not get around some default model config, to ensure that (a) users do not need to provide all inputs and (b) we keep enough generality that we do not have to re-add every config to every algorithm anew.)

We could add log_std_clip_param to the overridden model_config_auto_includes. We also have a default value in the MLPHeadConfig.
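
A defensive lookup along the lines discussed here might look like the following sketch (the helper and the default name are hypothetical, not the merged code):

```python
from typing import Any, Dict

# Hypothetical fallback, mirroring the catalog default.
DEFAULT_LOG_STD_CLIP_PARAM = float("inf")


def get_log_std_clip_param(model_config_dict: Dict[str, Any]) -> float:
    # `.get()` keeps user-provided config dicts working even when they
    # predate (and therefore omit) the new key.
    return model_config_dict.get(
        "log_std_clip_param", DEFAULT_LOG_STD_CLIP_PARAM
    )
```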

@sven1977 (Contributor) left a comment:

Looks good to me. Just 1-2 nits/change requests (default should be 20, not inf).

@sven1977 sven1977 enabled auto-merge (squash) September 26, 2024 13:16
@sven1977 sven1977 changed the title [RLlib] - Add log-std clipping to 'MLPHead's. [RLlib] Add log-std clipping to 'MLPHead's. Sep 26, 2024
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Sep 26, 2024
…'clip_log_std' to enable no clipping for value heads.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…e that we have no categorical distribution when applying log std clipping.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…tant is on the same device as the network output. Furthermore, fixed a bug where the newly registered buffer was not used.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
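
For context on this device fix: registering the clip constant as a module buffer makes it follow `.to(device)` moves automatically. A minimal sketch of the pattern (illustrative only, not the PR's exact code):

```python
import torch
import torch.nn as nn


class ClippedLogStdHead(nn.Module):
    def __init__(self, log_std_clip_param: float = 20.0):
        super().__init__()
        # A (non-persistent) buffer travels with the module on
        # `.to(device)` / `.cuda()`, so the clamp below never mixes
        # CPU constants with GPU network outputs.
        self.register_buffer(
            "log_std_clip_param_const",
            torch.tensor(log_std_clip_param),
            persistent=False,
        )

    def forward(self, log_std: torch.Tensor) -> torch.Tensor:
        return torch.clamp(
            log_std,
            -self.log_std_clip_param_const,
            self.log_std_clip_param_const,
        )
```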
…MODEL_DEFAULT'.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@sven1977 sven1977 merged commit 788aecd into ray-project:master Sep 30, 2024
5 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Successfully merging this pull request may close these issues:

[RLlib]: PPO agent training error: Invalid NaN values in Normal distribution parameters