[RLlib] Update autoregressive actions example. #47829
Conversation
rllib/core/models/configs.py
Outdated
@@ -160,6 +160,13 @@ class _MLPConfig(ModelConfig):
             "_" are allowed.
         output_layer_bias_initializer_config: Configuration to pass into the
             initializer defined in `output_layer_bias_initializer`.
+        clip_log_std: If the log std should be clipped by `log_std_clip_param`.
nit: I feel like this comment is confusing. We should write that clipping is only applied to those action distribution parameters that encode the log-std of a DiagGaussian action distribution; any other output node (or the case where there is no DiagGaussian at all) is not clipped. Mentioning the value function makes it confusing.
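For illustration, a minimal sketch of what such DiagGaussian-only clipping could look like in a policy head's forward pass; the split into mean and log-std halves and the function name are assumptions for this example, not RLlib's actual implementation:

```python
import torch


def clip_log_std_outputs(head_out: torch.Tensor, log_std_clip_param: float) -> torch.Tensor:
    # Assume the last dimension concatenates the means (first half) and the
    # log-stds (second half) of a DiagGaussian action distribution.
    means, log_stds = torch.chunk(head_out, 2, dim=-1)
    # Only the log-std slice is clipped; the means (and any non-DiagGaussian
    # outputs, e.g. a value-function head) would pass through untouched.
    log_stds = torch.clamp(log_stds, -log_std_clip_param, log_std_clip_param)
    return torch.cat([means, log_stds], dim=-1)
```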
rllib/algorithms/sac/sac_catalog.py
Outdated
@@ -187,6 +194,8 @@ def build_pi_head(self, framework: str) -> Model:
     hidden_layer_activation=self.pi_and_qf_head_activation,
     output_layer_dim=required_output_dim,
     output_layer_activation="linear",
+    clip_log_std=is_diag_gaussian,
+    log_std_clip_param=self._model_config_dict["log_std_clip_param"],
Should we do `.get` here to be defensive against any custom models that use custom `model_config_dict`s that are NOT derived from our gigantic (old) model config?
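A small sketch of the defensive lookup suggested here; the helper name and the fallback value of `20.0` are illustrative assumptions, not necessarily RLlib's default:

```python
# Hypothetical helper showing the defensive .get() lookup suggested above.
def get_log_std_clip_param(model_config_dict: dict, default: float = 20.0) -> float:
    # A custom model_config_dict that is not derived from RLlib's (old) model
    # config may not contain the key at all, so .get() avoids a KeyError.
    return model_config_dict.get("log_std_clip_param", default)


print(get_log_std_clip_param({"fcnet_hiddens": [256, 256]}))  # -> 20.0
```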
@@ -100,7 +102,7 @@
 # exceeds 150 in evaluation.
 stop = {
     f"{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 100000,
-    f"{EVALUATION_RESULTS}/{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 150.0,
+    f"{EVALUATION_RESULTS}/{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": -0.012,
nice!
Where does it roughly start?
It roughly starts at around -0.55 to -0.6.
Niiice!!
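For context, a self-contained sketch of how the stop criteria discussed above are assembled, assuming the metric constants are importable from `ray.rllib.utils.metrics` as in recent RLlib versions:

```python
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    EVALUATION_RESULTS,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)

# Stop after 100k sampled env steps, or once the mean evaluation episode
# return reaches -0.012 (returns in the new environment are negative and
# approach 0 as the agent improves; training starts around -0.55 to -0.6).
stop = {
    f"{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 100000,
    f"{EVALUATION_RESULTS}/{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": -0.012,
}
```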
super().reset(seed=seed)

# Randomly initialize the state between -1 and 1
self.state = np.random.uniform(-1, 1, size=(1,))
Nice that this can be negative, too. Makes sense!
LGTM! Thanks @simonsays1980 :)
Why are these changes needed?

The autoregressive actions example had an environment in which the agent could cheat by looking only at the state when defining both actions, `a1` and `a2`. This PR proposes a new environment for testing autoregressive action modules in which the agent has to watch both the state and the action `a1` to define the action `a2` optimally. Rewards are based on the negative absolute deviation between the desired action for `a2` and its actual counterpart.

Furthermore, this PR introduces the `ValueFunctionAPI` for the `AutoregressiveActionsRLM` in the corresponding example, which simplifies the code and actually fixes an error caused by the old `_compute_values` definition.
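To make the reward structure concrete, here is a minimal sketch of how a step reward based on the negative absolute deviation could be computed; the mapping from the state and `a1` to the desired `a2` is purely illustrative and not necessarily what the example environment does:

```python
import numpy as np


def step_reward(state: np.ndarray, a1: int, a2: float) -> float:
    # Hypothetical target: the optimal a2 depends on BOTH the state and the
    # first action a1, so the agent cannot cheat by ignoring a1.
    desired_a2 = float(state[0]) if a1 == 0 else -float(state[0])
    # Negative absolute deviation: the best achievable reward per step is 0.
    return -abs(desired_a2 - a2)


# Example: state 0.4 and a1 = 1 give a desired a2 of -0.4; choosing
# a2 = -0.3 yields a reward of -0.1.
print(step_reward(np.array([0.4]), 1, -0.3))
```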
Related issue number

Closes #44662
Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.