[RLlib] Action masking example for new API stack. #46146

simonsays1980 · 2024-06-19T15:45:44Z

Why are these changes needed?

This PR adds an example of action masking on the new API stack to the repository. In addition, it makes a small change to the SingleAgentEnvRunner so that it handles Dict observation spaces.
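The masking idea behind the example can be sketched in plain Python. This is a minimal illustration and not the code added in this PR: the helper names `mask_logits` and `softmax` are hypothetical, and the Dict observation keys `"observations"` and `"action_mask"` are assumed here for illustration only. Invalid actions get a logit of negative infinity, so they receive zero probability after the softmax.

```python
import math


def mask_logits(logits, action_mask):
    """Return a copy of `logits` with invalid actions set to -inf.

    Hypothetical helper: `action_mask` holds 1 for allowed actions, 0 otherwise.
    """
    if len(logits) != len(action_mask):
        raise ValueError("logits and action_mask must have the same length")
    return [x if m else -math.inf for x, m in zip(logits, action_mask)]


def softmax(logits):
    """Numerically stable softmax that tolerates -inf entries."""
    finite = [x for x in logits if x != -math.inf]
    if not finite:
        raise ValueError("at least one action must be allowed")
    shift = max(finite)
    exps = [0.0 if x == -math.inf else math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed Dict observation layout: real features plus a per-action mask.
obs = {"observations": [0.1, -0.3], "action_mask": [1, 0, 1]}
logits = [2.0, 5.0, 2.0]  # the policy prefers action 1, which is masked out

masked = mask_logits(logits, obs["action_mask"])
probs = softmax(masked)
print(probs)
```

Although action 1 has the largest raw logit, the mask removes it, and the remaining probability mass is split between actions 0 and 2.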

Related issue number

Closes #44780 #44452

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

…add_env_reset' calls in 'SingleAgentEnvRunner' to deal with 'Dict' observation spaces. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

rllib/examples/rl_modules/classes/action_masking_rlm.py

sven1977

Super cool PR! One more example script TODO down.

We need to add this example to BUILD as well.

sven1977 · 2024-06-19T20:40:00Z

rllib/env/single_agent_env_runner.py

@@ -447,7 +447,7 @@ def _sample_episodes(
        obs, infos = self.env.reset()
        for env_index in range(self.num_envs):
            episodes[env_index].add_env_reset(
-                observation=obs[env_index],
+                observation=unbatch(obs)[env_index],


Strange: Shouldn't this already have caused a bug in the existing flatten obs example (which also uses a dict obs space)?

I think the flatten obs example silently snuck around it: the unbatch(obs) was only forgotten in _sample_episodes and not in _sample_timesteps (it cost me a lot of time to figure that out, though). Because the flatten obs example does no evaluation, it never ran into it :D

Ahhh, yes, makes perfect sense. Thanks for catching this!

With ray==2.24.0, batch_mode="complete_episodes" triggers the bug. Also, obs = unbatch(obs) should be moved outside the for-loop, as _sample_timesteps does:

    obs, infos = self.env.reset()
    obs = unbatch(obs)
    for env_index in range(self.num_envs):
        episodes[env_index].add_env_reset(
            observation=obs[env_index],
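To illustrate why indexing a batched Dict observation by env index fails, here is a small self-contained sketch. `unbatch_sketch` is a hypothetical, simplified stand-in written for this comment and is not RLlib's `unbatch` implementation; the observation keys and values are made up. It assumes a vectorized env returns a dict whose values each hold one entry per sub-environment.

```python
def unbatch_sketch(batched):
    """Split a (possibly nested) dict of per-env sequences into per-env items.

    Hypothetical stand-in for illustration only.
    """
    if isinstance(batched, dict):
        # Unbatch every value first, then regroup the pieces by env index.
        per_key = {key: unbatch_sketch(value) for key, value in batched.items()}
        num_envs = len(next(iter(per_key.values())))
        return [{key: items[i] for key, items in per_key.items()} for i in range(num_envs)]
    # Leaf: a sequence with one entry per sub-environment.
    return list(batched)


# Batched Dict observation from two sub-environments (made-up values).
batched_obs = {
    "action_mask": [[1, 0], [0, 1]],
    "observations": [[0.1, 0.2], [0.3, 0.4]],
}

# Indexing the dict by env index looks up a key named 0, which does not exist.
try:
    batched_obs[0]
    indexing_failed = False
except KeyError:
    indexing_failed = True

# Unbatch once, outside the loop, then index per env.
per_env_obs = unbatch_sketch(batched_obs)
for env_index in range(len(per_env_obs)):
    print(env_index, per_env_obs[env_index])
```

Unbatching once before the loop avoids repeating the split for every env index, which is the point of the suggestion above.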

…ngs and comments. Added action-masking and autoregressive actions examples to the BUILD. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

…ll be deprecated in very near future. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

sven1977

Thanks for the additional fixes and answers @simonsays1980 . Very nice PR!

…on masking module naming. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

…nter. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

aslonnie

approval for CI files.

…ong name in the BUILD file. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

Implemented example for action masking. Added an 'unbatch(obs)' for '…

243d239

…add_env_reset' calls in 'SingleAgentEnvRunner' to deal with 'Dict' observation spaces. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

sven1977 changed the title ~~[RLlib] - Action masking example for new API stack.~~ [RLlib] Action masking example for new API stack. Jun 19, 2024

sven1977 marked this pull request as ready for review June 19, 2024 19:35

sven1977 requested review from sven1977 and ArturNiederfahrenhorst as code owners June 19, 2024 19:35