
[RLlib] Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config. #46900

Merged

Conversation


@sven1977 sven1977 commented Jul 31, 2024

Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config.

The simplest (and only) path to get a (Multi)RLModuleSpec from an algo config should be:

  • config.get_rl_module_spec() -> single-module case (or on an EnvRunner with only one policy, after filtering out modules not needed on EnvRunners, e.g. a learner_only module).
  • config.get_multi_rl_module_spec() -> multi-module cases (e.g. multi-agent setups, or Learners with one or more learner_only modules, such as ICM-based curiosity).

Both EnvRunnerGroup and EnvRunner now have a get_spaces() API, which allows components that do NOT have access to an env to obtain a module's observation and action spaces in a more unified and structured way.
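A minimal sketch of the intended call pattern (hedged: `config` is assumed to be an already built AlgorithmConfig, `env` a gym environment, and `env_runner_group` an EnvRunnerGroup; the exact keyword arguments follow the diff snippets quoted further down in this conversation):

# Single-module case, e.g. on an EnvRunner that has an env to infer spaces from.
rl_module_spec = config.get_rl_module_spec(env=env)

# Multi-module case (multi-agent and/or learner_only modules), e.g. on the
# LearnerGroup side, which has no env: spaces come from the new
# EnvRunnerGroup.get_spaces() API instead.
multi_rl_module_spec = config.get_multi_rl_module_spec(
    spaces=env_runner_group.get_spaces(),
    inference_only=False,  # EnvRunners would pass True here.
)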

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


@simonsays1980 simonsays1980 left a comment


LGTM. Big changes. I am not yet fully convinced of the learner_only flag existing in parallel with the inference_only one. Unsure whether this introduces some confusion for users.

@@ -784,42 +784,24 @@ def setup(self, config: AlgorithmConfig) -> None:
# TODO (Rohan138): Refactor this and remove deprecated methods
# Need to add back method_type in case Algorithm is restored from checkpoint
method_config["type"] = method_type
self.learner_group = None
Collaborator

Hopefully, this does not error out in the Offline API. Have you tested tuned_examples/bc/cartpole_bc.py with it?

Contributor Author

We already define this property above in the ctor, so this was actually duplicate code. Should have no effect on anything, I think.

module_spec: MultiRLModuleSpec = self.config.get_multi_rl_module_spec(
    policy_dict=policy_dict,
    spaces=self.env_runner_group.get_spaces(),
Collaborator

Ah, very nice!

self.algorithm_config_overrides_per_module = {}
# Cached, actual AlgorithmConfig objects derived from
# `self.algorithm_config_overrides_per_module`.
self._per_module_overrides: Dict[ModuleID, "AlgorithmConfig"] = {}
Collaborator

How does this look with hundreds of agents?

Contributor Author

Not sure :) It's a) cached and b) only those configs that actually differ from the "main" config are cloned (and altered). Yes, we'll absolutely have to future-proof the new API stack to support 100s/1000s of agents/modules.
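For illustration only (this is a guessed sketch, not the actual RLlib code; get_config_for_module() is a hypothetical helper name), the caching described here could look roughly like this:

def get_config_for_module(self, module_id):
    # Return the cached module-specific config if we already built it.
    if module_id in self._per_module_overrides:
        return self._per_module_overrides[module_id]
    overrides = self.algorithm_config_overrides_per_module.get(module_id)
    if overrides:
        # Clone (and alter) the main config only for modules that actually
        # differ from it.
        module_config = self.config.copy(copy_frozen=False).update_from_dict(overrides)
    else:
        # No differences wrt the main config -> reuse it as-is.
        module_config = self.config
    self._per_module_overrides[module_id] = module_config
    return module_config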

@@ -3081,6 +3068,18 @@ def rl_module(
observation_space, action_space, catalog_class, or the model config is
not specified it will be inferred from the env and other parts of the
algorithm config object.
algorithm_config_overrides_per_module: Only used if
Collaborator

Shouldn't this better be placed in multi_agent()? I am a bit unsure because these overrides would probably only come into play in a multi-agent scenario. A single-agent case would simply define the config for this single module.

Contributor Author

Good point, but in this particular case, it actually sets something (the lr of the curiosity ICM) that has nothing to do with multi-agent, which is why I moved it to rl_module. Maybe a good rule for separating these config methods from here on is:

  • If the setting is only used in multi-agent setups, keep it in the .multi_agent() method, e.g. policy_mapping_fn.
  • If the setting is not limited to multi-agent cases, move it to the .rl_module() method, e.g. rl_module_spec (which should replace policies on the new API stack as of this PR) or algorithm_config_overrides_per_module (see the sketch below).
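Hedged sketch of that separation from the user side (PPOConfig, the "icm" module ID, and the lr value are illustrative only; AlgorithmConfig.overrides() is used to build the per-module override dict):

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Multi-agent-only settings stay in .multi_agent(), e.g. policy_mapping_fn.
    .multi_agent(
        policy_mapping_fn=lambda agent_id, episode, **kwargs: "default_policy",
    )
    # Settings not limited to multi-agent go into .rl_module(), e.g. the
    # per-module config overrides (here: a different lr for the curiosity ICM).
    .rl_module(
        algorithm_config_overrides_per_module={
            "icm": AlgorithmConfig.overrides(lr=0.0005),
        },
    )
)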

if (
    self.rollout_fragment_length != "auto"
    and not self.in_evaluation
    and self.total_train_batch_size > 0
Collaborator

I often saw that, if train_batch_size was defined, train_batch_size_per_learner was None. The same holds if train_batch_size is not user-defined at all. Can we fill this gap?

Contributor Author

We should always go with train_batch_size_per_learner on the new stack.

The default behavior is:

  • If train_batch_size_per_learner is None, derive it from train_batch_size via train_batch_size_per_learner = train_batch_size // num_learners.
  • If train_batch_size_per_learner is defined, use it on the new API stack (it is ignored on the old API stack). See the sketch below.
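A minimal sketch of that fallback logic (illustrative pseudocode, not the actual RLlib property; num_learners stands in for the resolved number of Learner workers):

if config.train_batch_size_per_learner is not None:
    # New API stack: this value wins (the old stack ignores it).
    per_learner_batch_size = config.train_batch_size_per_learner
else:
    # Derive from the old setting by splitting it across Learner workers.
    per_learner_batch_size = config.train_batch_size // max(num_learners, 1)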

Contributor Author

We should maybe completely hard-deprecate (i.e., error out on) train_batch_size on the new API stack.

"""
# Get ID of the first remote worker.
worker_id = self._worker_manager.actor_ids()[0]
remote_worker_ids = (
Collaborator

Very nice!

return None
# Create an instance of the `MultiRLModule`.
module_spec: MultiRLModuleSpec = self.config.get_multi_rl_module_spec(
    env=self.env, spaces=self.get_spaces(), inference_only=True
Collaborator

Awesome!!

except NotImplementedError:
self.module = None
# Create an instance of the `RLModule`.
module_spec: RLModuleSpec = self.config.get_rl_module_spec(
Collaborator

So much better!

return {
    "__env__": (self.env.observation_space, self.env.action_space),
    DEFAULT_MODULE_ID: (
        self._env_to_module.observation_space,
Collaborator

Tricky :)

module_class=InverseDynamicsModel,
# Only create the ICM on the Learner workers, NOT on the
# EnvRunners.
learner_only=True,
Collaborator

Ah, I get it: there are modules that have nothing that would be inference-only, so they should not even be built for inference. Still, these double flags are maybe not the best way to put it. It is a bit confusing, at least at the beginning.

Contributor Author

I get your confusion. Yes, these two flags are not complementary (inference_only=False does not mean learner_only=True or vice-versa).

I can try to describe this better in the docstrings. Maybe along the lines of:

  • inference_only: Users have a chance to streamline their custom RLModules by providing an inference_only mode, which RLlib then uses on the EnvRunners. inference_only is NOT a config setting that users should need to set (or unset). Only RLlib should decide when an RLModule is built in its inference_only=True|False state. The only exception is maybe when a user wants to do inference in production and loads the module from a checkpoint (with inference_only=True to save memory).
  • learner_only: A setting that a user can set to indicate to RLlib that the RLModule will only be used on the Learners and should never be built on any EnvRunner (see the sketch below).
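A short sketch of the two flags in practice (RLModuleSpec and InverseDynamicsModel are taken from the diff above; MyRLModule and the exact import path are hypothetical):

from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# learner_only=True: user-facing; the ICM is built on Learner workers only
# and never on any EnvRunner.
icm_spec = RLModuleSpec(
    module_class=InverseDynamicsModel,
    learner_only=True,
)

# inference_only=True: normally decided by RLlib itself (e.g. for EnvRunner
# builds); a user would only set it when restoring a module from a checkpoint
# for pure production inference, to save memory.
serving_spec = RLModuleSpec(
    module_class=MyRLModule,  # hypothetical user RLModule class
    inference_only=True,
)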

@sven1977 sven1977 enabled auto-merge (squash) August 1, 2024 14:38
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Aug 1, 2024
@github-actions github-actions bot disabled auto-merge August 1, 2024 16:51
@sven1977 sven1977 enabled auto-merge (squash) August 1, 2024 21:15
@github-actions github-actions bot disabled auto-merge August 2, 2024 08:01
@sven1977 sven1977 merged commit 3762dbb into ray-project:master Aug 2, 2024
4 of 5 checks passed
@sven1977 sven1977 deleted the add_learner_only_flag_to_rl_module_config branch August 2, 2024 09:36