
[RLlib] Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config. #46900

Merged

Conversation


@sven1977 sven1977 commented Jul 31, 2024

Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config.

The simplest (and only) path to get a (Multi)RLModuleSpec from an algo config should be:

  • config.get_rl_module_spec() -> single-module case (or on an EnvRunner with only one policy, after filtering out modules not needed on EnvRunners, e.g. a learner_only module).
  • config.get_multi_rl_module_spec() -> multi-module cases (e.g. multi-agent setups, or Learners with one or more learner_only modules, such as ICM-based curiosity).

Both EnvRunnerGroup and EnvRunner now have a get_spaces() API, which allows components that do NOT have access to an env to obtain a module's observation and action spaces in a more unified and structured way.
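A minimal sketch of the intended call pattern (hedged: `config` is assumed to be an already built AlgorithmConfig, `env` a gym environment, and `env_runner_group` an EnvRunnerGroup; the exact keyword arguments follow the diff snippets quoted further down in this conversation):

# Single-module case, e.g. on an EnvRunner that has an env to infer spaces from.
rl_module_spec = config.get_rl_module_spec(env=env)

# Multi-module case (multi-agent and/or learner_only modules), e.g. on the
# LearnerGroup side, which has no env: spaces come from the new
# EnvRunnerGroup.get_spaces() API instead.
multi_rl_module_spec = config.get_multi_rl_module_spec(
    spaces=env_runner_group.get_spaces(),
    inference_only=False,  # EnvRunners would pass True here.
)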

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


@simonsays1980 simonsays1980 left a comment


LGTM. Big changes. I am not yet fully convinced of the learner_only flag existing in parallel with the inference_only one. Unsure whether this introduces some confusion for users.

@@ -784,42 +784,24 @@ def setup(self, config: AlgorithmConfig) -> None:
# TODO (Rohan138): Refactor this and remove deprecated methods
# Need to add back method_type in case Algorithm is restored from checkpoint
method_config["type"] = method_type
self.learner_group = None
Collaborator

Hopefully, this does not error out in the Offline API. Have you tested tuned_examples/bc/cartpole_bc.py with it?

Contributor Author

We already define this property above in the ctor, so this was actually duplicate code. Should have no effect on anything, I think.

module_spec: MultiRLModuleSpec = self.config.get_multi_rl_module_spec(
    policy_dict=policy_dict,
    spaces=self.env_runner_group.get_spaces(),
Collaborator

Ah, very nice!

self.algorithm_config_overrides_per_module = {}
# Cached, actual AlgorithmConfig objects derived from
# `self.algorithm_config_overrides_per_module`.
self._per_module_overrides: Dict[ModuleID, "AlgorithmConfig"] = {}
Collaborator

How does this look with hundreds of agents?

Contributor Author

Not sure :) It's a) cached and b) only those configs that actually differ from the "main" config are cloned (and altered). Yes, we'll absolutely have to future-proof the new API stack to support 100s/1000s of agents/modules.
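For illustration only (this is a guessed sketch, not the actual RLlib code; get_config_for_module() is a hypothetical helper name), the caching described here could look roughly like this:

def get_config_for_module(self, module_id):
    # Return the cached module-specific config if we already built it.
    if module_id in self._per_module_overrides:
        return self._per_module_overrides[module_id]
    overrides = self.algorithm_config_overrides_per_module.get(module_id)
    if overrides:
        # Clone (and alter) the main config only for modules that actually
        # differ from it.
        module_config = self.config.copy(copy_frozen=False).update_from_dict(overrides)
    else:
        # No differences wrt the main config -> reuse it as-is.
        module_config = self.config
    self._per_module_overrides[module_id] = module_config
    return module_config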

@@ -3081,6 +3068,18 @@ def rl_module(
observation_space, action_space, catalog_class, or the model config is
not specified it will be inferred from the env and other parts of the
algorithm config object.
algorithm_config_overrides_per_module: Only used if
Collaborator

Shouldn't this better be placed in multi_agent()? I am a bit unsure because these overrides would probably only come into play in a multi-agent scenario. A single-agent case would simply define the config for this single module.

Contributor Author

Good point, but in this particular case, it actually sets something (the lr of the curiosity ICM) that has nothing to do with multi-agent, which is why I moved it to rl_module. Maybe a good rule for separating these config methods from here on is:

  • If the setting is only used in multi-agent setups, keep it in the .multi_agent() method, e.g. policy_mapping_fn.
  • If the setting is not limited to multi-agent cases, move it to the .rl_module() method, e.g. rl_module_spec (which should replace policies on the new API stack as of this PR) or algorithm_config_overrides_per_module (see the sketch below).
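Hedged sketch of that separation from the user side (PPOConfig, the "icm" module ID, and the lr value are illustrative only; AlgorithmConfig.overrides() is used to build the per-module override dict):

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Multi-agent-only settings stay in .multi_agent(), e.g. policy_mapping_fn.
    .multi_agent(
        policy_mapping_fn=lambda agent_id, episode, **kwargs: "default_policy",
    )
    # Settings not limited to multi-agent go into .rl_module(), e.g. the
    # per-module config overrides (here: a different lr for the curiosity ICM).
    .rl_module(
        algorithm_config_overrides_per_module={
            "icm": AlgorithmConfig.overrides(lr=0.0005),
        },
    )
)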

if (
    self.rollout_fragment_length != "auto"
    and not self.in_evaluation
    and self.total_train_batch_size > 0
Collaborator

I often saw that, if train_batch_size was defined, train_batch_size_per_learner was None. The same holds if train_batch_size is not user-defined at all. Can we fill this gap?

Contributor Author

We should always go with train_batch_size_per_learner on the new stack.

The default behavior is:

  • If train_batch_size_per_learner is None, derive it from train_batch_size via train_batch_size_per_learner = train_batch_size // num_learners.
  • If train_batch_size_per_learner is defined, use it on the new API stack (it is ignored on the old API stack). See the sketch below.
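A minimal sketch of that fallback logic (illustrative pseudocode, not the actual RLlib property; num_learners stands in for the resolved number of Learner workers):

if config.train_batch_size_per_learner is not None:
    # New API stack: this value wins (the old stack ignores it).
    per_learner_batch_size = config.train_batch_size_per_learner
else:
    # Derive from the old setting by splitting it across Learner workers.
    per_learner_batch_size = config.train_batch_size // max(num_learners, 1)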

Contributor Author

We should maybe completely hard-deprecate (i.e., error out on) train_batch_size on the new API stack.

"""
# Get ID of the first remote worker.
worker_id = self._worker_manager.actor_ids()[0]
remote_worker_ids = (
Collaborator

Very nice!

return None
# Create an instance of the `MultiRLModule`.
module_spec: MultiRLModuleSpec = self.config.get_multi_rl_module_spec(
    env=self.env, spaces=self.get_spaces(), inference_only=True
Collaborator

Awesome!!

except NotImplementedError:
self.module = None
# Create an instance of the `RLModule`.
module_spec: RLModuleSpec = self.config.get_rl_module_spec(
Collaborator

So much better!

return {
    "__env__": (self.env.observation_space, self.env.action_space),
    DEFAULT_MODULE_ID: (
        self._env_to_module.observation_space,
Collaborator

Tricky :)

module_class=InverseDynamicsModel,
# Only create the ICM on the Learner workers, NOT on the
# EnvRunners.
learner_only=True,
Collaborator

Ah, I get it: there are modules that have nothing that would be inference-only, so they should not even be built for inference. Still, these double flags are maybe not the best way to put it. It is a bit confusing, at least at the beginning.

Contributor Author

I get your confusion. Yes, these two flags are not complementary (inference_only=False does not mean learner_only=True or vice-versa).

I can try to describe this better in the docstrings. Maybe along the lines of:

  • inference_only: Users have a chance to streamline their custom RLModules by providing an inference_only mode, which RLlib then uses on the EnvRunners. inference_only is NOT a config setting that users should need to set (or unset). Only RLlib should decide when an RLModule is built in its inference_only=True|False state. The only exception is maybe when a user wants to do inference in production and loads the module from a checkpoint (with inference_only=True to save memory).
  • learner_only: A setting that a user can set to indicate to RLlib that the RLModule will only be used on the Learners and should never be built on any EnvRunner (see the sketch below).
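A short sketch of the two flags in practice (RLModuleSpec and InverseDynamicsModel are taken from the diff above; MyRLModule and the exact import path are hypothetical):

from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# learner_only=True: user-facing; the ICM is built on Learner workers only
# and never on any EnvRunner.
icm_spec = RLModuleSpec(
    module_class=InverseDynamicsModel,
    learner_only=True,
)

# inference_only=True: normally decided by RLlib itself (e.g. for EnvRunner
# builds); a user would only set it when restoring a module from a checkpoint
# for pure production inference, to save memory.
serving_spec = RLModuleSpec(
    module_class=MyRLModule,  # hypothetical user RLModule class
    inference_only=True,
)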

@sven1977 sven1977 enabled auto-merge (squash) August 1, 2024 14:38
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Aug 1, 2024
@github-actions github-actions bot disabled auto-merge August 1, 2024 16:51
@sven1977 sven1977 enabled auto-merge (squash) August 1, 2024 21:15
@github-actions github-actions bot disabled auto-merge August 2, 2024 08:01
@sven1977 sven1977 merged commit 3762dbb into ray-project:master Aug 2, 2024
4 of 5 checks passed
@sven1977 sven1977 deleted the add_learner_only_flag_to_rl_module_config branch August 2, 2024 09:36