[RLlib] Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config. #46900
Conversation
LGTM. Big changes. I am not yet fully convinced of the learner_only flag existing in parallel with the inference_only one; I am unsure whether this introduces some confusion for users.
@@ -784,42 +784,24 @@ def setup(self, config: AlgorithmConfig) -> None:
# TODO (Rohan138): Refactor this and remove deprecated methods
# Need to add back method_type in case Algorithm is restored from checkpoint
method_config["type"] = method_type
self.learner_group = None
Hopefully, this does not error out in the Offline API. Have you tested tuned_examples/bc/cartpole_bc.py with it?
We already define this property above in the ctor, so this was actually duplicate code. Should have no effect on anything, I think.
module_spec: MultiRLModuleSpec = self.config.get_multi_rl_module_spec(
    policy_dict=policy_dict
    spaces=self.env_runner_group.get_spaces(),
Ah, very nice!
self.algorithm_config_overrides_per_module = {}
# Cached, actual AlgorithmConfig objects derived from
# `self.algorithm_config_overrides_per_module`.
self._per_module_overrides: Dict[ModuleID, "AlgorithmConfig"] = {}
How does this look with hundreds of agents?
Not sure :) It's a) cached and b) only those configs that actually differ from the "main" config are cloned (and altered). Yes, we'll absolutely have to future-proof the new API stack to support 100s/1000s of agents/modules.
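For illustration, a hedged sketch of the a)-cached / b)-clone-only-when-different lookup described above; this is not RLlib's actual implementation, and the function name and cache handling are assumptions:

```python
from typing import Dict, Optional


def config_for_module(
    module_id: str,
    main_config: "AlgorithmConfig",
    overrides_per_module: Dict[str, dict],
    cache: Dict[str, "AlgorithmConfig"],
) -> "AlgorithmConfig":
    """Return the per-module AlgorithmConfig for `module_id` (hypothetical helper)."""
    # a) Cached: serve an already-built per-module config.
    if module_id in cache:
        return cache[module_id]
    overrides: Optional[dict] = overrides_per_module.get(module_id)
    # b) No differences w.r.t. the "main" config -> no clone needed.
    if not overrides:
        return main_config
    # Clone and alter only configs that actually differ, then cache the result.
    cache[module_id] = main_config.copy(copy_frozen=False).update_from_dict(overrides)
    return cache[module_id]
```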
@@ -3081,6 +3068,18 @@ def rl_module(
observation_space, action_space, catalog_class, or the model config is
not specified it will be inferred from the env and other parts of the
algorithm config object.
algorithm_config_overrides_per_module: Only used if
Shouldn't this rather be placed into multi_agent? I am a bit unsure, because these overrides would probably only take effect in a multi-agent scenario; a single-agent case would simply define the config for its single module.
Good point, but in this case it actually sets something (the lr of the curiosity ICM) that has nothing to do with multi-agent, which is why I moved it to rl_module. Maybe a good rule for separating these config methods from here on is:
- If the setting is only used in multi-agent, keep it in the .multi_agent() method, e.g. policy_mapping_fn.
- If the setting is not limited to multi-agent cases, move it to the .rl_module() method, e.g. rl_module_spec (which should replace policies on the new API stack as of this PR) or algorithm_config_overrides_per_module (see the sketch below).
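For illustration only, a hedged sketch of that separation (the env ID, module IDs, and the lr value are made up; it assumes the current PPOConfig and AlgorithmConfig.overrides APIs):

```python
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # hypothetical registered env ID
    # Multi-agent-only settings stay in `.multi_agent()`.
    .multi_agent(
        policies={"main", "other"},
        policy_mapping_fn=(
            lambda agent_id, episode, **kwargs: "main" if agent_id == 0 else "other"
        ),
    )
    # Settings not limited to multi-agent move to `.rl_module()`, e.g.
    # per-module algorithm-config overrides (here: a custom learning rate).
    .rl_module(
        algorithm_config_overrides_per_module={
            "other": AlgorithmConfig.overrides(lr=0.00005),
        },
    )
)
```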
if (
    self.rollout_fragment_length != "auto"
    and not self.in_evaluation
    and self.total_train_batch_size > 0
I often saw that, if train_batch_size was defined, train_batch_size_per_learner was None. The same holds true if train_batch_size is not user-defined at all. Can we fill this gap?
We should always go with train_batch_size_per_learner on the new stack.
The default behavior is (sketched below):
- If train_batch_size_per_learner is None, derive it from train_batch_size via: train_batch_size_per_learner = train_batch_size // num_learners.
- If train_batch_size_per_learner is defined, use that on the new API stack (and ignore it on the old API stack).
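A minimal sketch (not the actual RLlib code) of that default behavior; the function name and the handling of num_learners=0 (local-only Learner) are assumptions:

```python
from typing import Optional


def resolve_train_batch_size_per_learner(
    train_batch_size_per_learner: Optional[int],
    train_batch_size: int,
    num_learners: int,
) -> int:
    # If the per-learner size is user-defined, it wins on the new API stack
    # (and is ignored on the old one).
    if train_batch_size_per_learner is not None:
        return train_batch_size_per_learner
    # Otherwise derive it from the global `train_batch_size`; treat a
    # local-only setup (num_learners=0) as a single Learner.
    return train_batch_size // max(num_learners, 1)
```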
We should maybe completely hard-deprecate (raise an error for) train_batch_size on the new API stack.
""" | ||
# Get ID of the first remote worker. | ||
worker_id = self._worker_manager.actor_ids()[0] | ||
remote_worker_ids = ( |
Very nice!
rllib/env/multi_agent_env_runner.py
return None
# Create an instance of the `MultiRLModule`.
module_spec: MultiRLModuleSpec = self.config.get_multi_rl_module_spec(
    env=self.env, spaces=self.get_spaces(), inference_only=True
Awesome!!
rllib/env/single_agent_env_runner.py
except NotImplementedError:
    self.module = None
# Create an instance of the `RLModule`.
module_spec: RLModuleSpec = self.config.get_rl_module_spec(
So much better!
return {
    "__env__": (self.env.observation_space, self.env.action_space),
    DEFAULT_MODULE_ID: (
        self._env_to_module.observation_space,
Tricky :)
module_class=InverseDynamicsModel,
# Only create the ICM on the Learner workers, NOT on the
# EnvRunners.
learner_only=True,
Ah, I get it: there are modules that have no inference-only role at all, so they should not even be built for inference. Still, these two flags are maybe not the best way to express this; it is a bit confusing, at least in the beginning.
I get your confusion. Yes, these two flags are not complementary (inference_only=False does not imply learner_only=True, or vice versa).
I can try to describe this better in the docstrings, maybe along the lines of:
- inference_only: Users have the chance to streamline their custom RLModules by providing an inference_only mode that is then used by RLlib on the EnvRunners. inference_only is NOT a config setting that users should need to set (or unset); only RLlib should decide when an RLModule is built in its inference_only=True|False state. The only exception is maybe when a user wants to do inference in production and loads the module from a checkpoint (with inference_only=True to save memory).
- learner_only: This is a setting that a user can set to indicate to RLlib that the RLModule will only be used on the Learners and should never be built on any EnvRunners.
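For example, a hedged sketch of how the two flags would be used (the import path and the stand-in ICM class are assumptions; the flag semantics follow the description above):

```python
from ray.rllib.core.rl_module.rl_module import RLModule, RLModuleSpec


class InverseDynamicsModel(RLModule):
    """Stand-in for the curiosity ICM from this PR's example script."""


# `learner_only=True` is user-facing: build this module only on the Learner
# workers, never on any EnvRunner.
icm_spec = RLModuleSpec(
    module_class=InverseDynamicsModel,
    learner_only=True,
)

# `inference_only`, in contrast, is normally toggled by RLlib itself when it
# builds modules on EnvRunners; a user would only pass it explicitly when,
# e.g., restoring a module from a checkpoint purely for production inference.
```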
Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config.

The simplest (and only) path to get a (Multi)RLModuleSpec from an algo config should be:
- config.get_rl_module_spec() -> single-module case (or on an EnvRunner holding only one policy, with modules not needed on EnvRunners, e.g. a learner_only module, filtered out).
- config.get_multi_rl_module_spec() -> multi-module cases (e.g. multi-agent, or Learners with one or more learner_only modules, e.g. ICM-based curiosity).
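For illustration, a rough usage sketch of these two paths (argument names are modeled on the diffs quoted in this PR; config, env, and env_runner_group are placeholders passed in by the caller):

```python
def build_module_specs(config, env, env_runner_group):
    # Single-module path (e.g. on a single-agent EnvRunner): build the spec
    # for the module in its inference_only state.
    rl_module_spec = config.get_rl_module_spec(env=env, inference_only=True)

    # Multi-module path (e.g. multi-agent, or a Learner that also holds
    # learner_only modules such as an ICM). Spaces can come from the
    # EnvRunnerGroup's new get_spaces() API when no env is available locally.
    multi_rl_module_spec = config.get_multi_rl_module_spec(
        spaces=env_runner_group.get_spaces(),
    )
    return rl_module_spec, multi_rl_module_spec
```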
Both EnvRunnerGroup and EnvRunner now have a get_spaces() API that allows components that do NOT have access to an env to obtain a Module's obs- and action-spaces in a more unified and structured way.

Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I have added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.