[RLlib] New ConnectorV2 API #03: Introduce actual ConnectorV2 API. (#41074) #41212
Conversation
@@ -0,0 +1,116 @@
from functools import partial
Built-in frame-stacking connector to use for Atari. Optionally, move this into the examples folder, as RLlib does not use it automatically (the user has to explicitly configure this connector via config.env_to_module_connector = lambda env: FrameStackingEnvToModule(env=env, num_frames=4)).
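For illustration, a minimal sketch of the explicit opt-in described above. Only the config.env_to_module_connector line comes from the comment itself; the PPOConfig usage, the env ID, and the import path for FrameStackingEnvToModule are assumptions:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Assumed import path for the connector piece added in this PR.
from ray.rllib.connectors.env_to_module.frame_stacking import FrameStackingEnvToModule

config = PPOConfig().environment("ALE/Pong-v5")

# Explicit opt-in: the callable receives the env and returns the connector piece
# that stacks the last 4 observation frames into a single observation.
config.env_to_module_connector = lambda env: FrameStackingEnvToModule(
    env=env, num_frames=4
)
```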
@@ -0,0 +1,134 @@
from functools import partial
Built-in prev-action/prev-reward connector. Optionally, move this into the examples folder, as RLlib does not use it automatically (the user has to explicitly configure this connector via config.env_to_module_connector = lambda env: PrevRewardPrevActionEnvToModule(env=env, n_prev_actions=1, n_prev_rewards=10)).
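Because the config entry is a callable taking the env rather than a ready-made connector instance, the connector's parameters can be chosen per env. A small sketch of that pattern; the import path is assumed and the heuristic inside the helper is purely illustrative:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Assumed import path for the connector piece added in this PR.
from ray.rllib.connectors.env_to_module.prev_actions_prev_rewards import (
    PrevRewardPrevActionEnvToModule,
)

def make_env_to_module(env):
    # Purely illustrative heuristic: keep a longer reward history for Atari envs.
    env_id = str(getattr(getattr(env, "spec", None), "id", ""))
    n_prev_rewards = 10 if "ALE/" in env_id else 1
    return PrevRewardPrevActionEnvToModule(
        env=env, n_prev_actions=1, n_prev_rewards=n_prev_rewards
    )

config = PPOConfig().environment("ALE/Pong-v5")
config.env_to_module_connector = make_env_to_module
```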
@@ -0,0 +1,75 @@
from enum import Enum
Note: We'll have to see, once we write the default multi-agent env-to-module and module-to-env connector logic, whether we even need these input/output types.
I'm not so sure anymore. Maybe a simple input space -> output space mapping (as it already exists) will be sufficient, with input_space being a Dict mapping agent IDs to individual agent spaces and output_space being another Dict mapping module IDs to individual spaces.
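A minimal sketch of the space layout this suggestion describes, using gymnasium spaces; the agent/module IDs and shapes are made up for illustration:

```python
import gymnasium as gym
import numpy as np

# env-to-module view: the input space maps agent IDs to per-agent observation spaces ...
input_space = gym.spaces.Dict({
    "agent_0": gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32),
    "agent_1": gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32),
})

# ... and the output space maps module IDs to the (possibly transformed) spaces
# that the respective RLModules expect as input.
output_space = gym.spaces.Dict({
    "shared_policy": gym.spaces.Box(-1.0, 1.0, shape=(8,), dtype=np.float32),
})
```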
I am sort of stamping here, @sven1977; unfortunately I don't have the time to review this big PR. I just noticed some variable-name and documentation inconsistencies early on in the review; please take another pass on them. The base class of connectors looks good and is in line with what we discussed. Hopefully the next example PRs will be smaller, so I can see more concretely how these tie up to each other. Thanks.
Args:
    num_frames: The number of observation frames to stack up (into a single
        observation) for the RLModule's forward pass.
    as_preprocessor: Whether this connector should simply postprocess the
This is not defined in the signature?
rllib/connectors/connector_v2.py
Outdated
    self,
    *,
    rl_module: RLModule,
    input_: Any,
Call it `data` instead of `input_`?
Ok, will change.
done
rllib/connectors/connector_v2.py
Outdated
    explore: Whether `explore` is currently on. Per convention, if True, the
        RLModule's `forward_exploration` method should be called, if False, the
        EnvRunner should call `forward_inference` instead.
    persistent_data: Optional additional context data that needs to be exchanged
Call it something else? Maybe `shared_data`?
Ok, will change.
done
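Putting the two renames from this thread together (`input_` -> `data`, `persistent_data` -> `shared_data`), the `__call__` signature under discussion would look roughly as follows; the exact argument set, types, and defaults are assumptions pieced together from the diff fragments above:

```python
from typing import Any, Optional

class ConnectorV2:
    """Rough sketch only; the real base class lives in rllib/connectors/connector_v2.py."""

    def __call__(
        self,
        *,
        rl_module: Any,  # the RLModule this connector pipeline serves
        data: Any,  # renamed from `input_` per the discussion above
        explore: bool = True,  # True -> forward_exploration(), False -> forward_inference()
        shared_data: Optional[dict] = None,  # renamed from `persistent_data`
        **kwargs: Any,
    ) -> Any:
        raise NotImplementedError
```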
READY FOR INITIAL REVIEW AND FEEDBACK. TEST CASES PENDING (TODO).

Introduce new ConnectorV2 API:

The new ConnectorV2 API will replace the existing Connector API and introduce the following enhancements and changes:

- ConnectorV2 pipelines are used on EnvRunners to transform environment outputs (i.e. SingleAgentEpisodes) into RLModule forward_exploration|inference() batches to compute the next action.
- They are also used on EnvRunners to transform RLModule forward_exploration|inference() outputs into action(s) that can be sent to a gym.Env.
- On the Learners, they transform episode data into the train batch for the Learner.update() call.
- For stateful RLModules, the default connector pieces pass STATE_OUT as the next STATE_IN within the resulting RLModule input batch. Alternatively, at the beginning of an episode, STATE_IN will be the RLModule's initial state (stacked/repeated by the batch size). Also, for the Learner pipeline, the RLlib default connector piece will make sure all data has the proper time-rank added.
- Connector pieces have access to a) the RLModule, b) the gym.vector.Env, c) the current explore (True|False) setting, and d) any data possibly passed by previous connectors (even from another pipeline). For example, an env->module connector might want to pick the particular single-agent RLModule to be used for a given agent ID and then let the following module->env connector pipeline know how it picked, such that it can properly convert back from module output to actions. This way, we can eventually replace the policy_mapping_fn functionality currently hardcoded into RolloutWorker by a ConnectorV2.
- Custom connector pipelines are configured via callables on the AlgorithmConfig. These callables take a gym.Env or a set of obs-/action-spaces and return the respective connector pipeline (used on the EnvRunners and the Learners, respectively).

To list some of the advantages that this new design will offer our users:

- It replaces Policy.postprocess_trajectory, the trajectory view API, and tons of hard-coded logic on action clipping, reward clipping, Atari frame stacking, RNN time-rank handling and zero padding, etc. A dummy example showing how frame-stacking can be achieved with connectors is part of this PR.
- It decouples the RLModule's forward_exploration|inference() implementation from what the algo's training step might or might not need. For example, in PPO, we can now offload the vf-predictions entirely onto the learner side and perform vf computations (including bootstrap value computations at the truncation edges of episodes) in a batched and distributed (multi-GPU) fashion.
- The EnvRunner main loop will simplify to a short, connector-driven loop (see the sketch below).
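The code snippet the last bullet refers to is not reproduced here; the following is only a rough sketch of what such a connector-driven EnvRunner main loop could look like. The names env_to_module, module_to_env, and the episode-bookkeeping calls are hypothetical and merely follow the data flow described in this PR:

```python
def env_runner_loop(env, rl_module, env_to_module, module_to_env, episodes):
    """Hypothetical sketch only: all signatures here are assumptions."""
    while True:
        # env -> module: turn the ongoing episodes into an RLModule input batch.
        batch = env_to_module(episodes=episodes, rl_module=rl_module, explore=True)
        # RLModule forward pass (per convention, explore=True -> forward_exploration).
        fwd_out = rl_module.forward_exploration(batch)
        # module -> env: turn module outputs into actions the (vector) gym.Env accepts.
        actions = module_to_env(fwd_out, episodes=episodes, rl_module=rl_module)
        obs, rewards, terminateds, truncateds, infos = env.step(actions)
        # Episode bookkeeping (method name assumed): append the new timestep.
        for eps, o, r in zip(episodes, obs, rewards):
            eps.add_env_step(observation=o, reward=r)
        if all(terminateds) or all(truncateds):
            break
```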
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.