[RLlib] New ConnectorV2 API #03: Introduce actual ConnectorV2 API. (#41074) #41212
Conversation
@@ -0,0 +1,116 @@
from functools import partial
Built-in frame-stacking connector to use for Atari. Optionally, move this into the examples folder, as RLlib does not use it automatically (the user has to explicitly configure this connector via config.env_to_module_connector = lambda env: FrameStackingEnvToModule(env=env, num_frames=4)).
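For illustration, a minimal sketch of the explicit opt-in described above. Only the config.env_to_module_connector line comes from the comment itself; the PPOConfig usage, the env ID, and the import path for FrameStackingEnvToModule are assumptions:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Assumed import path for the connector piece added in this PR.
from ray.rllib.connectors.env_to_module.frame_stacking import FrameStackingEnvToModule

config = PPOConfig().environment("ALE/Pong-v5")

# Explicit opt-in: the callable receives the env and returns the connector piece
# that stacks the last 4 observation frames into a single observation.
config.env_to_module_connector = lambda env: FrameStackingEnvToModule(
    env=env, num_frames=4
)
```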
@@ -0,0 +1,134 @@
from functools import partial
Built-in prev-action/prev-reward connector. Optionally, move this into the examples folder, as RLlib does not use it automatically (the user has to explicitly configure this connector via config.env_to_module_connector = lambda env: PrevRewardPrevActionEnvToModule(env=env, n_prev_actions=1, n_prev_rewards=10)).
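Because the config entry is a callable taking the env rather than a ready-made connector instance, the connector's parameters can be chosen per env. A small sketch of that pattern; the import path is assumed and the heuristic inside the helper is purely illustrative:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Assumed import path for the connector piece added in this PR.
from ray.rllib.connectors.env_to_module.prev_actions_prev_rewards import (
    PrevRewardPrevActionEnvToModule,
)

def make_env_to_module(env):
    # Purely illustrative heuristic: keep a longer reward history for Atari envs.
    env_id = str(getattr(getattr(env, "spec", None), "id", ""))
    n_prev_rewards = 10 if "ALE/" in env_id else 1
    return PrevRewardPrevActionEnvToModule(
        env=env, n_prev_actions=1, n_prev_rewards=n_prev_rewards
    )

config = PPOConfig().environment("ALE/Pong-v5")
config.env_to_module_connector = make_env_to_module
```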
@@ -0,0 +1,75 @@
from enum import Enum
Note: We'll have to see, once we write the default multi-agent env-to-module and module-to-env connector logic, whether we even need these input/output types.
I'm not so sure anymore. Maybe a simple input space -> output space mapping (as it already exists) will be sufficient, with input_space being a Dict mapping agent IDs to individual agent spaces and output_space being another Dict mapping module IDs to individual spaces.
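A minimal sketch of the space layout this suggestion describes, using gymnasium spaces; the agent/module IDs and shapes are made up for illustration:

```python
import gymnasium as gym
import numpy as np

# env-to-module view: the input space maps agent IDs to per-agent observation spaces ...
input_space = gym.spaces.Dict({
    "agent_0": gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32),
    "agent_1": gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32),
})

# ... and the output space maps module IDs to the (possibly transformed) spaces
# that the respective RLModules expect as input.
output_space = gym.spaces.Dict({
    "shared_policy": gym.spaces.Box(-1.0, 1.0, shape=(8,), dtype=np.float32),
})
```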
I am sort of stamping here, @sven1977; unfortunately I don't have the time to review this big PR. I just noticed some variable-name and documentation inconsistencies early on in the review; please take another pass on them. The base class of connectors looks good and is in line with what we discussed. Hopefully the next example PRs will be smaller, so I can see more concretely how these tie up to each other. Thanks.
Args:
    num_frames: The number of observation frames to stack up (into a single
        observation) for the RLModule's forward pass.
    as_preprocessor: Whether this connector should simply postprocess the
This is not defined in the signature?
rllib/connectors/connector_v2.py
Outdated
    self,
    *,
    rl_module: RLModule,
    input_: Any,
Call it `data` instead of `input_`?
Ok, will change.
done
rllib/connectors/connector_v2.py
Outdated
    explore: Whether `explore` is currently on. Per convention, if True, the
        RLModule's `forward_exploration` method should be called, if False, the
        EnvRunner should call `forward_inference` instead.
    persistent_data: Optional additional context data that needs to be exchanged
Call it something else? Maybe `shared_data`?
Ok, will change.
done
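Putting the two renames from this thread together (`input_` -> `data`, `persistent_data` -> `shared_data`), the `__call__` signature under discussion would look roughly as follows; the exact argument set, types, and defaults are assumptions pieced together from the diff fragments above:

```python
from typing import Any, Optional

class ConnectorV2:
    """Rough sketch only; the real base class lives in rllib/connectors/connector_v2.py."""

    def __call__(
        self,
        *,
        rl_module: Any,  # the RLModule this connector pipeline serves
        data: Any,  # renamed from `input_` per the discussion above
        explore: bool = True,  # True -> forward_exploration(), False -> forward_inference()
        shared_data: Optional[dict] = None,  # renamed from `persistent_data`
        **kwargs: Any,
    ) -> Any:
        raise NotImplementedError
```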
READY FOR INITIAL REVIEW AND FEEDBACK. TEST CASES PENDING (TODO).

Introduce new ConnectorV2 API:

The new ConnectorV2 API will replace the existing Connector API and introduce the following enhancements and changes:

- ConnectorV2 pipelines are used on EnvRunners to transform environment outputs (i.e. SingleAgentEpisodes) into RLModule forward_exploration|inference() batches to compute the next action.
- They are also used on EnvRunners to transform RLModule forward_exploration|inference() outputs into action(s) that can be sent to a gym.Env.
- On the Learners, they transform episode data into the train batch for the Learner.update() call.
- For stateful RLModules, the default connector pieces pass STATE_OUT as the next STATE_IN within the resulting RLModule input batch. Alternatively, at the beginning of an episode, STATE_IN will be the RLModule's initial state (stacked/repeated by the batch size). Also, for the Learner pipeline, the RLlib default connector piece will make sure all data has the proper time-rank added.
- Connector pieces have access to a) the RLModule, b) the gym.vector.Env, c) the current explore (True|False) setting, and d) any data possibly passed by previous connectors (even from another pipeline). For example, an env->module connector might want to pick the particular single-agent RLModule to be used for a given agent ID and then let the following module->env connector pipeline know how it picked, such that it can properly convert back from module output to actions. This way, we can eventually replace the policy_mapping_fn functionality currently hardcoded into RolloutWorker by a ConnectorV2.
- Custom connector pipelines are configured via callables on the AlgorithmConfig. These callables take a gym.Env or a set of obs-/action-spaces and return the respective connector pipeline (used on the EnvRunners and the Learners, respectively).

To list some of the advantages that this new design will offer our users:

- It replaces Policy.postprocess_trajectory, the trajectory view API, and tons of hard-coded logic on action clipping, reward clipping, Atari frame stacking, RNN time-rank handling and zero padding, etc. A dummy example showing how frame-stacking can be achieved with connectors is part of this PR.
- It decouples the RLModule's forward_exploration|inference() implementation from what the algo's training step might or might not need. For example, in PPO, we can now offload the vf-predictions entirely onto the learner side and perform vf computations (including bootstrap value computations at the truncation edges of episodes) in a batched and distributed (multi-GPU) fashion.
- The EnvRunner main loop will simplify to a short, connector-driven loop (see the sketch below).
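The code snippet the last bullet refers to is not reproduced here; the following is only a rough sketch of what such a connector-driven EnvRunner main loop could look like. The names env_to_module, module_to_env, and the episode-bookkeeping calls are hypothetical and merely follow the data flow described in this PR:

```python
def env_runner_loop(env, rl_module, env_to_module, module_to_env, episodes):
    """Hypothetical sketch only: all signatures here are assumptions."""
    while True:
        # env -> module: turn the ongoing episodes into an RLModule input batch.
        batch = env_to_module(episodes=episodes, rl_module=rl_module, explore=True)
        # RLModule forward pass (per convention, explore=True -> forward_exploration).
        fwd_out = rl_module.forward_exploration(batch)
        # module -> env: turn module outputs into actions the (vector) gym.Env accepts.
        actions = module_to_env(fwd_out, episodes=episodes, rl_module=rl_module)
        obs, rewards, terminateds, truncateds, infos = env.step(actions)
        # Episode bookkeeping (method name assumed): append the new timestep.
        for eps, o, r in zip(episodes, obs, rewards):
            eps.add_env_step(observation=o, reward=r)
        if all(terminateds) or all(truncateds):
            break
```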
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.