[RLlib; Offline RL] RLUnplugged example on new API stack. #46792
Conversation
…class into separate files. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…gorithmConfig' and 'OfflineData' to make the data pipeline more configurable and tunable. Tested single- and multi-learner setups with BC. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io> Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…n class. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…of Atari Pong. Needs to be tuned. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…defined. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
        **kwargs,
    )

    self._multi_agent = multi_agent
Do we need these options?
No, we don't. It wasn't clear to me yet that this doesn't need to be set in single-agent mode.
# Make the learner connector.
def _make_learner_connector(observation_space, action_space):
    return DecodeObservations()
Let's get rid of this by doing it in the config below:
config.training(
    ...
    learner_connector=lambda obs_space, act_space: DecodeObservations(),
    ...
)
I like this more! Thanks!
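For context, a minimal sketch of what the inline version could look like (assuming the `BCConfig`-based setup and the `DecodeObservations` connector defined in this example file; the exact config chain in the script may differ):

from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("ALE/Pong-v5")
    .training(
        # Build the learner connector inline in the config instead of
        # through a separate `_make_learner_connector` helper.
        learner_connector=lambda obs_space, act_space: DecodeObservations(),
    )
)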
# TODO (simon): Change to use the `run_rllib_example` function as soon as tuned.
nit: TODO? Or still WIP?
WIP :) It's still not running smoothly, and for debugging I'm not using Tune.
# Make the temporary directory for the downloaded data.
tmp_path = "/tmp/atari"
Path(tmp_path).joinpath(game).mkdir(exist_ok=True, parents=True)
super nit: should we use the "/" op of pathlib.Path here?
tmp_path = Path("/tmp/atari") / game
tmp_path.mkdir(...)
destination_file_name = tmp_path / "run_...."
Yeah, let's do it. This is easier to read.
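A small runnable sketch of the pathlib version (the shard file name below is hypothetical; the real name is elided in the suggestion above):

from pathlib import Path

game = "Pong"  # hypothetical; the example defines this elsewhere

# Compose paths with pathlib's "/" operator instead of `joinpath`.
tmp_path = Path("/tmp/atari") / game
tmp_path.mkdir(exist_ok=True, parents=True)

# Destination file paths compose the same way.
destination_file_name = tmp_path / "some_shard_file"  # hypothetical name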
# can be chosen by users. To use all data use a list of file paths (see
# `num_shards`) and its usage further below.
run_number = 1
# num_shards = 1
Should this be commented out? `# num_shards = 1`
This is actually a number that is now hard-coded into the path. We have many runs, and for each run there are multiple shards in the bucket - each shard is a file. For this example I use only a single one of these files.
def _env_creator(cfg):
    return wrap_atari_for_new_api_stack(
        gym.make("ALE/Pong-v5", **cfg),
        # Perform frame-stacking through ConnectorV2 API.
Wait, if we say framestack=4 here, then we do NOT perform frame-stacking through ConnectorV2.
-> Change the comment by removing the ConnectorV2 statement.
Note: At this point, the performance gain from using the connector is minimal, so it can probably be neglected for simplicity (it's easier for users to do frame-stacking in the env wrapper).
Thanks for the hint. Will remove this comment.
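A sketch of the corrected env creator, with the misleading comment replaced (this assumes `wrap_atari_for_new_api_stack` from `ray.rllib.env.wrappers.atari_wrappers` takes a `framestack` argument, as the review indicates):

import gymnasium as gym

from ray.rllib.env.wrappers.atari_wrappers import wrap_atari_for_new_api_stack

def _env_creator(cfg):
    return wrap_atari_for_new_api_stack(
        gym.make("ALE/Pong-v5", **cfg),
        # Frame-stacking happens here in the env wrapper, NOT through
        # the ConnectorV2 API.
        framestack=4,
    )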
class DecodeObservations(ConnectorV2):
    def __init__(
Add a (small) docstring here that explains what this connector does.
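One possible docstring along these lines (its description of the connector's behavior is an assumption based on the PR description):

class DecodeObservations(ConnectorV2):
    """Decodes PNG-encoded observations into image tensors.

    RL Unplugged stores Atari frames as PNG bytes inside TFRecord files.
    This learner connector decodes those bytes back into uint8 arrays
    (and stacks frames) before batches reach the learner.
    """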
@@ -0,0 +1,257 @@
"""
schema={
Is this needed here? If this is an explanation of the schema, maybe move this down into the config section (where we explain the schema translation)?
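For illustration, the schema translation in the config section could look roughly like this (the RL Unplugged field names on the right are assumptions; the script's actual mapping may differ):

from ray.rllib.core.columns import Columns

config = config.offline_data(
    # Map RLlib's canonical column names to the column names used in
    # the RL Unplugged TFRecord data.
    input_read_schema={
        Columns.OBS: "o_t",
        Columns.ACTIONS: "a_t",
        Columns.REWARDS: "r_t",
        Columns.NEXT_OBS: "o_tp1",
    },
)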
@@ -2499,6 +2499,10 @@ def offline_data(
        self.input_read_method_kwargs = input_read_method_kwargs
    if input_read_schema is not NotProvided:
        self.input_read_schema = input_read_schema
    if map_batches_kwargs is not NotProvided:
Ah, yes, this was missing :)
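With that branch in place, users can tune ray.data's `map_batches` step through the config, roughly like this (the kwargs shown are illustrative, not recommendations):

config = config.offline_data(
    # Forwarded to ray.data.Dataset.map_batches() in the offline data
    # pipeline.
    map_batches_kwargs={
        "concurrency": 2,         # parallel map workers
        "zero_copy_batch": True,  # avoid copies where possible
    },
)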
Looks great! Just some nits, mostly enhancing/adding comments and explanations.
@@ -0,0 +1,257 @@
"""
Should we call this file simply `pong_bc.py`?
I think we should add a hint that it is Pong data from rl_unplugged. What do you think?
Looks good to me now! Thanks @simonsays1980
Why are these changes needed?

The new Offline RL API uses native `ray.data.Dataset`s to scale to massive datasets with different formats and encodings. To ensure that learning indeed scales to massive data and that the API can be used with the most notable Offline RL datasets, this PR applies the new Offline RL API to the RLUnplugged data. The tuned example uses data from Atari Pong with a single TFRecords file from the GCS bucket. The data is downloaded automatically into a temporary folder. To decode the PNG images, this example defines a `ConnectorV2` that decodes the images in the learner connector pipeline and stacks frames. This example shows specifically how to scale out on massive data and how to use the arguments of `AlgorithmConfig.offline_data` to fine-tune scaling with `ray.data`. For more info, see `ray.data`.
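As a rough illustration of the kind of setup this enables (a sketch with assumed paths and kwargs, not the exact tuned example from this PR):

from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("ALE/Pong-v5")
    .offline_data(
        # A single downloaded TFRecords shard (hypothetical local path).
        input_=["/tmp/atari/Pong/some_shard_file"],
        # Let ray.data read the TFRecord format natively.
        input_read_method="read_tfrecords",
        # Fine-tune the ray.data pipeline that feeds the learners.
        map_batches_kwargs={"concurrency": 2},
    )
    .training(
        # Decode PNG observations (and stack frames) on the learner.
        learner_connector=lambda obs_space, act_space: DecodeObservations(),
    )
)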
Related issue number
Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.