[RLlib; Offline RL] - Enable reading old-stack SampleBatch data in new stack Offline RL. #47359
Conversation
…n offline data recorded with the old stack. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…' to new stack 'EpisodeType' (only 'SingleAgentEpisode' for now). This enables users to use their old recorded agent data for Offline RL. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…an error in building the docs. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…'SingleAgentEpisode' to BUILD file. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
rllib/offline/offline_prelearner.py
Outdated
@@ -344,3 +358,111 @@ def convert(sample, space):
            episodes.append(episode)
        # Note, `map_batches` expects a `Dict` as return value.
        return {"episodes": episodes}

    def _map_batch_to_episode(
rename to: _map_sample_batch_to_episode
Also: dumb question, but is it necessary to return this dict here with the "episodes" key? We resolve this key right away after calling this method anyway, correct?
I agree, this makes it clear that this is not a batch from ray.data.
In regard to the return value: the idea was to keep this flexible, meaning that users can use the OfflinePreLearner in pipelines where they simply need episodes. But you are right, it is simpler to override the OfflinePreLearner and then use this static method in its __call__ method.
Another use for it is to pass this static method into a map_batches call when converting column data or SampleBatch data to episodes. I will try out whether it works that simply.
Yup, so this works:

"""Put this script into your `ray` root folder."""
import functools
from pathlib import Path

import ray
from ray.rllib.offline.offline_prelearner import OfflinePreLearner, SCHEMA

# Old-stack SampleBatch recording that ships with RLlib's test data.
base_path = Path(__file__).parent / "rllib"
sample_batch_data_path = base_path / "tests/data/cartpole/large.json"

# Read the JSON-recorded SampleBatches into a Ray Dataset.
ds = ray.data.read_json(sample_batch_data_path.as_posix())

# Map each batch of SampleBatch rows to episodes via the static method
# (the first positional argument is the multi-agent flag).
ds = ds.map_batches(
    functools.partial(OfflinePreLearner._map_batch_to_episode, False)
)

# Pull a small batch to inspect the converted episodes.
batch = ds.take_batch(10)
print(batch)
@@ -288,7 +298,7 @@ def convert(sample, space):
                    unpack_if_needed(batch[schema[Columns.NEXT_OBS]][i]),
                    observation_space,
                )
-               if Columns.NEXT_OBS in input_compress_columns
+               if Columns.OBS in input_compress_columns
Just to be 100% safe, should we even allow NEXT_OBS to be present in input_compress_columns, then? If we ignore it anyway.
Great question. Right now we "assume" only that a user will never put Columns.NEXT_OBS into input_compress_columns AND at the same time NOT Columns.OBS, but this could of course be different.
My idea is also to ONLY ALLOW Columns constants in input_compress_columns b/c if different column names are used, the user nevertheless has to use a custom input_read_schema that maps Columns constants to her custom column names.
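For illustration, a minimal config sketch of what this would look like from a user's perspective (assuming the new-stack AlgorithmConfig.offline_data() parameters input_compress_columns and input_read_schema; the algorithm, data path, and custom column names below are placeholders, not taken from this PR):

from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.core.columns import Columns

config = (
    BCConfig()
    .environment("CartPole-v1")
    .offline_data(
        # Placeholder path to old-stack recordings.
        input_="/tmp/cartpole/recorded_sample_batches",
        # Only `Columns` constants here; compressing OBS implies NEXT_OBS is
        # handled via OBS, as discussed above.
        input_compress_columns=[Columns.OBS],
        # Map `Columns` constants to the custom column names used at
        # recording time (names are illustrative).
        input_read_schema={
            Columns.OBS: "my_obs",
            Columns.ACTIONS: "my_actions",
        },
    )
)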
Looks great. Thanks for this very important PR @simonsays1980, making it much simpler for our old-stack users to migrate.
Just a few nits and questions before we can merge.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…new stack Offline RL. (ray-project#47359) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?

Right now the new Offline RL stack does not allow using old-stack recorded data. Many users have costly recorded data from the old stack (i.e. in SampleBatch format). This PR proposes an option to read old-stack SampleBatch recordings via the OfflinePreLearner. In its current form it comes with some limitations, which might be removed in future PRs:
- For each SampleBatch recorded, the data will be packed into a single SingleAgentEpisode.
- When train_batch_size_per_learner is defined, this argument defines the number of SampleBatches pulled from the offline data per training iteration and NOT the agent/env steps recorded. For example, a train_batch_size_per_learner=2000 and recorded SampleBatches with 200 agent steps inside each batch would result in an actual training batch of 2000 * 200 agent/env steps (see the sketch below).
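A rough end-to-end sketch of that second limitation (assuming a boolean input_read_sample_batches switch on offline_data() to signal old-stack input; the flag name, algorithm, and path are assumptions made for illustration only):

from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")
    .offline_data(
        # Old-stack recordings stored as SampleBatch JSON (illustrative path).
        input_="rllib/tests/data/cartpole/large.json",
        # Assumed switch telling the OfflinePreLearner to expect SampleBatches.
        input_read_sample_batches=True,
    )
    .training(
        # NOTE: with SampleBatch input this counts recorded batches, not steps.
        # If each recorded SampleBatch holds 200 agent steps, the effective
        # train batch is 2000 * 200 = 400,000 agent/env steps.
        train_batch_size_per_learner=2000,
    )
)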
Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API documentation. For example, if I'm adding a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.