Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib; Offline RL] - Enable reading old-stack SampleBatch data in new stack Offline RL. #47359

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Aug 27, 2024

Why are these changes needed?

Right now the new Offline RL stack does not allow using old stack record data. Many users have costly recorded data from the old stack (i.e. in SampleBatch format). This PR proposes an option to read old stack SampleBatch recordings via the OfflinePreLearner. It does come in its actual form to some limitations, which might be removed in future PRs:

  • If there are multiple episodes inside of single SampleBatches recorded, the data wil be packed into a single SingleAgentEpisode.
  • If tain_batch_size_per_learner is defined, this argument defines the number of SampleBatches pulled from the offline data per training iteration and NOT the agent/env steps recorded. For example a train_batch_size_per_learner=2000 and recorded SampleBatches with 200 agent steps inside of each batch would result in an actual training batch of 2000 * 200 agent/env steps.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…n offline data recorded with the old stack.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…' to new stack 'EpisodeType' (only 'SingleAgentEpisode' for now). This enables users to use their old recorded agent data for Offline RL.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@simonsays1980 simonsays1980 marked this pull request as ready for review August 27, 2024 16:45
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…an error in building the docs.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…'SingleAgentEpisode' to BUILD file.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@@ -344,3 +358,111 @@ def convert(sample, space):
episodes.append(episode)
# Note, `map_batches` expects a `Dict` as return value.
return {"episodes": episodes}

def _map_batch_to_episode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to: _map_sample_batch_to_episode

Also: dumb question, but is it necessary to return this dict here with the "episodes" key? We resolve this key anyways right away after calling this method, correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this makes it clear that this is not a batch from ray.data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regard to the return value: the idea was to keep this flexible, meaning that users can use the OfflinePreLearner in Pipelines where they need simply episodes. But you are right, it is simpler to override the OfflinePreLearner and then use this static method in its __call__ method.

Another use for it is to pass this static method into a map_batches when converting column data or SampleBatch data to episodes. I will try out, if this works this simple.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, so this works:

"""Put this script into your `ray` root folder."""
import functools
from pathlib import Path

import ray
from ray.rllib.offline.offline_prelearner import OfflinePreLearner, SCHEMA

base_path = Path(__file__).parent / "rllib"
sample_batch_data_path = base_path / "tests/data/cartpole/large.json"

ds = ray.data.read_json(sample_batch_data_path.as_posix())

ds = ds.map_batches(
    functools.partial(OfflinePreLearner._map_batch_to_episode, False) 
)

batch = ds.take_batch(10)

print(batch)

@@ -288,7 +298,7 @@ def convert(sample, space):
unpack_if_needed(batch[schema[Columns.NEXT_OBS]][i]),
observation_space,
)
if Columns.NEXT_OBS in input_compress_columns
if Columns.OBS in input_compress_columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be 100% safe, should we even allow NEXT_OBS to be present in input_compress_columns, then? If we ignore it anyways.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. Right now we "assume" only that a user will never put Columns.NEXT_OBS into input_compress_columns AND and the same time NOT Columns.OBS, but this could of course be different.

My idea is also to ONLY ALLOW Columns constants in input_compress_columns b/c if different column names are used, the user has to nevertheless use a custom input_read_schema that maps Columns constants to her custom column names.

Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for this very important PR @simonsays1980 , making it much simpler for our old stack users to migrate.

Just a few nits and questions before we can merge.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) August 29, 2024 11:19
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Aug 29, 2024
@sven1977 sven1977 merged commit ccd3ba5 into ray-project:master Aug 29, 2024
7 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 12, 2024
…new stack Offline RL. (ray-project#47359)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…new stack Offline RL. (ray-project#47359)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…new stack Offline RL. (ray-project#47359)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…new stack Offline RL. (ray-project#47359)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…new stack Offline RL. (ray-project#47359)

Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
@simonsays1980 simonsays1980 deleted the offline-rl-add-option-to-read-sample-batch-data branch November 22, 2024 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants