[RLlib; Offline RL] - Enable reading old-stack SampleBatch data in new stack Offline RL. #47359
Conversation
…n offline data recorded with the old stack. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…' to new stack 'EpisodeType' (only 'SingleAgentEpisode' for now). This enables users to use their old recorded agent data for Offline RL. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…an error in building the docs. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…'SingleAgentEpisode' to BUILD file. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
rllib/offline/offline_prelearner.py
Outdated
@@ -344,3 +358,111 @@ def convert(sample, space):
            episodes.append(episode)
        # Note, `map_batches` expects a `Dict` as return value.
        return {"episodes": episodes}

    def _map_batch_to_episode(
rename to: _map_sample_batch_to_episode
Also: dumb question, but is it necessary to return this dict here with the "episodes" key? We resolve this key right away after calling this method anyway, correct?
I agree, this makes it clear that this is not a batch from ray.data.
In regard to the return value: the idea was to keep this flexible, meaning that users can use the OfflinePreLearner in pipelines where they simply need episodes. But you are right, it is simpler to override the OfflinePreLearner and then use this static method in its __call__ method.
Another use for it is to pass this static method into a map_batches call when converting column data or SampleBatch data to episodes. I will try out whether it works that simply.
Yup, so this works:

"""Put this script into your `ray` root folder."""
import functools
from pathlib import Path

import ray
from ray.rllib.offline.offline_prelearner import OfflinePreLearner, SCHEMA

# Old-stack SampleBatch recording that ships with RLlib's test data.
base_path = Path(__file__).parent / "rllib"
sample_batch_data_path = base_path / "tests/data/cartpole/large.json"

# Read the JSON-recorded SampleBatches into a Ray Dataset.
ds = ray.data.read_json(sample_batch_data_path.as_posix())

# Map each batch of SampleBatch rows to episodes via the static method
# (the first positional argument is the multi-agent flag).
ds = ds.map_batches(
    functools.partial(OfflinePreLearner._map_batch_to_episode, False)
)

# Pull a small batch to inspect the converted episodes.
batch = ds.take_batch(10)
print(batch)
@@ -288,7 +298,7 @@ def convert(sample, space):
                    unpack_if_needed(batch[schema[Columns.NEXT_OBS]][i]),
                    observation_space,
                )
-               if Columns.NEXT_OBS in input_compress_columns
+               if Columns.OBS in input_compress_columns
Just to be 100% safe, should we even allow NEXT_OBS to be present in input_compress_columns, then? If we ignore it anyway.
Great question. Right now we "assume" only that a user will never put Columns.NEXT_OBS into input_compress_columns AND at the same time NOT Columns.OBS, but this could of course be different.
My idea is also to ONLY ALLOW Columns constants in input_compress_columns b/c if different column names are used, the user nevertheless has to use a custom input_read_schema that maps Columns constants to her custom column names.
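For illustration, a minimal config sketch of what this would look like from a user's perspective (assuming the new-stack AlgorithmConfig.offline_data() parameters input_compress_columns and input_read_schema; the algorithm, data path, and custom column names below are placeholders, not taken from this PR):

from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.core.columns import Columns

config = (
    BCConfig()
    .environment("CartPole-v1")
    .offline_data(
        # Placeholder path to old-stack recordings.
        input_="/tmp/cartpole/recorded_sample_batches",
        # Only `Columns` constants here; compressing OBS implies NEXT_OBS is
        # handled via OBS, as discussed above.
        input_compress_columns=[Columns.OBS],
        # Map `Columns` constants to the custom column names used at
        # recording time (names are illustrative).
        input_read_schema={
            Columns.OBS: "my_obs",
            Columns.ACTIONS: "my_actions",
        },
    )
)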
Looks great. Thanks for this very important PR @simonsays1980, making it much simpler for our old-stack users to migrate.
Just a few nits and questions before we can merge.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…new stack Offline RL. (ray-project#47359) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?

Right now the new Offline RL stack does not allow using old-stack recorded data. Many users have costly recorded data from the old stack (i.e. in SampleBatch format). This PR proposes an option to read old-stack SampleBatch recordings via the OfflinePreLearner. In its current form it comes with some limitations, which might be removed in future PRs:
- For each SampleBatch recorded, the data will be packed into a single SingleAgentEpisode.
- When train_batch_size_per_learner is defined, this argument defines the number of SampleBatches pulled from the offline data per training iteration and NOT the agent/env steps recorded. For example, a train_batch_size_per_learner=2000 and recorded SampleBatches with 200 agent steps inside each batch would result in an actual training batch of 2000 * 200 agent/env steps (see the sketch below).
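A rough end-to-end sketch of that second limitation (assuming a boolean input_read_sample_batches switch on offline_data() to signal old-stack input; the flag name, algorithm, and path are assumptions made for illustration only):

from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")
    .offline_data(
        # Old-stack recordings stored as SampleBatch JSON (illustrative path).
        input_="rllib/tests/data/cartpole/large.json",
        # Assumed switch telling the OfflinePreLearner to expect SampleBatches.
        input_read_sample_batches=True,
    )
    .training(
        # NOTE: with SampleBatch input this counts recorded batches, not steps.
        # If each recorded SampleBatch holds 200 agent steps, the effective
        # train batch is 2000 * 200 = 400,000 agent/env steps.
        train_batch_size_per_learner=2000,
    )
)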
Related issue number

Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API documentation. For example, if I'm adding a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.