[RLlib; Offline RL] - Enable buffering episodes. #47501
Conversation
…'SampleBatch ' data and with stateful modules. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…thod into algorithms. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…into offline-rl-enable-buffering-episodes
…AlgorithmConfig'. Furthermore added tests for 'OfflinePreLearner' and moved tests from 'OfflineData' over. Added further tests top 'test_offline_data.py'. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…sode data. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@@ -844,6 +846,8 @@ def validate(self) -> None:
     self._validate_input_settings()
     # Check evaluation specific settings.
     self._validate_evaluation_settings()
+    # Check offline specific settings (new API stack).
nice!!
 prelearner_module_synch_period: The period (number of batches converted)
     after which the `RLModule` held by the `PreLearner` should sync weights.
     The `PreLearner` is used to preprocess batches for the learners. The
     higher this value, the more off-policy the `PreLearner`'s module will be.
     Values too small will force the `PreLearner` to sync more frequently
     and thus might slow down the data pipeline. The default value chosen
     by the `OfflinePreLearner` is 10.
-dataset_num_iters_per_learner: Number of iterations to run in each learner
+dataset_num_iters_per_learner: Number of updates to run in each learner
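For orientation (not part of the diff): a minimal sketch of how these two settings might be configured, assuming they are exposed through `AlgorithmConfig.offline_data()` as in recent Ray versions; the algorithm choice and data path are placeholders:

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .environment("CartPole-v1")
    .offline_data(
        input_="/tmp/cartpole/episodes",  # placeholder path
        # Sync the PreLearner's RLModule every 10 converted batches
        # (the OfflinePreLearner default mentioned in the docstring).
        prelearner_module_synch_period=10,
        # Number of batches pulled from the data iterator per learner update.
        dataset_num_iters_per_learner=5,
    )
)
```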
👍
I'll leave this up to you to decide: Would `dataset_num_batches_per_learner` be more accurate? Or would it add more confusion?
Or: `dataset_num_batches_per_learner_update`
🤔 maybe too long ...
I admit, `dataset_num_iters_per_learner` is not straight to the point here, but `dataset_num_updates_per_learner` is not better in my opinion. `dataset_num_batches_per_learner` describes better that this many different(!) batches are pulled per learner, but it does not point out that the setting is deeply tied to a `DataIterator` that iterates this many times. I am not sure yet, and as long as I am not sure, I will leave it as is ;)
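To make the semantics concrete, here is rough illustrative pseudocode (mine, not from the PR) of what `dataset_num_iters_per_learner` controls; `iterator` is assumed to be a Ray Data `DataIterator` and `learner` a stand-in object:

```python
def update_from_offline_data(iterator, learner, dataset_num_iters_per_learner=10):
    """Pull `dataset_num_iters_per_learner` batches and update once per batch."""
    for i, batch in enumerate(iterator.iter_batches(batch_size=256)):
        if i >= dataset_num_iters_per_learner:
            break
        # One update per pulled batch: the name "iters" refers to iterator
        # steps, which is why "batches" vs. "updates" is debated above.
        learner.update(batch)
```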
rllib/algorithms/dqn/dqn.py
Outdated
@@ -719,6 +719,7 @@ def _training_step_new_api_stack(self, *, with_noise_reset) -> ResultDict:
     n_step=self.config.n_step,
     gamma=self.config.gamma,
     beta=self.config.replay_buffer_config.get("beta"),
+    sample_episodes=True,
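Purely illustrative: a sketch of how the flag changes what the buffer's `sample()` call returns. The buffer class and module path follow RLlib's new API stack, but treat the exact `sample()` keyword names as assumptions to check against your Ray version:

```python
from ray.rllib.utils.replay_buffers.prioritized_episode_buffer import (
    PrioritizedEpisodeReplayBuffer,
)

buffer = PrioritizedEpisodeReplayBuffer(capacity=50_000)
# ... episodes collected from env runners are added via buffer.add(...) ...
sampled = buffer.sample(
    num_items=32,          # requested timesteps
    n_step=3,              # n-step return horizon
    gamma=0.99,            # discount factor
    beta=0.4,              # importance-sampling exponent (prioritized replay)
    sample_episodes=True,  # return episodes instead of a flat batch dict
)
```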
cool!!
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
@@ -0,0 +1,232 @@
+import functools
Wow, thanks for adding all these tests.
Awesome PR @simonsays1980 !
Thanks for adding all these tests as well. Offline RL getting stronger by the day.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Why are these changes needed?
Sampling exactly `train_batch_size_per_learner` when using offline data with old API stack `SampleBatch` records or new API stack `Episode` records is not possible yet, because each sample batch or episode could contain more than a single timestep. This PR proposes a way to enable sampling with exactly the requested batch size by using replay buffers. The user can define the replay buffer class to use and its `kwargs`. The `OfflinePreLearner` keeps a replay buffer that buffers episodes and samples from this buffer. The replay buffer serves multiple functions:
- It buffers `SampleBatch` or `Episode` data batches and ensures that exactly the requested batch size is sampled.
- It enables `n_step` sampling, if needed.
(A hedged configuration sketch follows below.)
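The parameter names `prelearner_buffer_class` and `prelearner_buffer_kwargs` below are my reading of "the user can define the replay buffer class to use and its kwargs"; verify them against `AlgorithmConfig.offline_data()` in your Ray version, and treat the data path as a placeholder:

```python
from ray.rllib.algorithms.marwil import MARWILConfig
from ray.rllib.utils.replay_buffers.episode_replay_buffer import (
    EpisodeReplayBuffer,
)

config = (
    MARWILConfig()
    .environment("CartPole-v1")
    .offline_data(
        input_="/tmp/cartpole/episodes",  # placeholder offline dataset
        # The OfflinePreLearner buffers incoming episodes here and then
        # samples exactly the requested number of timesteps from it.
        prelearner_buffer_class=EpisodeReplayBuffer,
        prelearner_buffer_kwargs={"capacity": 10_000},
    )
    .training(train_batch_size_per_learner=1024)
)
```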
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.