
[RLlib] Add "shuffle batch per epoch" option. #47458

Merged

Conversation

@sven1977 sven1977 (Contributor) commented Sep 3, 2024

Add "shuffle batch per epoch" option.

  • For PPO and any other algo that uses minibatching AND runs more than one epoch per train batch (see the config sketch below).
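A minimal config sketch of how the new option fits together (values are illustrative; the argument names num_epochs, minibatch_size, and shuffle_batch_per_epoch are taken from the diffs discussed below):

    from ray.rllib.algorithms.ppo import PPOConfig

    # Illustrative values only. With more than one epoch over the same train batch,
    # shuffling the batch before each epoch changes the minibatch composition per epoch.
    config = (
        PPOConfig()
        .training(
            train_batch_size_per_learner=4000,
            minibatch_size=128,
            num_epochs=10,
            shuffle_batch_per_epoch=True,
        )
    )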

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@simonsays1980 simonsays1980 (Collaborator) left a comment

LGTM. For the future, we should critically inspect whether we can reuse code we have written elsewhere and replace much of this manual iteration with ray.data dataset iteration. That would further reduce code and place the logic where it belongs, i.e., data iteration in ray.data.

@@ -2103,6 +2113,15 @@ def training(
stack, this setting should no longer be used. Instead, use
`train_batch_size_per_learner` (in combination with
`num_learners`).
num_epochs: The number of complete passes over the entire train batch (per
Collaborator

Awesome! For Offline RL we might want to add here that an epoch might loop over the entire dataset?

Contributor Author

Will add!

@@ -185,7 +188,7 @@ def training(
target_network_update_freq: The frequency to update the target policy and
tune the kl loss coefficients that are used during training. After
setting this parameter, the algorithm waits for at least
`target_network_update_freq * minibatch_size * num_sgd_iter` number of
`target_network_update_freq * minibatch_size * num_epochs` number of
Collaborator

I might not completely understand this, but isn't minibatch_size the size of a minibatch and not necessarily the number of minibatches per epoch?

Collaborator

Ah, maybe this means the number of samples that have been trained on until we update the target networks, correct?

Contributor Author

Oh wait, great catch. I think this comment here is incorrect.
When we update e.g. PPO with a batch of 4000, the num_env_steps_trained_lifetime counter only(!) gets increased by that 4000, and NOT by num_epochs * 4000. So for APPO here, this is also wrong. Will fix the comment and clarify.
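To illustrate the counting behavior described above, here is a tiny hedged sketch with made-up numbers (not RLlib source code):

    train_batch_size = 4000
    num_epochs = 3
    num_env_steps_trained_lifetime = 0

    # One training iteration over a single train batch: the lifetime counter
    # advances by the batch's env steps exactly once, regardless of how many
    # epochs iterate over that batch.
    num_env_steps_trained_lifetime += train_batch_size  # -> 4000
    # It does NOT advance by num_epochs * train_batch_size (which would be 12000).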

Contributor Author

fixed

@@ -734,6 +705,9 @@ def training_step(self) -> ResultDict:
NUM_ENV_STEPS_SAMPLED_LIFETIME, default=0
),
},
num_epochs=self.config.num_epochs,
minibatch_size=self.config.minibatch_size,
shuffle_batch_per_epoch=self.config.shuffle_batch_per_epoch,
)
else:
learner_results = self.learner_group.update_from_episodes(
Collaborator

I wonder: isn't it possible to just hand over a ray.data.DataIterator to the learner via update_from_iterator and then iterate over the train batch (as a materialized dataset) in minibatch_size batches?
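A rough sketch of that idea, purely illustrative: the dataset construction and the per-minibatch update hook below are hypothetical stand-ins, not the existing Learner API.

    import ray

    # Hypothetical stand-ins (not RLlib code): a materialized train batch and settings.
    transition_rows = [{"obs": [0.0], "action": 0, "reward": 1.0} for _ in range(4000)]
    num_epochs, minibatch_size = 10, 128

    ds = ray.data.from_items(transition_rows)

    for _ in range(num_epochs):
        # local_shuffle_buffer_size gives an (approximate) per-epoch shuffle,
        # similar in spirit to the shuffle_batch_per_epoch option of this PR.
        for minibatch in ds.iter_batches(
            batch_size=minibatch_size, local_shuffle_buffer_size=minibatch_size * 4
        ):
            pass  # a (hypothetical) per-minibatch Learner update would consume `minibatch` here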

Collaborator

We could run all of this (in the new stack) through the PreLearner to prefetch and make the learner connector run.

@@ -398,7 +398,7 @@ class (multi-/single-learner setup) and evaluation on
learner_results = self.learner_group.update_from_batch(
batch,
minibatch_size=self.config.train_batch_size_per_learner,
num_iters=self.config.dataset_num_iters_per_learner,
num_epochs=self.config.dataset_num_iters_per_learner,
Collaborator

Here, for example, this is a bit confusing: dataset_num_iters_per_learner is irrelevant here, because in the single-learner setup we pass a batch that is trained on as a whole. In the multi-learner setup we pass an iterator, and dataset_num_iters_per_learner defines how many batches should be pulled from it in a single RLlib training iteration (it defaults to None, which means the learner runs over the entire dataset once, i.e., a single epoch, per RLlib training iteration).

I know this is still somewhat messy, because the different entry points of the Learner API are not really aligned with offline RL yet.
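A minimal sketch of the pulling semantics described here (a hypothetical helper for illustration, not the actual Learner code):

    # Hypothetical helper mirroring the described semantics of
    # dataset_num_iters_per_learner in the multi-learner offline path.
    def pull_batches(batch_iterator, dataset_num_iters_per_learner=None):
        if dataset_num_iters_per_learner is None:
            # Default: one full pass over the offline dataset (a single epoch)
            # per RLlib training iteration.
            yield from batch_iterator
        else:
            # Otherwise: pull exactly this many batches per training iteration.
            for _, batch in zip(range(dataset_num_iters_per_learner), batch_iterator):
                yield batch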

Contributor Author

Great catch. Clarified the arg names and added this protection to Learner.update_from_iterator for now:

        if "num_epochs" in kwargs:
            raise ValueError(
                "`num_epochs` arg NOT supported by Learner.update_from_iterator! Use "
                "`num_iters` instead."
            )

such that num_epochs cannot be passed in by accident and silently confused with num_iters.

MiniBatchCyclicIterator,
uses_new_env_runners=True,
num_total_mini_batches=num_total_mini_batches,
MiniBatchCyclicIterator, _uses_new_env_runners=True
Collaborator

Do we need this here still? I thought we deprecated the hybrid stack?

Contributor Author

Officially, not yet. PR is still pending ...

self._mini_batch_count = 0
self._num_total_mini_batches = num_total_mini_batches
self._minibatch_count = 0
self._num_total_minibatches = num_total_minibatches

def __iter__(self):
Collaborator

In the long run we might want to override DataIterator from ray.data to build batches from MultiAgentEpisodes. Less code.

@@ -140,6 +159,11 @@ def get_len(b):
n_steps -= len_sample
s = 0
self._num_covered_epochs[module_id] += 1
Collaborator

This is actually the same logic as the independent sampling mode in our MultiAgentEpisodeBuffers. For the future, we should reduce code here again.

Contributor Author

Ah, great catch! We'll get rid of this iterator anyway because it relies on MultiAgentBatch. We should rewrite it to rely only on a plain dict.

{"mini_batch_size": 128, "num_sgd_iter": 10, "agent_steps": (56, 55)},
{"mini_batch_size": 128, "num_sgd_iter": 10, "agent_steps": (400, 400)},
{"mini_batch_size": 128, "num_sgd_iter": 10, "agent_steps": (64, 64)},
{"minibatch_size": 256, "num_epochs": 30, "agent_steps": (1652, 1463)},
Collaborator

Nice test!

@sven1977 sven1977 enabled auto-merge (squash) September 4, 2024 10:51
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Sep 4, 2024
@sven1977 sven1977 enabled auto-merge (squash) September 4, 2024 11:19
@sven1977 sven1977 enabled auto-merge (squash) September 4, 2024 14:18
@sven1977 sven1977 added rllib RLlib related issues rllib-newstack labels Sep 17, 2024
@sven1977 sven1977 enabled auto-merge (squash) September 17, 2024 10:13
@sven1977 sven1977 enabled auto-merge (squash) September 17, 2024 11:17
@sven1977 sven1977 merged commit ed5b382 into ray-project:master Sep 17, 2024
6 checks passed
@sven1977 sven1977 deleted the add_shuffle_batch_option_to_cyclic_iterator branch September 18, 2024 07:57
Labels
go (add ONLY when ready to merge, run all tests), rllib (RLlib related issues), rllib-newstack