[RLlib; Off-policy] Add episode sampling to `EpisodeReplayBuffer`. #47500
…thod into algorithms. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Awesome unification PR! This opens the door to a DQfD-style mixing of offline and off-policy RL! Dumb question: Would this already work even if the `PrioritizedEpisodeReplayBuffer` doesn't support this flag yet?
Thanks @sven1977! I guess that mixing would be possible even before? Wouldn't it? We just use DQN and add a dataset, then sample from the buffer and let the The |
EpisodeReplayBuffer
.EpisodeReplayBuffer
.
LGTM now. Thanks a ton @simonsays1980 !!
…ay-project#47500) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?
The `EpisodeReplayBuffer` is still sampling batches, i.e. dicts. This PR proposes a way to add episode sampling to the buffer so that it can be used in the same way as the other episode buffers. This is also a prerequisite for using buffers in the `OfflinePreLearner`.
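To make the difference between the two return types concrete, here is a minimal, self-contained sketch. It does not use RLlib: `ToyEpisode`, `ToyEpisodeBuffer`, and the `as_episodes` argument are hypothetical names invented for illustration, and the actual flag and return types in this PR may differ. The sketch only shows the idea of returning either a dict batch or a list of episode chunks from the same sampling logic.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for an episode object (not RLlib's class)."""

    obs: list = field(default_factory=list)  # holds len(actions) + 1 entries
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)

    def __len__(self):
        # Episode length is counted in transitions, not observations.
        return len(self.actions)

    def slice_step(self, t):
        # One-transition episode chunk covering step t -> t + 1.
        return ToyEpisode(self.obs[t:t + 2], [self.actions[t]], [self.rewards[t]])


class ToyEpisodeBuffer:
    """Hypothetical buffer that can return dict batches or episode chunks."""

    def __init__(self, seed=0):
        self.episodes = []
        self.rng = random.Random(seed)

    def add(self, episode):
        self.episodes.append(episode)

    def sample(self, num_items, as_episodes=False):
        # Sample uniformly (with replacement) over all stored timesteps.
        index = [(e, t) for e in self.episodes for t in range(len(e))]
        picks = [self.rng.choice(index) for _ in range(num_items)]
        chunks = [e.slice_step(t) for e, t in picks]
        if as_episodes:
            # Episode sampling: a list of one-step episode chunks.
            return chunks
        # Batch sampling: a dict of column lists.
        return {
            "obs": [c.obs[0] for c in chunks],
            "actions": [c.actions[0] for c in chunks],
            "rewards": [c.rewards[0] for c in chunks],
            "next_obs": [c.obs[1] for c in chunks],
        }


buffer = ToyEpisodeBuffer()
buffer.add(ToyEpisode(obs=[0, 1, 2, 3], actions=[1, 0, 1], rewards=[0.0, 0.0, 1.0]))
batch = buffer.sample(4)                       # dict of lists
episodes = buffer.sample(4, as_episodes=True)  # list of ToyEpisode chunks
```

Returning episode chunks keeps the sampled data in the same container type that is stored, so downstream code written against episodes can consume samples from any episode buffer without a separate dict-handling path.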
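Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.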