[RLlib] Add correct terminated and truncated batch sizes on zero-length episodes #46721
Signed-off-by: Mark Stephenson <mark2000stephenson@gmail.com>
Bumping this!
@sven1977 Could you review when you're able to?
Sorry to keep bugging you @sven1977 @simonsays1980, but is there any way this could get merged in?
@@ -100,6 +100,8 @@ def __call__(
Columns.TERMINATEDS,
items_to_add=(
[False] * (len(sa_episode) - 1) + [sa_episode.is_terminated]
if len(sa_episode) > 0
We should probably remove any zero-length episodes in the loop here. They cannot be used for learning anyways.
Leave this to @sven1977 to decide.
@sven1977 could we give this a go? I think this is fine as it is. The other option is to remove zero-length episodes entirely, as they offer nothing to learn from. This could also happen if an agent in a multi-agent scenario received only the initial observation and nothing else.
Thanks for the fix @Mark2000. LGTM.
A slightly more defensive fix would probably be to filter out empty episodes already in the connector pipeline, but maybe that would confuse some code that does draw information from those.
Our built-in EnvRunners do not return empty episodes ever, which is why this problem never came up.
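For illustration only, the "filter out empty episodes" alternative discussed in this thread could look roughly like the sketch below. It is not part of this PR and does not use RLlib's connector API; `drop_empty_episodes` is a hypothetical helper, and plain lists of rewards stand in for episode objects, since only `len()` matters here.

```python
from typing import List, Sequence, Sized, TypeVar

# Any object that supports len() can stand in for an episode in this sketch.
E = TypeVar("E", bound=Sized)


def drop_empty_episodes(episodes: Sequence[E]) -> List[E]:
    """Hypothetical helper: keep only episodes with at least one step."""
    return [ep for ep in episodes if len(ep) > 0]


# Lists of rewards play the role of episodes; the middle one is zero-length.
episodes = [[0.1, 0.2], [], [1.0]]
kept = drop_empty_episodes(episodes)
print(kept)  # [[0.1, 0.2], [1.0]]
```

As noted above, filtering this early could hide empty episodes from downstream code that still draws information from them, which is one reason the PR takes the narrower route of fixing the column sizes instead.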
auto-merge enabled ...
…th episodes (ray-project#46721) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?
In AddColumnsFromEpisodesToTrainBatch, it is assumed that each sa_episode has length ≥ 1. When a zero-length episode is passed to the connector, the data added to the terminateds and truncateds columns are incorrectly sized; they should just be an empty list. This case generally doesn't come up, but when adding custom connectors that modify episode lengths (e.g. for semi-MDP type problems), zero-length episodes can be produced.
Related issue number
Didn't open issue, but I can if necessary.
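The sizing problem described under "Why are these changes needed?" can be reproduced in plain Python. The sketch below is a minimal illustration, not the connector code itself: `FakeEpisode` is a hypothetical stand-in for a single-agent episode, and the two helper functions only mirror the list expression visible in the diff, with and without the `len(sa_episode) > 0` guard. The empty-list fallback follows the description's statement that the columns "should just be an empty list".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeEpisode:
    """Hypothetical stand-in for a single-agent episode (not RLlib's class)."""

    rewards: List[float] = field(default_factory=list)
    is_terminated: bool = False

    def __len__(self) -> int:
        # Episode length is the number of recorded steps.
        return len(self.rewards)


def terminateds_unguarded(ep: FakeEpisode) -> List[bool]:
    # Assumes len(ep) >= 1. For len(ep) == 0, [False] * -1 is [],
    # so the result still has one entry.
    return [False] * (len(ep) - 1) + [ep.is_terminated]


def terminateds_guarded(ep: FakeEpisode) -> List[bool]:
    # A zero-length episode contributes no entries at all.
    if len(ep) > 0:
        return [False] * (len(ep) - 1) + [ep.is_terminated]
    return []


empty = FakeEpisode()
three_steps = FakeEpisode(rewards=[0.0, 0.0, 1.0], is_terminated=True)

print(len(terminateds_unguarded(empty)))  # 1, although the episode has 0 steps
print(len(terminateds_guarded(empty)))    # 0
print(terminateds_guarded(three_steps))   # [False, False, True]
```

The unguarded expression yields one entry for an episode with no steps, so the terminateds column ends up one element longer than the columns built from the episode's actual data.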
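Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.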