[RLlib] Upgrade to gymnasium 1.0.0 (ale_py 0.10.1, mujoco 3.2.4, pettingzoo 1.24.3 supersuit 3.9.3). #45328
Conversation
@@ -3,7 +3,6 @@
# Environment adapters.
# ---------------------
# Atari
-gymnasium==0.28.1
Since gymnasium is already part of the main Ray requirements.txt file, we won't need this here anymore.
cc: @pseudo-rnd-thoughts @jkterry1
…ade_gymnasium_to_1_0_0a1
Signed-off-by: Sven Mika <sven@anyscale.io>
rllib/env/single_agent_env_runner.py
Outdated
@@ -249,6 +249,8 @@ def _sample_timesteps(
    observation=obs[env_index],
    infos=infos[env_index],
)
+self._was_terminated = [False for _ in range(self.num_envs)]
This is the completely new auto-reset logic of gymnasium 1.0: the sub-env only gets reset upon the next(!) step call (with a fake reward of 0.0, term/trunc guaranteed to be False, and the obs/infos being the reset obs/infos).
This is actually good for us, as we should always do the env-to-module connector pass (even after the last timestep, with the terminal obs in the Episodes list) to make sure the user - in case they are writing to the episode - gets a chance to also alter the final obs.
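A minimal sketch of this next-step autoreset behavior (assuming gymnasium>=1.0.0; the env id, step count, and the `was_done` bookkeeping are illustrative only and merely mirror the spirit of the `_was_terminated` flags above):

```python
import gymnasium as gym
import numpy as np

# Vectorized env with gymnasium>=1.0.0 semantics: a terminated/truncated
# sub-env is only reset on the NEXT step() call.
envs = gym.make_vec("CartPole-v1", num_envs=2)
obs, infos = envs.reset(seed=0)

# Bookkeeping similar in spirit to the `self._was_terminated` list above.
was_done = np.zeros(envs.num_envs, dtype=bool)

for _ in range(200):
    actions = envs.action_space.sample()
    obs, rewards, terminations, truncations, infos = envs.step(actions)

    for i in range(envs.num_envs):
        if was_done[i]:
            # This step only carried the reset obs for sub-env i:
            # a dummy reward of 0.0 and terminated/truncated both False.
            assert rewards[i] == 0.0
            assert not terminations[i] and not truncations[i]
        # If sub-env i just finished, obs[i] is the real terminal observation;
        # its reset obs arrives with the next step() call.
        was_done[i] = terminations[i] or truncations[i]
```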
LGTM.
rllib/env/single_agent_env_runner.py
Outdated
@@ -88,7 +88,7 @@ def __init__(self, config: AlgorithmConfig, **kwargs):
# actually hold the spaces for a single env, but for boxes the
# shape is (1, 1) which brings a problem with the action dists.
# shape=(1,) is expected.
-module_spec.action_space = self.env.envs[0].action_space
+module_spec.action_space = self.env.single_action_space
Sweet. This is now gone.
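For illustration, a hedged sketch of why reaching into `envs[0]` is no longer needed: the vector env's `single_action_space` already has the per-sub-env shape (the env id and `num_envs` here are arbitrary choices, not taken from this PR):

```python
import gymnasium as gym

envs = gym.make_vec("Pendulum-v1", num_envs=4)

# Batched space of the vector env: one leading axis per sub-env.
print(envs.action_space.shape)         # (4, 1)

# Per-sub-env space -- the shape=(1,) that the action dists expect.
print(envs.single_action_space.shape)  # (1,)
```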
eps += 1

episodes[env_index].add_env_step(
    infos[env_index].pop("final_observation"),
Okay, i.e. with gymnasium>=1.0.0, the final_observation is gone and instead a regular observation will be returned?
Correct, the final observation is returned as the actual obs. You only get the reset obs on the next(!) call to step, together with a dummy reward of 0.0.
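To make the contrast concrete, a small sketch under gymnasium>=1.0.0 (env id, loop length, and the `terminal_obs` dict are illustrative; the pre-1.0 access pattern is quoted in the comment only for comparison):

```python
import gymnasium as gym

envs = gym.make_vec("CartPole-v1", num_envs=2)
obs, infos = envs.reset(seed=0)

# gymnasium < 1.0 (old pattern, for contrast): the step that ended an episode
# already returned the reset obs, so the terminal obs lived in the infos dict:
#     terminal_obs = infos["final_observation"][i]
#
# gymnasium >= 1.0: the terminating step returns the terminal obs directly.
terminal_obs = {}
for _ in range(500):
    obs, rewards, terms, truncs, infos = envs.step(envs.action_space.sample())
    for i in range(envs.num_envs):
        if terms[i] or truncs[i]:
            # obs[i] IS the final observation of this episode; the reset obs
            # only arrives on the next step() call (with reward 0.0).
            terminal_obs[i] = obs[i]
```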
…to upgrade_gymnasium_to_1_0_0a1 # Conflicts: # rllib/env/single_agent_env_runner.py
…ade_gymnasium_to_1_0_0a1
@mattip https://anaconda.org/conda-forge/gymnasium has been updated to v1.0.0
Hey @pseudo-rnd-thoughts, thanks for offering your help. Will do! Thus far, this has been a smoother ride than I thought (at least after I re-picked up this PR two days ago). Looks like all the tests are passing now and I also ran PPO+Pong, which learnt as well as with gymnasium==0.28.1. This is all looking very good.
Hey @mattip, yes, this should be merged today/tomorrow. Just waiting for the last tests to run through (it's set to auto-merge). Just fixed the last breaking one (SingleAgentEnvRunner); the rest looks fine.
Great, thanks!
Reverting since this broke release tests and is blocking release.
…ingzoo 1.24.3 supersuit 3.9.3). (ray-project#45328)
….4, pettingzoo 1.24.3 supersuit 3.9.3)." (ray-project#48297) Reverts ray-project#45328
Upgrade RLlib to gymnasium 1.0.0.
Reason:
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.