[RLlib] Upgrade to gymnasium 1.0.0 (ale_py 0.10.1, mujoco 3.2.4, pettingzoo 1.24.3 supersuit 3.9.3). #45328

Merged 45 commits on Oct 28, 2024

Conversation

sven1977
Contributor

@sven1977 sven1977 commented May 14, 2024

Upgrade RLlib to gymnasium 1.0.0.

Reason:

  • We require some bug fixes in gymnasium that only exist in 1.0.0a1/2 (not in 0.29.1) that allow us to make use of their vectorized sync and async environments in RLlib's new EnvRunners.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@@ -3,7 +3,6 @@
# Environment adapters.
# ---------------------
# Atari
-gymnasium==0.28.1
Contributor Author

Since gymnasium is already part of the main Ray requirements.txt file, we won't need this here anymore.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977
Contributor Author

sven1977 commented May 14, 2024

cc: @pseudo-rnd-thoughts @jkterry1
Congrats on gymnasium 1.0!! This is super exciting. :)

Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: Sven Mika <sven@anyscale.io>
@@ -249,6 +249,8 @@ def _sample_timesteps(
observation=obs[env_index],
infos=infos[env_index],
)
self._was_terminated = [False for _ in range(self.num_envs)]
Contributor Author

This is the completely new auto-reset logic of gymnasium 1.0. A sub-env only gets reset upon the next(!) step call (with a dummy reward of 0.0, terminated/truncated guaranteed to be False, and the obs/infos being the reset obs/infos).
This is actually good for us, as we should always do the env-to-module connector pass (even after the last timestep, with the terminal obs in the Episodes list) to make sure the user, in case they are writing to the episode, gets a chance to also alter the final obs.
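
The next-step auto-reset behavior described above can be sketched with a toy vector env. This is a minimal pure-Python illustration, not gymnasium's actual implementation: `ToyEnv`, `NextStepAutoresetVector`, and the `horizons` parameter are hypothetical names made up for this sketch, and actions, truncation, and infos are left out for brevity.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical sub-env: obs is a step counter; terminates after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self) -> Tuple[int, float, bool]:
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon


class NextStepAutoresetVector:
    """Toy stand-in for a vector env that resets a finished sub-env on the NEXT step."""

    def __init__(self, horizons: List[int]) -> None:
        self.envs = [ToyEnv(h) for h in horizons]
        self._needs_reset = [False] * len(self.envs)

    def reset(self) -> List[int]:
        self._needs_reset = [False] * len(self.envs)
        return [env.reset() for env in self.envs]

    def step(self) -> Tuple[List[int], List[float], List[bool]]:
        obs, rewards, terminateds = [], [], []
        for i, env in enumerate(self.envs):
            if self._needs_reset[i]:
                # "Reset step": return the reset obs, a dummy reward of 0.0,
                # and terminated=False. The sub-env itself is not stepped.
                o, r, term = env.reset(), 0.0, False
            else:
                # Regular step: on termination, `o` is the true final obs.
                o, r, term = env.step()
            self._needs_reset[i] = term
            obs.append(o)
            rewards.append(r)
            terminateds.append(term)
        return obs, rewards, terminateds


vec = NextStepAutoresetVector(horizons=[2, 3])
first_obs = vec.reset()
```

With `horizons=[2, 3]`, the second `vec.step()` call returns the final observation `2` for sub-env 0 together with `terminated=True`; only the third call returns that sub-env's reset observation `0`, paired with the dummy reward `0.0`.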

Collaborator

@simonsays1980 simonsays1980 left a comment

LGTM.

@@ -88,7 +88,7 @@ def __init__(self, config: AlgorithmConfig, **kwargs):
# actually hold the spaces for a single env, but for boxes the
# shape is (1, 1) which brings a problem with the action dists.
# shape=(1,) is expected.
-module_spec.action_space = self.env.envs[0].action_space
+module_spec.action_space = self.env.single_action_space
Collaborator

Sweet. This is now gone.

eps += 1

episodes[env_index].add_env_step(
infos[env_index].pop("final_observation"),
Collaborator

Okay, i.e. with gymnasium>=1.0.0 the final_observation is gone and instead a regular observation will be returned?

Contributor Author

Correct, the final observation is returned in the actual obs. The reset obs, you only get on the next(!) call to step, together with a dummy reward of 0.0.
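
On the consumer side, the exchange above implies bookkeeping along these lines. This is a minimal sketch, assuming a single sub-env whose step results arrive as `(obs, reward, terminated)` tuples; `split_episodes` and the dict-based episode layout are hypothetical and are not RLlib's `SingleAgentEnvRunner` code.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, float, bool]  # (obs, reward, terminated)


def split_episodes(first_obs: int, steps: List[Step]) -> List[Dict[str, list]]:
    """Split one sub-env's step stream into episodes under next-step auto-reset."""
    episodes: List[Dict[str, list]] = [{"obs": [first_obs], "rewards": []}]
    was_terminated = False
    for obs, reward, terminated in steps:
        if was_terminated:
            # This is the "reset step": `obs` is the reset observation of a
            # new episode, and the dummy reward of 0.0 is discarded.
            episodes.append({"obs": [obs], "rewards": []})
            was_terminated = False
            continue
        # Regular step: when `terminated` is True, `obs` is already the final
        # observation, so there is no lookup in infos["final_observation"].
        episodes[-1]["obs"].append(obs)
        episodes[-1]["rewards"].append(reward)
        was_terminated = terminated
    return episodes


stream = [(1, 1.0, False), (2, 1.0, True), (0, 0.0, False), (1, 1.0, False)]
episodes = split_episodes(first_obs=0, steps=stream)
```

Here the first episode ends with the final observation `2` taken straight from the step output, and the second episode starts from the reset observation `0` that arrives one step later.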

Signed-off-by: sven1977 <svenmika1977@gmail.com>
…to upgrade_gymnasium_to_1_0_0a1

# Conflicts:
#	rllib/env/single_agent_env_runner.py
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@pseudo-rnd-thoughts

@mattip https://anaconda.org/conda-forge/gymnasium has been updated to v1.0.0
@sven1977 Let me know if there are any issues with Gymnasium or documentation changes we need to add / note

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) October 27, 2024 20:29
@sven1977
Contributor Author

Hey @pseudo-rnd-thoughts , thanks for offering your help. Will do! Thus far, this has been a smoother ride than I thought (at least after I re-picked up this PR two days ago). Looks like all the tests are passing now and I also ran PPO+Pong, which learnt as well as with gymnasium==0.28.1. This is all looking very good.

@sven1977
Contributor Author

Hey @mattip , yes, this should be merged today/tomorrow. Just waiting for the last tests to run through (it's set to auto-merge). Just fixed the last breaking one (SingleAgentEnvRunner), the rest looks fine.

@mattip
Contributor

mattip commented Oct 27, 2024

Great thanks!

@sven1977 sven1977 merged commit bfd0d95 into ray-project:master Oct 28, 2024
6 checks passed
@sven1977 sven1977 deleted the upgrade_gymnasium_to_1_0_0a1 branch October 28, 2024 08:29
@can-anyscale
Collaborator

Reverting since this broke release tests and is blocking release.

can-anyscale added a commit that referenced this pull request Oct 28, 2024
….4, pettingzoo 1.24.3 supersuit 3.9.3)." (#48297)

Reverts #45328
edoakes pushed a commit to edoakes/ray that referenced this pull request Oct 30, 2024
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
…ingzoo 1.24.3 supersuit 3.9.3). (ray-project#45328)

Signed-off-by: JP-sDEV <jon.pablo80@gmail.com>
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
….4, pettingzoo 1.24.3 supersuit 3.9.3)." (ray-project#48297)

Reverts ray-project#45328
Signed-off-by: JP-sDEV <jon.pablo80@gmail.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
…ingzoo 1.24.3 supersuit 3.9.3). (ray-project#45328)

Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
….4, pettingzoo 1.24.3 supersuit 3.9.3)." (ray-project#48297)

Reverts ray-project#45328

Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
Labels
go add ONLY when ready to merge, run all tests P1 Issue that should be fixed within a few weeks rllib RLlib related issues
10 participants