
[RLlib] Cherry pick rllib test fixes 2.6 #37154

Merged
release/release_tests.yaml (2 changes: 1 addition & 1 deletion)
@@ -4190,7 +4190,7 @@
   team: rllib
   cluster:
     cluster_env: app_config.yaml
-    cluster_compute: 1gpu_16cpus.yaml
+    cluster_compute: 1gpu_32cpus.yaml

   run:
     timeout: 18000
release/rllib_tests/1gpu_32cpus.yaml (17 changes: 17 additions & 0 deletions)
@@ -0,0 +1,17 @@
cloud_id: {{env["ANYSCALE_CLOUD_ID"]}}
region: us-west-2

max_workers: 0

head_node_type:
  name: head_node
  instance_type: g5.8xlarge

worker_node_types: []

aws:
  BlockDeviceMappings:
    - DeviceName: /dev/sda1
      Ebs:
        DeleteOnTermination: true
        VolumeSize: 500
@@ -9,9 +9,9 @@ head_node_type:

 worker_node_types:
 - name: worker_node
-  instance_type: g3.8xlarge
-  min_workers: 1
-  max_workers: 1
+  instance_type: g3s.xlarge
+  min_workers: 2
+  max_workers: 2
Collaborator: oh hmm, if you change the number of workers here, you need to change the wait_for_node in release_tests.yaml as well.

Member Author: oh shucks. It seems to have all worked out, though?

Member Author: tests were previously passing.

Collaborator: yes, in theory it would be a data race bug, so it might or might not happen.

   use_spot: false

 aws:
rllib/core/learner/torch/torch_learner.py (2 changes: 1 addition & 1 deletion)
@@ -69,7 +69,7 @@ def __init__(
         # Will be set during build.
         self._device = None

-        # Whether to compile the RL Module of this learner. This implies that the
+        # Whether to compile the RL Module of this learner. This implies that the.
         # forward_train method of the RL Module will be compiled. Further more,
         # other forward methods of the RL Module will be compiled on demand.
         # This is assumed to not happen, since other forwrad methods are not expected