Spread replicas with custom resources in torch tune serve release test #46093
Conversation
For the Golden Notebook Torch Tune Serve release test. Use custom resources to make sure one replica gets scheduled for each of the two nodes in the cluster.
Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
num_replicas=2,
ray_actor_options={"num_gpus": 1, "resources": {"worker": 1}}
if use_gpu
else {},
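A minimal sketch of the scheduling logic in the diff above. The helper name `build_actor_options` is illustrative, not part of the PR; the option values themselves come from the diff:

```python
def build_actor_options(use_gpu: bool) -> dict:
    """Return Ray actor options for a Serve deployment replica.

    With use_gpu=True, each replica requests 1 GPU plus one unit of the
    custom "worker" resource. If each of the two worker nodes advertises
    {"worker": 1}, at most one replica can land on each node, forcing
    the two replicas to spread. With use_gpu=False (smoke test), no
    constraints are set, so replicas may all start on the head node.
    """
    if use_gpu:
        return {"num_gpus": 1, "resources": {"worker": 1}}
    return {}


# Hypothetical usage with a Ray Serve deployment (sketch only):
# @serve.deployment(num_replicas=2,
#                   ray_actor_options=build_actor_options(use_gpu))
# class Model: ...
```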
This won't spread across nodes if no GPU is used. Is that intentional?
Yeah, I think no_gpu is used for the smoke test. There is only one worker node type in this test, so no GPU should mean all replicas get started on the head node.
ok
(if ed says it is good, it is good)
@edoakes ready to merge?