-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] make is_gpu, is_actor, root_detached_id fields late bind to workers. #47212
Conversation
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
# work, otherwise it should have created the 2 prestarted task-only | ||
# workers prior to https://github.com/ray-project/ray/pull/33623 | ||
assert result.total == 6 | ||
assert result.total == 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
big win: saved 2 workers from starting
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
src/ray/common/task/task_spec.h
Outdated
bool is_gpu, | ||
bool is_root_detached_actor); | ||
/// \param serialized_runtime_env The JSON-serialized runtime env for this worker. | ||
explicit WorkerCacheKey(std::string serialized_runtime_env); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can completely remove WorkerCacheKey
?
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There are test failures
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
wait_for_condition( | ||
lambda: len(get_metric_dictionaries("serve_deployment_request_counter")) == 2, | ||
lambda: len(get_metric_dictionaries("serve_deployment_request_counter_total")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now the counter metric name ends with _total
num_requests = get_metric_dictionaries("serve_deployment_request_counter") | ||
assert len(num_requests) == 2 | ||
expected_output = {"route": "/f", "deployment": "f", "application": "app1"} | ||
verify_metrics(num_requests[0], expected_output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test assumes the order of the metrics it gets (f then g) but this is not something guaranteed so it's flaky. This PR changes to compare the whole metrics in a set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serve test changes LGTM
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
…rkers. (ray-project#47212) Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
When starting a worker for a task, we have:
Now, move the latter 3 fields from "early bind" into "late bind". The only early bind thing is now runtime_env.
This helps in worker reuse: the prestarted workers can be used as actors or gpu tasks, as long as they don't have runtime env.
Refactoring: move the "does this worker fit for this task" code into worker.cc. Simplify PopWorker a bit.