Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] make is_gpu, is_actor, root_detached_id fields late bind to workers. #47212

Merged
merged 17 commits into from
Sep 11, 2024

Conversation

rynewang
Copy link
Contributor

When starting a worker for a task, we have:

  • early bind: runtime env, is_gpu, is_actor
  • late bind: job_id, root_detached_id

Now, move the latter 3 fields from "early bind" into "late bind". The only early bind thing is now runtime_env.

This helps in worker reuse: the prestarted workers can be used as actors or gpu tasks, as long as they don't have runtime env.

Refactoring: move the "does this worker fit for this task" code into worker.cc. Simplify PopWorker a bit.

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
# work, otherwise it should have created the 2 prestarted task-only
# workers prior to https://github.com/ray-project/ray/pull/33623
assert result.total == 6
assert result.total == 4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

big win: saved 2 workers from starting

rynewang and others added 3 commits August 20, 2024 22:02
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Aug 26, 2024
bool is_gpu,
bool is_root_detached_actor);
/// \param serialized_runtime_env The JSON-serialized runtime env for this worker.
explicit WorkerCacheKey(std::string serialized_runtime_env);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can completely remove WorkerCacheKey?

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

There are test failures

src/ray/common/task/task_spec.cc Outdated Show resolved Hide resolved
src/ray/raylet/worker.cc Outdated Show resolved Hide resolved
src/ray/raylet/worker.cc Outdated Show resolved Hide resolved
src/ray/raylet/worker.h Outdated Show resolved Hide resolved
src/ray/raylet/worker_pool.cc Outdated Show resolved Hide resolved
src/ray/raylet/worker_pool.cc Show resolved Hide resolved
rynewang and others added 9 commits September 4, 2024 13:27
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
wait_for_condition(
lambda: len(get_metric_dictionaries("serve_deployment_request_counter")) == 2,
lambda: len(get_metric_dictionaries("serve_deployment_request_counter_total"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now the counter metric name ends with _total

num_requests = get_metric_dictionaries("serve_deployment_request_counter")
assert len(num_requests) == 2
expected_output = {"route": "/f", "deployment": "f", "application": "app1"}
verify_metrics(num_requests[0], expected_output)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test assumes the order of the metrics it gets (f then g) but this is not something guaranteed so it's flaky. This PR changes to compare the whole metrics in a set.

Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serve test changes LGTM

@jjyao jjyao merged commit 7910a5e into ray-project:master Sep 11, 2024
4 of 5 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…rkers. (ray-project#47212)

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants