Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[State API] Add worker startup & initialization time to state API + use it for many_tasks #31916

Merged
merged 14 commits into from
Mar 24, 2023

Conversation

rkooo567
Copy link
Contributor

Why are these changes needed?

This PR adds information to retrieve

  1. worker startup time
  2. worker initialization time

from ray list workers --detail.

We also add summarization & print the result to many_tasks which will help debugging worker-related regressions.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@@ -202,12 +202,6 @@ static Gauge ObjectDirectoryRemovedLocations(
"have been removed from this node.",
"removals");

/// Worker Pool
static Histogram ProcessStartupTimeMs("process_startup_time_ms",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it cuz I thought we don't really use it anyway. I am open to just keep it as well

// The field exists only when the worker is launched
// by a raylet. (I.e., driver worker won't have this value).
optional int64 worker_launch_time_ms = 25;
optional int64 worker_launched_time_ms = 26;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the docstring

Copy link
Contributor

@rickyyx rickyyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Should we make a perf metric out of it if this is being reliable and consistent?

python/ray/_private/workers/default_worker.py Outdated Show resolved Hide resolved
@scv119
Copy link
Contributor

scv119 commented Jan 26, 2023

looks we did a lot of plumbing.. is it really necessary?

@scv119 scv119 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 26, 2023
@rkooo567
Copy link
Contributor Author

Are you asking a feature is necessary or the plumbing is necessary?

Unfortunately plumbing is necessary to implement a feature

Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@rkooo567 rkooo567 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 27, 2023
Copy link
Contributor

@scv119 scv119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments

@@ -318,6 +318,8 @@ WorkerPool::BuildProcessCommandArgs(const Language &language,
if (language == Language::PYTHON) {
worker_command_args.push_back("--startup-token=" +
std::to_string(worker_startup_token_counter_));
worker_command_args.push_back("--worker-launch-time-ms=" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm can we log into C++ metrics directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we anyway need plumbing to calculate the initialization time although we use metrics
also metrics has cardinatliy issue + not practical at this point (until we have a default core dashboard, which we don't plan to have it for a while).

Having these values to worker state also makes sense given that's the direction we are going (add event information to state API)

Copy link
Contributor

@scv119 scv119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

talked offline. while i don't think this is necessary but ok to move forward as it is for now.

@rkooo567 rkooo567 added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. do-not-merge Do not merge this PR! labels Feb 1, 2023
@stale
Copy link

stale bot commented Mar 11, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 11, 2023
@rkooo567 rkooo567 removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 15, 2023
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@rkooo567 rkooo567 requested a review from a team as a code owner March 20, 2023 23:35
// The time when this worker is launched.
// The time when this worker process is requested from raylet.
// The field exists only when the worker is launched
// by a raylet. (I.e., driver worker won't have this value).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we un-optional the fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added // If the value doesn't present, it is -1.!

#: The time when the worker is started and initialized.
#: 0 if the value doesn't exist.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also -1 default it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me try

@rickyyx
Copy link
Contributor

rickyyx commented Mar 23, 2023

should we have a release test dry run to see how this looks like?

@rkooo567
Copy link
Contributor Author

release test here https://buildkite.com/ray-project/release-tests-pr/builds/32132#01870e86-78ba-4ac8-b459-432d6105a00d

Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@rkooo567 rkooo567 merged commit cd70485 into ray-project:master Mar 24, 2023
@rickyyx
Copy link
Contributor

rickyyx commented Mar 24, 2023

Somehow not seeing th eoutputs there in the release test?

elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…se it for many_tasks (ray-project#31916)

This PR adds information to retrieve
1. worker startup time
2. worker initialization time

from `ray list workers --detail`.

We also add summarization & print the result to many_tasks which will help debugging worker-related regressions.

Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…se it for many_tasks (ray-project#31916)

This PR adds information to retrieve
1. worker startup time
2. worker initialization time

from `ray list workers --detail`.

We also add summarization & print the result to many_tasks which will help debugging worker-related regressions.

Signed-off-by: Jack He <jackhe2345@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. do-not-merge Do not merge this PR!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants