[State API] Add worker startup & initialization time to state API + use it for many_tasks #31916
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@@ -202,12 +202,6 @@ static Gauge ObjectDirectoryRemovedLocations(
"have been removed from this node.",
"removals");

/// Worker Pool
static Histogram ProcessStartupTimeMs("process_startup_time_ms",
I removed it because I thought we don't really use it anyway. I am open to keeping it as well.
src/ray/protobuf/gcs.proto
Outdated
// The field exists only when the worker is launched
// by a raylet. (I.e., driver worker won't have this value).
optional int64 worker_launch_time_ms = 25;
optional int64 worker_launched_time_ms = 26;
I will update the docstring
LGTM!
Should we make a perf metric out of it if this turns out to be reliable and consistent?
Looks like we did a lot of plumbing. Is it really necessary?
Are you asking whether the feature is necessary or whether the plumbing is necessary? Unfortunately, the plumbing is necessary to implement the feature.
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
comments
@@ -318,6 +318,8 @@ WorkerPool::BuildProcessCommandArgs(const Language &language,
if (language == Language::PYTHON) {
worker_command_args.push_back("--startup-token=" +
std::to_string(worker_startup_token_counter_));
worker_command_args.push_back("--worker-launch-time-ms=" +
hmm can we log into C++ metrics directly?
Hmm, we need the plumbing anyway to calculate the initialization time, even if we use metrics.
Also, metrics have a cardinality issue and are not practical at this point (until we have a default core dashboard, which we don't plan to have for a while).
Having these values in the worker state also makes sense, given that's the direction we are going (adding event information to the state API).
Talked offline. I don't think this is necessary, but it's OK to move forward as it is for now.
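For readers following the plumbing discussion, here is a minimal Python sketch of how a worker process might consume the `--worker-launch-time-ms` flag that the diff above appends to the command line. The flag names come from the diff; the helper names `parse_worker_args` and `startup_latency_ms`, and the use of -1 as a "missing" sentinel at this stage, are assumptions for illustration and not the PR's actual code.

```python
import argparse
import time

# Assumed sentinel for "value not provided" (illustrative, not taken from the PR code).
MISSING_MS = -1


def parse_worker_args(argv):
    """Parse the two flags shown in the diff; any other flags are ignored."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--startup-token", type=int, default=MISSING_MS)
    parser.add_argument("--worker-launch-time-ms", type=int, default=MISSING_MS)
    args, _unknown = parser.parse_known_args(argv)
    return args


def startup_latency_ms(launch_time_ms, now_ms=None):
    """Return milliseconds elapsed since the launch request, or MISSING_MS if unknown."""
    if launch_time_ms < 0:
        return MISSING_MS
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - launch_time_ms


if __name__ == "__main__":
    demo = parse_worker_args(["--startup-token=3", "--worker-launch-time-ms=1000"])
    print(startup_latency_ms(demo.worker_launch_time_ms, now_ms=1250))
```

Passing the timestamp on the command line lets the worker compute elapsed time without an extra round trip, which is one way to read the "plumbing is necessary" argument in this thread.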
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
// The time when this worker is launched.
// The time when this worker process is requested from raylet.
// The field exists only when the worker is launched
// by a raylet. (I.e., driver worker won't have this value).
why did we un-optional the fields?
I added // If the value doesn't present, it is -1.
#: The time when the worker is started and initialized.
#: 0 if the value doesn't exist.
Should we also default it to -1?
let me try
Should we do a release test dry run to see what this looks like?
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Somehow I'm not seeing the outputs there in the release test?
…se it for many_tasks (ray-project#31916) This PR adds information to retrieve 1. worker startup time 2. worker initialization time from `ray list workers --detail`. We also add summarization & print the result to many_tasks which will help debugging worker-related regressions. Signed-off-by: elliottower <elliot@elliottower.com>
…se it for many_tasks (ray-project#31916) This PR adds information to retrieve 1. worker startup time 2. worker initialization time from `ray list workers --detail`. We also add summarization & print the result to many_tasks which will help debugging worker-related regressions. Signed-off-by: Jack He <jackhe2345@gmail.com>
Why are these changes needed?
This PR adds information to retrieve
1. worker startup time
2. worker initialization time
from `ray list workers --detail`. We also add summarization and print the result to many_tasks, which will help debug worker-related regressions.
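As a rough illustration of the summarization described above, the sketch below derives both durations from per-worker timestamps and skips workers whose values are missing. It works on plain dictionaries rather than calling any Ray API. `worker_launch_time_ms` and `worker_launched_time_ms` appear in the diff; the `start_time_ms` key, the definition of initialization time as `start_time_ms - worker_launched_time_ms`, and the helper name `summarize_worker_times` are assumptions.

```python
from statistics import mean


def _present(value):
    # The review mentions both -1 and 0 as "missing" markers, so treat <= 0 as absent.
    return value is not None and value > 0


def _stats(values):
    """Return simple summary statistics, or None when there is nothing to summarize."""
    if not values:
        return None
    return {"count": len(values), "mean_ms": mean(values), "max_ms": max(values)}


def summarize_worker_times(workers):
    """Summarize startup and initialization durations over worker records (dicts)."""
    startup, initialization = [], []
    for worker in workers:
        launch = worker.get("worker_launch_time_ms")
        launched = worker.get("worker_launched_time_ms")
        start = worker.get("start_time_ms")  # assumed field name
        if not (_present(launch) and _present(launched)):
            continue  # e.g. a driver worker, which is not launched by a raylet
        startup.append(launched - launch)
        if _present(start):
            initialization.append(start - launched)
    return {"startup": _stats(startup), "initialization": _stats(initialization)}


if __name__ == "__main__":
    sample = [
        {"worker_launch_time_ms": 100, "worker_launched_time_ms": 150, "start_time_ms": 400},
        {"worker_launch_time_ms": -1, "worker_launched_time_ms": -1, "start_time_ms": 500},
    ]
    print(summarize_worker_times(sample))
```

Filtering on the sentinel before subtracting avoids reporting nonsensical negative or huge durations for drivers, which is the practical reason the default value matters in the review above.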