Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][dashboard] push down job_or_submission_id to GCS. #47492

Merged
merged 12 commits into from
Sep 11, 2024

Conversation

rynewang
Copy link
Contributor

@rynewang rynewang commented Sep 5, 2024

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter job_or_submission_id is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@@ -145,6 +145,16 @@ void GcsJobManager::AddJobFinishedListener(JobFinishListenerCallback listener) {
void GcsJobManager::HandleGetAllJobInfo(rpc::GetAllJobInfoRequest request,
rpc::GetAllJobInfoReply *reply,
rpc::SendReplyCallback send_reply_callback) {
// Get all job info. This is a complex operation:
// 1. One GetAll from the job table.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think #46861 will help here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surprised I haven't merged that; will debug now

Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which Dashboard endpoint will benefit from this?

python/ray/_private/gcs_aio_client.py Outdated Show resolved Hide resolved
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Ruiyang Wang <56065503+rynewang@users.noreply.github.com>
@rynewang
Copy link
Contributor Author

rynewang commented Sep 5, 2024

Which Dashboard endpoint will benefit from this?

@routes.get("/api/jobs/{job_or_submission_id}")

@alexeykudinkin can you confirm this is a hot API we encountered?

@liuxsh9
Copy link
Contributor

liuxsh9 commented Sep 6, 2024

Maybe #47530 can also give a performance boost? Would love to hear your thoughts!

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@@ -39,6 +39,9 @@ message MarkJobFinishedReply {
message GetAllJobInfoRequest {
// The number of jobs to return. If not specified, return all jobs.
optional int32 limit = 1;
// If set, only return the job with that job id in hex, or the job with that
// job_submission_id.
optional bytes job_or_submission_id = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this repeated to make this proper batch API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refrain until we find a need for it

src/ray/gcs/gcs_server/gcs_job_manager.cc Outdated Show resolved Hide resolved
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: Ruiyang Wang <56065503+rynewang@users.noreply.github.com>
@rynewang rynewang added the go add ONLY when ready to merge, run all tests label Sep 8, 2024
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg

python/ray/dashboard/modules/job/utils.py Outdated Show resolved Hide resolved
python/ray/dashboard/modules/job/utils.py Show resolved Hide resolved
src/ray/gcs/gcs_client/accessor.h Outdated Show resolved Hide resolved
src/ray/gcs/gcs_client/accessor.h Outdated Show resolved Hide resolved
@jjyao
Copy link
Collaborator

jjyao commented Sep 9, 2024

rebase to fix rllib failure

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
@jjyao jjyao merged commit b1e12c8 into ray-project:master Sep 11, 2024
3 of 5 checks passed
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
…#47492)

GCS API GetAllJobInfo serves Dashboard APIs, even for only 1 job. This becomes slow when the number of jobs are high. This PR pushes down the job filter to GCS to save Dashboard workload.

This API is kind of strange because the filter `job_or_submission_id` is actually Either a Job ID Or a job_submission_id. We don't have an index on the latter, and some jobs don't have one. So we still GetAll from Redis; and filter by both IDs after that and before doing more RPC calls.

---------

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants