[ci][core] Perf regression on tasks_per_second, pgs_per_second #41338

Closed
rickyyx opened this issue Nov 22, 2023 · 7 comments · Fixed by #41475
Labels
bug, core, P0, release-blocker

Comments

Contributor

rickyyx commented Nov 22, 2023

What happened + What you expected to happen

Regression on tasks_per_second and pgs_per_second:

[two images]

Versions / Dependencies

5a7071e [Core] Remove dead gcs monitor service (#41262)
1a986de [data] add support for multiple group keys in map_groups (#40778)
84c0ba0 [Data] Expose max_retry_cnt parameter for BigQuery Write (#41163)
25bee34 [GCS FT] Update Redis connection configs (#40860)
709dc1b [ci] build anyscale images for gcp (#41224)
31c6631 Make Ray compatible with pydantic>=2.5.0 (#40451)
9dec618 [ci] mark two core functions as flaky (#41237)
9a34839 [Doc][KubeRay]: Redis eviction suggestions when ENABLE_GCS_FT_REDIS_CLEANUP=false (#40949)
29dba63 [KubeRay] [Doc] Update the link of KubeRay API reference (#41236)
7c7eaf2 [Train][Tune] Support reading train result from cloud storage (#40622)
7a75d09 [core][gcs] Remove ByteSizeLong call from GcsTaskManager (#41108)
8209893 [core] make ProcessFD, [Client,Server]Connection non copyable. (#41106)
ca29fec [RLlib] New ConnectorV2 API #1: Some preparatory cleanups and fixes. (#41074)
0e2a523 Make Ray work on GH200 (#40816)

Reproduction script

NA

Issue Severity

None

@rickyyx rickyyx added the bug, release-blocker, P0, and core labels Nov 22, 2023
@rickyyx rickyyx self-assigned this Nov 22, 2023
Contributor Author

rickyyx commented Nov 27, 2023

Contributor Author

rickyyx commented Nov 27, 2023

cc @iycheng

Contributor Author

rickyyx commented Nov 28, 2023

Still bisecting. The regression was already present at the PR I previously suspected, so reverting that PR doesn't fix it.
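For readers unfamiliar with the approach, bisecting a perf regression can be sketched roughly as below. This is only an illustration, not the script used for this issue: `first_bad_commit`, the commit names `c0`..`c5`, the throughput numbers, and the 480 threshold are all hypothetical, and it assumes that once a commit is slow, every later commit is slow too.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the oldest commit for which is_bad() is True.

    Assumes `commits` is ordered oldest to newest, the newest commit is bad,
    and badness is monotonic (good ... good, bad ... bad).
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


# Hypothetical data: made-up commit names and made-up benchmark results.
commits = ["c0", "c1", "c2", "c3", "c4", "c5"]
throughput = {"c0": 531, "c1": 528, "c2": 530, "c3": 424, "c4": 421, "c5": 423}

# Treat anything below a hypothetical threshold as regressed.
culprit = first_bad_commit(commits, lambda c: throughput[c] < 480)
print(culprit)
```

In practice each `is_bad` call means a full benchmark run, and noisy results can mislead the search, which may be why an early suspect turns out to be wrong.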

Contributor Author

rickyyx commented Nov 29, 2023

Ok this is the pr that introduces the regression: #40451

before this: https://buildkite.com/ray-project/release/builds/2308#018c1376-2be9-4fd0-ab4a-5d41e845e9c1 -> tasks_per_sec = 530
at this: https://buildkite.com/ray-project/release/builds/2306#018c137a-9694-455d-a113-cff39815fe87 -> tasks_per_sec = 423
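To put the two numbers above in proportion, here is a small sketch of the arithmetic. The helper name `relative_change` is made up for illustration; the inputs are the tasks_per_sec values quoted above.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change going from `before` to `after` (negative means slower)."""
    return (after - before) / before


# 530 tasks/s before the PR, 423 tasks/s at the PR.
drop = relative_change(530, 423)
print(f"{drop:.1%}")  # -20.2%
```

So the regression is roughly a 20% drop in task throughput.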

rkooo567 pushed a commit that referenced this issue Nov 30, 2023
Closes #41338

The hypothesis is that initializing the Python workers imports some additional modules, and making that import lazy seems to fix the issue.

See https://buildkite.com/ray-project/release/builds/2492#018c1923-c5d0-4e09-a67e-b45cf2c3b553

Master: tasks_per_second = 430
This PR: tasks_per_second = 530
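As a rough illustration of what "changing the import to lazy" can look like, here is a minimal sketch. It is not the code from the fix: `LazyModule` is a hypothetical helper, and the stdlib `json` module stands in for whatever heavy dependency is being deferred. The idea is that nothing is imported at worker start-up; the real import happens on first use.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Hypothetical proxy that imports the named module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    @property
    def loaded(self) -> bool:
        """True once the underlying module has actually been imported by this proxy."""
        return self._module is not None

    def __getattr__(self, attr: str) -> Any:
        # Only reached for names not found on the proxy itself,
        # i.e. attributes of the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# `json` is only a stand-in for an expensive dependency.
heavy = LazyModule("json")
print(heavy.loaded)               # False: creating the proxy imports nothing
print(heavy.dumps({"a": 1}))      # first use triggers the import
print(heavy.loaded)               # True
```

A simpler variant with the same effect is to move the `import` statement from module level into the function that needs it, so processes that never call that function never pay the import cost.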
@rickyyx rickyyx reopened this Dec 1, 2023
Contributor

rkooo567 commented Dec 6, 2023

> Ok this is the pr that introduces the regression: #40451

@rickyyx since you fixed this issue, can we close it now?

Contributor Author

rickyyx commented Dec 6, 2023

Yeah.

@rickyyx rickyyx closed this as completed Dec 6, 2023
Contributor Author

rickyyx commented Dec 6, 2023

Image
