
[core][dashboard][agent] add configurable timeouts for rt env agent and job_supervisor #47481

Merged · 4 commits · Sep 4, 2024

Conversation

@rynewang (Contributor) commented Sep 4, 2024

GcsClient has a configurable retry count, nums_py_gcs_reconnect_retry. However, in GcsAioClient it defaults to 5 and there is no way to control it. This PR changes the callers to use the flag gcs_rpc_server_reconnect_timeout_s so the timeout is configurable. That flag is already used in agent.py but not in the runtime env agent or job_supervisor. This PR fixes all GcsAioClient callers in the non-test Python codebase.

Note that head.py passes retry=0, which should mean infinite retries, but that did not work. This is fixed by explicitly checking for zero.
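The "checking 0-ness" fix described above can be sketched as follows. This is a hypothetical illustration, not Ray's actual code: the function name and retry loop are assumptions; only the convention that a retry count of 0 means "retry forever" comes from the comment above.

```python
import asyncio


async def connect_with_retries(connect, num_retries: int, delay_s: float = 1.0):
    """Call `connect()` until it succeeds.

    num_retries == 0 is interpreted as infinite retries. Without the
    explicit zero check, 0 would act as "no retries at all".
    """
    attempt = 0
    while True:
        try:
            return await connect()
        except ConnectionError:
            attempt += 1
            # The 0-ness check: 0 means unlimited, otherwise respect the cap.
            if num_retries != 0 and attempt >= num_retries:
                raise
            await asyncio.sleep(delay_s)
```

With this check, a caller passing retry=0 keeps retrying until the connection succeeds instead of failing on the first attempt.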

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@jjyao jjyao added the go add ONLY when ready to merge, run all tests label Sep 4, 2024
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@rynewang (Contributor, Author) commented Sep 4, 2024

Updated NewGcsAioClient to always use os.environ["RAY_py_gcs_connect_timeout_s"] and disregard the retry count. The changes to the callers are kept in case OldGcsAioClient is used.
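Reading the timeout from the environment, as this comment describes, can be sketched like this. A minimal illustration, not Ray's implementation: the helper name and the default value are assumptions; only the variable name RAY_py_gcs_connect_timeout_s comes from the comment above.

```python
import os


def get_gcs_connect_timeout_s(default_s: float = 30.0) -> float:
    # Environment variable named in the PR comment; the fallback default
    # here is illustrative, not Ray's actual default.
    raw = os.environ.get("RAY_py_gcs_connect_timeout_s")
    if raw is None:
        return default_s
    return float(raw)
```

A client constructed this way is controlled by a single timeout knob set in the environment, rather than a hard-coded retry count buried in the client.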

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@rynewang rynewang enabled auto-merge (squash) September 4, 2024 21:27
@rynewang rynewang merged commit e9f7930 into ray-project:master Sep 4, 2024
6 checks passed
@rynewang rynewang deleted the rtenv-client-timeout branch September 4, 2024 23:11