[Serve] Let the controller look up the head node and fix flaky standalone3 healthz test #36878
Signed-off-by: Gene Su <e870252314@gmail.com>
…e look up in http state Signed-off-by: Gene Su <e870252314@gmail.com>
Suggestion: Change the PR title. This PR is mainly for pinning the controller to the head node.
Thank you Gene!
Sorry, I misread the code. The controller is already pinned to the head node. We just potentially pass the wrong head_node_id to the controller (this PR can fix it).
…orker.client() Signed-off-by: Gene Su <e870252314@gmail.com>
…lone3 healthz test (ray-project#36878)
- Make sure we are using wait_for_condition in the test (could take time to broadcast).
- Remove head_node_id from controller init args and instead fetch it in the controller init. Also remove it from serve_start in _private/api.py.
- Add an assertion to check that the controller actually runs on the head node (use ray.nodes() and look for head node resource).
- Filter Nones from the active node set in deployment_state. Add a unit test for this; it should never return None.
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Why are these changes needed?
- Make sure we are using wait_for_condition in the test (could take time to broadcast).
- Remove head_node_id from controller init args and instead fetch it in the controller init. Also remove it from serve_start in _private/api.py.
- Add an assertion to check that the controller actually runs on the head node (use ray.nodes() and look for head node resource).
- Filter Nones from the active node set in deployment_state. Add a unit test for this; it should never return None.
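The head node lookup and the None filtering described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper names `find_head_node_id` and `active_node_ids` are hypothetical, the resource name `node:__internal_head__` is an assumption about how Ray marks the head node, and the sample records are hand-written dicts shaped like entries of ray.nodes() so that the snippet runs without a cluster.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

# Assumed name of the resource that marks the head node; verify for your Ray version.
HEAD_NODE_RESOURCE = "node:__internal_head__"


def find_head_node_id(nodes: List[Dict[str, Any]]) -> str:
    """Return the NodeID of the first alive node that carries the head resource."""
    for node in nodes:
        if node.get("Alive") and HEAD_NODE_RESOURCE in node.get("Resources", {}):
            return node["NodeID"]
    raise RuntimeError("Cannot find an alive head node.")


def active_node_ids(replica_node_ids: Iterable[Optional[str]]) -> Set[str]:
    """Collect replica node IDs, dropping replicas that have no node assigned yet."""
    return {node_id for node_id in replica_node_ids if node_id is not None}


# Hand-written records shaped like entries of ray.nodes(), for illustration only.
sample_nodes = [
    {"NodeID": "worker-1", "Alive": True, "Resources": {"CPU": 4.0}},
    {
        "NodeID": "head-0",
        "Alive": True,
        "Resources": {"CPU": 2.0, HEAD_NODE_RESOURCE: 1.0},
    },
]

head_node_id = find_head_node_id(sample_nodes)

# Stand-in for the node ID the controller would report about itself.
controller_node_id = "head-0"
assert controller_node_id == head_node_id, "controller is not on the head node"

# Replicas that are still starting may report None; those are filtered out.
active = active_node_ids(["head-0", None, "worker-1", None])
```

Against a real cluster, the list passed to `find_head_node_id` would come from ray.nodes() instead of `sample_nodes`.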
Related issue number
Closes: #37001
Fixes flaky test: https://buildkite.com/ray-project/oss-ci-build-branch/builds/4687#0188fdc4-2bd2-4e43-828c-d92f6d48e89d
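For context on the flakiness fix: because a state change can take time to broadcast, a test that asserts once right away may fail intermittently, whereas polling until a deadline tolerates the delay. A minimal sketch of that polling pattern is below. `wait_until` is a hypothetical stand-in written for this example and is not Ray's wait_for_condition; `FakeHealthz` is likewise an invented object that only turns healthy after a few polls.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 10.0,
    retry_interval_s: float = 0.05,
) -> None:
    """Poll predicate until it returns True, or raise once timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout_s} seconds.")
        time.sleep(retry_interval_s)


class FakeHealthz:
    """Hypothetical endpoint that reports healthy only from the Nth poll onward."""

    def __init__(self, ready_after: int) -> None:
        self.calls = 0
        self.ready_after = ready_after

    def is_healthy(self) -> bool:
        self.calls += 1
        return self.calls >= self.ready_after


endpoint = FakeHealthz(ready_after=3)

# A single immediate check would fail here; polling succeeds on the third call.
wait_until(endpoint.is_healthy, timeout_s=2.0, retry_interval_s=0.01)
```

The timeout keeps a genuinely broken condition from hanging the test, while the retry interval bounds how often the check runs.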
Checks
- … git commit -s) in this PR.
- … scripts/format.sh to lint the changes in this PR.
- … method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.