[CLI] Support `--ip` option in `sky status` to output cluster IP only #2563
Thanks @cblmemo! Tried it out and it works well. I think we need to think about multi-node clusters too (e.g., how to get IPs of all workers), but for this PR supporting only head node should be fine.
Also I found it slightly confusing when I had just one cluster and `sky status --ip` simply returned the IP of that cluster without requiring a cluster name argument. Maybe it's OK? Will let others chime in here.
sky/cli.py
Outdated
    raise RuntimeError('Failed to fetch IP address. '
                       'Please try again later.')
Wondering if there is a better hint we can provide here than just asking to try again later. Brainstorming: what are the reasons IP fetching would fail? Are there any actionable suggestions we can provide to fix those?
One possible reason could be network connection issues 🤔 or some Ray errors.
I see... I am trying to find a good way to surface those errors for debugging (i.e., print them when `SKYPILOT_DEBUG=1`), but looks like any errors are not propagated through `handle.external_ips()`... any other ideas?
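A minimal sketch of the debug-gated hint discussed above. It assumes the caller has somehow captured the underlying exception, which the thread notes is not currently propagated through `handle.external_ips()`. The helper name `format_ip_fetch_error` and the message wording are hypothetical and not part of SkyPilot; only the `SKYPILOT_DEBUG=1` switch comes from the discussion.

```python
import os


def format_ip_fetch_error(cluster_name, cause=None):
    """Build an error message for a failed head-node IP lookup.

    Hypothetical helper for illustration; not a SkyPilot API.
    """
    message = (f'Failed to fetch IP address for cluster {cluster_name!r}. '
               'Possible causes: network connection issues or Ray errors '
               'on the cluster. Please try again later.')
    # Append the underlying error only when debugging is switched on,
    # so regular users still see the short message.
    if cause is not None and os.environ.get('SKYPILOT_DEBUG') == '1':
        message += f'\nUnderlying error: {cause!r}'
    return message
```

With this shape, the CLI could raise `RuntimeError(format_ip_fetch_error(name, err))` and keep the default output short while still exposing the cause to anyone debugging.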
Co-authored-by: Romil Bhardwaj <romil.bhardwaj@berkeley.edu>
A quick thought: Is |
Thanks @cblmemo! LGTM with some minor UX nits. Since @Michaelvll is the author of the source issue, I'll let him comment on whether the interface/UX makes sense to him.
    if ip:
        if len(clusters) != 1:
            with ux_utils.print_exception_no_traceback():
                plural = 's' if len(clusters) > 1 else ''
                cluster_num = (str(len(clusters))
                               if len(clusters) > 0 else 'No')
                raise ValueError(
                    _STATUS_IP_CLUSTER_NUM_ERROR_MESSAGE.format(
                        cluster_num=cluster_num,
                        plural=plural,
                        verb='specified'))
How about we just handle this in L1751? May not be necessary to separate the two cases.
There are some special cases...
For example:
- Say I have 3 clusters `test`, `cluster-1`, `cluster-2` in the local database
- If we only check the length of `cluster_records` (the return value of `core.status`): `sky status --ip test abc` will output the IP of cluster `test`, since `_get_glob_clusters` doesn't find any clusters with `abc`, which is not desirable (we transparently ignored one cluster)
- If we only check the length of `clusters`: `sky status --ip 'cluster-*'` needs to throw an error since it will match two clusters
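The two checks described above can be sketched as follows. This is only an illustration under stated assumptions: `resolve_single_cluster` is a hypothetical helper, `known_clusters` stands in for the local database, the standard-library `fnmatch` module stands in for whatever `_get_glob_clusters` does internally, and the error wording is made up rather than taken from `_STATUS_IP_CLUSTER_NUM_ERROR_MESSAGE`.

```python
import fnmatch


def resolve_single_cluster(requested, known_clusters):
    """Return the single cluster selected by `requested`, else raise.

    Hypothetical helper: `requested` holds the names/globs given on the
    command line, `known_clusters` the names in the local database.
    """
    # Check 1: the number of arguments the user specified.
    if len(requested) != 1:
        count = str(len(requested)) if requested else 'No'
        plural = 's' if len(requested) > 1 else ''
        raise ValueError(f'{count} cluster{plural} specified; '
                         'expected exactly one.')
    # Check 2: the number of existing clusters the argument matches.
    matched = fnmatch.filter(known_clusters, requested[0])
    if len(matched) != 1:
        count = str(len(matched)) if matched else 'No'
        plural = 's' if len(matched) > 1 else ''
        raise ValueError(f'{count} cluster{plural} matched '
                         f'{requested[0]!r}; expected exactly one.')
    return matched[0]
```

With `known_clusters = ['test', 'cluster-1', 'cluster-2']`, the call `resolve_single_cluster(['test', 'abc'], known_clusters)` fails the first check instead of silently dropping `abc`, and `resolve_single_cluster(['cluster-*'], known_clusters)` fails the second check because the glob matches two clusters.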
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Thanks @cblmemo - took another look, LGTM.
Resolved #2439.
To discuss: Do we want to enable this feature for multiple clusters? If so, what delimiter should we use: space or newline?
Tested (run the relevant ones):
- `bash format.sh`
- `pytest tests/test_smoke.py`
- `pytest tests/test_smoke.py::test_fill_in_the_name`
- `bash tests/backward_comaptibility_tests.sh`