-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UX: rework stdout error when ray fails to start #35378
base: master
Are you sure you want to change the base?
Conversation
@rkooo567 could you review this PR? |
@@ -1258,6 +1254,11 @@ def read_log(filename, lines_to_read): | |||
"Ray github. " | |||
"https://github.com/ray-project/ray/issues" | |||
) | |||
logger.error( | |||
"Error should be written to 'dashboard.log' " | |||
"We are printing the last lines" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you keep the link to the log directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"We are printing the last lines" | |
"We are printing the last {len(lines)} lines" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the log directory and the number of lines is contained a few lines lower. I think it should only appear once:
The last 2 lines of /tmp/ray/session_2023-05-16_10-04-16_493412_122666/logs/dashboard.log
(it may contain an error message from the dashboard):
lines = [] | ||
with open(dashboard_log, "rb") as f: | ||
with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm: | ||
end = mm.size() | ||
for _ in range(lines_to_read): | ||
sep = mm.rfind(b"\n", 0, end - 1) | ||
if sep == -1: | ||
if end > 1: | ||
lines.append(mm[: end - 1].decode("utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a unit test for this. case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is to add the first line, which was missing before this PR. Note that the line
2023-05-16 10:04:17,780 INFO head.py:135 -- Dashboard head grpc address: 0.0.0.0:4289
is missing. Where should I add a test: is there currently a test for this part of the code that I can add to?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
@mattip is it ready to review again? |
Yes, I am still waiting for guidance how to write a test. I would need to check the output of the worker when it starts up and cannot access the HEAD node. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
It would be great if this could be merged (test or not); it would help debug a long-standing issue (#35383) where we're stuck on not knowing what's going wrong at all. Perhaps after debugging that issue, it'll be easier to write a test. |
Signed-off-by: mattip <matti.picus@gmail.com>
I rebased off master, maybe it will help move the review forward |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
Why are these changes needed?
When ray fails to start the dashboard, it prints out the last 20 lines of the dashboard.log file. Here is what it printed when I had a problem with grpcio 1
There are a few UX tweaks that can be made to the message:
Note that in my case the error was elsewhere, so perhaps this whole effort to print the dashboard.log should be a little less certain that the error is exactly here.
After this PR, the same error prints out
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.Footnotes
This appeared while updating the conda recipe for v2.4.0 in resync grpcio requirements conda-forge/ray-packages-feedstock#98, I am still searching for the root cause, probably related to the last line with a bogus worker number and an IOError ↩