Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Train Log]Ray Train Structured Logging #47806
base: master
Are you sure you want to change the base?
[Train Log]Ray Train Structured Logging #47806
Changes from 32 commits
48c0961
90b6893
dc79a60
d388da5
ace76a7
d6b64c0
ddadc1c
6216d4f
c77a181
427b861
58bbf5d
e4d8b63
99266ae
cbf4677
d3aeff8
02d439f
c3adf12
0fed499
35249c8
6a5fc2d
1768dc6
c1a9955
82b23ad
d79119f
b6c8cbd
1576354
441b043
6bcb2a2
7abb76d
50d2067
dee3fde
921fd4c
6335cf0
2a4a9fb
263d2b9
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we switch between the
core_context_filter
vs. theconsole_filter
?Let's just use the
core_context_filter
and remove theconsole_filter
(HiddenRecordFilter
), since Alan mentions thisHiddenRecordFilter
is not needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think this HiddenRecordFilter is not very useful as well. I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we limit console output to INFO and above? DEBUG is ok for the log-viewer since people can filter it out, but it will spam the console.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually specified level
INFO
in theconsole_text
handler.>=DEBUG
, when it passed toconsole_text
handler, theINFO
level will filter out theDEBUG
messages, making it less spammy.file
handler don't specify extra levels, so theDEBUG
level info will show in JSON mode and can be filtered out by the user.However, if we set the default level of
ray.trian
logger asINFO
, DEBUG info will be removed at the logger level, that the file handler can not ingest extra information in the JSON mode.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only tag
world_rank
,local_rank
, andnode_rank
.world_size
/local_world_size
is confusing to filter by.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, good point. the size information is not useful for log searching / filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think
hide=True
is necessary. At least, the product won't utilize this field.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! A followup question: what's the format we should follow so that the product team can use to filter for logs to be shown /hidden by default in the log viewer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just filter out non "rank 0" or do whatever behavior based on the other fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move all these constants into the
logging.py
file? I'd rather keep this module isolated and have fewer changes intrain/*
code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to make this module isolated from the other parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the default configuration json string and the default decoding format variables to the
logging.py
, but still kept the two env var name variables withinconstant.py
. I think it makes more sense to keep all the user controlled env var names withinconstant.py
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my own understanding, this
training_loop
function is part oftuner.fit()
function that will be called in a tune process. Therefore, thesession.get_run_id()
will actually get therun_id
from a tune process, although thisseesion
is imported fromray.train._internal
. This session is actually initialized insidefunctional_trainable.py
which defined undertune/trainable/function_trainable.py
. cc @justinvyu @matthewdengThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, the
training_loop
here is the Train driver logic that is running inside thetune.FunctionTrainable
. So the "session" refers to the Ray Tune session, not the Ray Train Worker session.