-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Train Log]Ray Train Structured Logging #47806
base: master
Are you sure you want to change the base?
[Train Log]Ray Train Structured Logging #47806
Conversation
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice! Here's a first pass with some questions.
(): ray._private.ray_logging.filters.CoreContextFilter | ||
|
||
handlers: | ||
file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: file_text
is more descriptive of the encoding mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments to pieces that I changed during the review for the ray data structured logging, overall lgtm please ping me when ready for a second pass.
else: | ||
config = _load_logging_config(DEFAULT_RAY_TRAIN_LOG_CONFIG_PATH) | ||
if RAY_TRAIN_LOG_ENCODING == RAY_TRAIN_JSON_LOG_ENCODING_FORMAT: | ||
for logger in config["loggers"].values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@omatthew98 @hongpeng-guo One other question I had: Why is the configuration via an environment variable rather than a config somewhere in the API? Is adding a config in the API (ex: DataContext
/ ray.train.RunConfig
) a future plan?
That is a fair question, I was mostly going off what we already had in place which already used environment variables. I think there might be some argument to us wanting to use environment variables to ensure logging is configured as early as possible (e.g. on module initialization before A DataContext or train.RunConfig might exist), but not sure if that is the case. |
Just a heads up, based on this thread we are going to move our yaml configurations to python dictionary configurations. |
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Comments handed, good to take a look for another round. |
python/ray/train/constants.py
Outdated
DEFAULT_LOG_CONFIG_YAML_STRING = """ | ||
version: 1 | ||
disable_existing_loggers: False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just have this be a dict similar to Data?
Yaml string may have syntax errors and is not native to python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I was thinking yaml
files have better readability comparing to json strings. However, after transformed as string, I don't think it's still that readable. I will make changes to update it to json string format.
# Env. variable to specify the encoding of the file logs when using the default config. | ||
LOG_ENCODING_ENV = "RAY_TRAIN_LOG_ENCODING" | ||
|
||
# Env. variable to specify the logging config path use defaults if not set | ||
LOG_CONFIG_PATH_ENV = "RAY_TRAIN_LOG_CONFIG_PATH" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move all these constants into the logging.py
file? I'd rather keep this module isolated and have fewer changes in train/*
code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to make this module isolated from the other parts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the default configuration json string and the default decoding format variables to the logging.py
, but still kept the two env var name variables within constant.py
. I think it makes more sense to keep all the user controlled env var names within constant.py
.
import yaml | ||
|
||
import ray | ||
from ray.tests.conftest import * # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't think this is needed actually. conftest fixtures should automatically be visible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I can remove this unnecessary import.
console_log_output = capsys.readouterr().err | ||
for log_line in console_log_output.splitlines(): | ||
with pytest.raises(json.JSONDecodeError): | ||
json.loads(log_line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So all logger.warning(...)
gets logged in JSON format to the stdout as well? Does it make more sense for console outputs to be in the normal text format, and then only the ray-train.log
contains the JSON format?
Also, is it printed to stderr
or stdout
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
for
logger.warning(...)
, it is logged in JSON format toray-train.log
file, and logged to console as normal text format. The code here will raisepytest.raises(json.JSONDecodeError)
because it is normal text, not json in the console output format. -
By default, all the console output in python logger goes to
stderr
. Reference: https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial. and https://github.com/hongpeng-guo/ray/blob/6a5fc2d39b0265e7b578f069a84ae772c123801b/python/ray/_private/log.py#L30
Can we include local, world, and node rank as part of the structure? |
Sure, I am thinking about adding more train only context in a followup PR. We also need to differentiate between driver, controller, and worker processes. All the ranks related concepts are only defined on worker processes. |
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Update: I added a |
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
Why are these changes needed?
This PR creates the structured logging for ray train. The main structure follows the implementation of Ray Data's structured logging PR. Main components include:
python/ray/train/_internal/logging.py
: this file defines the logging utility functions;python/ray/train/tests/test_logging.py
: this file provides the corresponding unit tests for the logging utilities.Example
Code snippet:
JSON Logging
Text Logging
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.