Make log handlers configurable, shorten entries #378
Codecov Report
@@            Coverage Diff             @@
##           master     #378      +/-   ##
==========================================
- Coverage   84.23%   84.05%   -0.18%
==========================================
  Files          70       70
  Lines        6348     6383      +35
==========================================
+ Hits         5347     5365      +18
- Misses       1001     1018      +17
logger.warning(
    f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
    + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
This message is removed since `training_args` are already logged.
hivemind/utils/logging.py
elif _current_mode == StyleMode.EVERYWHERE:
    _disable_default_handler(None)
I'm not sure this is going to be a frequent enough use case, aside from our examples. Hivemind is first and foremost a library, and as far as I understand, no other library implements this format-hijacking mechanism, so I am not sure this feature is actually required or expected in a library for distributed DL. Probably, the reason is that logs from a given library are usually expected to have a consistent format across all applications, for ease of support and issue reporting, so there is no direct incentive to implicitly adopt the logging format of another library.
This isn't a blocker; I just think that unnecessary or rarely used features should not be our focus when it comes to logging.
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO if is_main_process(training_args.local_rank) else logging.WARN,
)
This `basicConfig()` setting had no effect:
> This function does nothing if the root logger already has handlers configured, unless the keyword argument force is set to True.
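The quoted behavior is easy to check with the standard library. The sketch below is a hypothetical demo, not code from this PR: it attaches a handler to the root logger first, as an imported module might, then shows that a plain `basicConfig()` call is silently ignored, while `force=True` (available since Python 3.8) replaces the existing root handlers. It modifies the global root logger, so run it in a fresh interpreter.

```python
import io
import logging
from typing import Tuple


def demo_basic_config() -> Tuple[bool, bool]:
    root = logging.getLogger()
    # Hypothetical setup: an imported module has already attached a handler to the root logger.
    root.addHandler(logging.StreamHandler(io.StringIO()))

    app_stream = io.StringIO()
    fmt = "%(levelname)s - %(name)s - %(message)s"

    # Silently ignored: the root logger already has a handler, so none of the
    # stream, format, or level arguments below are applied.
    logging.basicConfig(stream=app_stream, format=fmt, level=logging.INFO)
    ignored = not any(getattr(h, "stream", None) is app_stream for h in root.handlers)

    # force=True removes and closes the existing root handlers first, so this call takes effect.
    logging.basicConfig(stream=app_stream, format=fmt, level=logging.INFO, force=True)
    logging.getLogger("example").info("hello")
    applied = app_stream.getvalue() == "INFO - example - hello\n"
    return ignored, applied


if __name__ == "__main__":
    print(demo_basic_config())  # (True, True)
```

In an application that imports libraries which configure the root logger on import, passing `force=True` (or configuring logging before those imports) is what makes such a `basicConfig()` call effective.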
69b8acb to d051bc9
@justheuristic has verbally approved this.
Preliminaries
The current implementation of `hivemind.utils.logging` has several problems (they are unrelated to colored logging and have existed for a long time):

1. Bugs: sometimes, one message is logged multiple times. This is due to a combination of two bugs:
   - If `name` stays the same, `logging.getLogger(name)` always returns the same logger instance. We expect the same behavior from hivemind's `get_logger(name)`; however, it adds a new `logging.StreamHandler` to the logger instance on every call. If it is called N times, the logger has N stream handlers and each message is repeated N times.
   - hivemind's `get_logger(name)` trims the first item in the module path. While this is intended to shorten log lines by dropping the `hivemind.` prefix, it actually forces all modules at the root scope (e.g. `utils.py` and `huggingface_auth.py` in tanmoyai/sahajbert) to use the same logger. Because of the previous bug, the same message ends up being logged multiple times even if we use different `name`s.
2. It is not obvious how to force other libraries (such as `transformers`) to use our logging style, so the example has inconsistent log line styles.

   [Screenshot]
3. Since `hivemind` is a library, a developer may want to use it but keep the existing logging style of their application (i.e., force hivemind to follow it). Currently, there is no way to do this, since `get_logger()` does not propagate messages to the application's loggers.

Solution
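To make the first problem concrete, here is a minimal standard-library sketch of how the two bugs compound, and of one way to make a `get_logger()`-style helper idempotent. The names `naive_get_logger`, `idempotent_get_logger` and `count_repeats`, as well as the `is_default_handler` marker attribute, are hypothetical; the naive helper only mimics the behavior described above and is not hivemind's actual code.

```python
import io
import logging

FORMAT = "[%(name)s] %(message)s"


def _make_handler(stream) -> logging.Handler:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT))
    return handler


def naive_get_logger(name: str, stream) -> logging.Logger:
    # Trimming bug: dropping the first item of the module path maps every
    # root-scope module ("utils", "huggingface_auth", ...) to "", i.e. the root logger.
    trimmed = ".".join(name.split(".")[1:])
    logger = logging.getLogger(trimmed)
    # Handler bug: a new handler is attached on every call, even for the same logger.
    logger.addHandler(_make_handler(stream))
    return logger


def idempotent_get_logger(name: str, stream) -> logging.Logger:
    # Keep the full name and attach at most one handler of our own per logger.
    logger = logging.getLogger(name)
    if not any(getattr(h, "is_default_handler", False) for h in logger.handlers):
        handler = _make_handler(stream)
        handler.is_default_handler = True  # hypothetical marker attribute
        logger.addHandler(handler)
    return logger


def count_repeats(get_logger, names) -> int:
    """Call get_logger for each name, log one warning, and count how often it was written."""
    stream = io.StringIO()
    loggers = [get_logger(name, stream) for name in names]
    loggers[0].warning("hello")
    for logger in set(loggers):  # undo the global changes made by this demo
        for handler in list(logger.handlers):
            if getattr(handler, "stream", None) is stream:
                logger.removeHandler(handler)
    return stream.getvalue().count("hello")


if __name__ == "__main__":
    print(count_repeats(naive_get_logger, ["utils", "huggingface_auth"]))  # 2
    print(count_repeats(idempotent_get_logger, ["utils", "utils"]))  # 1
```

In this sketch, dropping the first path item turns both `"utils"` and `"huggingface_auth"` into the empty string, and `logging.getLogger("")` returns the root logger, so both modules share it together with the two handlers they added.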
First, we fix the bugs above, making `get_logger()` idempotent and no longer trimming the actual logger name.

Next, we note that there are 3 possible use cases:
We give a user a straightforward way to switch between these use cases via a special function:
Note: This approach is inspired by the `transformers.logging` module (docs, source). That module lets you enable or disable propagation to the root logger and enable or disable the default `transformers` log style. However, our API is even higher-level.

We enable the `in_root_logger` mode in `examples/albert`, so that all messages (from `__main__`, `transformers`, and `hivemind` itself) consistently follow the hivemind style.

[Screenshot]

We also change some log messages to improve their presentation.
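The style problems described above come down to two standard `logging` mechanisms: where the formatting handler is attached, and whether the library's logger propagates records to its ancestors. The sketch below is a hypothetical, stdlib-only illustration of three handler placements; `configure()`, `render()` and the mode names are made up for this example and are not hivemind's function or its modes. The `"demo"` logger stands in for the root logger so that the global logging state is left alone.

```python
import io
import logging
from typing import List

LIB_FORMAT = "[lib] %(name)s: %(message)s"    # stands in for the library's log style
APP_FORMAT = "APP %(levelname)s %(message)s"  # stands in for the application's log style


def _handler(stream, fmt: str) -> logging.Handler:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    return handler


def configure(mode: str, stream) -> None:
    """Hypothetical mode switch; "demo" plays the role of the root logger."""
    root, lib = logging.getLogger("demo"), logging.getLogger("demo.mylib")
    for logger in (root, lib):
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
    root.setLevel(logging.INFO)
    root.propagate = False  # keep the demo away from the real root logger

    if mode == "own_handler":
        # Library records get the library style and stop there; others keep the app style.
        lib.addHandler(_handler(stream, LIB_FORMAT))
        lib.propagate = False
        root.addHandler(_handler(stream, APP_FORMAT))
    elif mode == "propagate":
        # No library handler: records bubble up to the application's handler and style.
        lib.propagate = True
        root.addHandler(_handler(stream, APP_FORMAT))
    elif mode == "root_handler":
        # The library-style handler sits on the top-level logger, so every logger below follows it.
        lib.propagate = True
        root.addHandler(_handler(stream, LIB_FORMAT))
    else:
        raise ValueError(f"unknown mode: {mode!r}")


def render(mode: str) -> List[str]:
    """Log one line from the library and one from another library; return what was written."""
    stream = io.StringIO()
    configure(mode, stream)
    logging.getLogger("demo.mylib").info("from the library")
    logging.getLogger("demo.other").info("from another library")
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    for mode in ("own_handler", "propagate", "root_handler"):
        print(mode, render(mode))
```

The first placement is why an application's style cannot reach a library's messages when the library keeps its own handler and does not propagate; the third corresponds, in spirit, to what the `in_root_logger` mode is described as doing for `examples/albert`.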