Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make log handlers configurable, shorten entries #378

Merged
merged 17 commits into from
Sep 6, 2021
Merged

Conversation

borzunov
Copy link
Member

@borzunov borzunov commented Sep 4, 2021

Preliminaries

The current implementation of hivemind.utils.logging has several problems (they are not related to the colored logging and exist for a long time):

  1. Bugs: Sometimes, one message is logged multiple times. This due to the combination of the bugs:

    • If name stays the same, logging.getLogger(name) always returns the same logger instance. We expect the same behavior from hivemind's get_logger(name), however it adds a new logging.StreamHandler to the logger instance every time. If it is called N times, you will have N stream handlers and each message will be repeated N times.

    • hivemind's get_logger(name) trims the first item in the module path. While this is intended to make the log line shorter by trimming the hivemind. string, this actually forces all modules at the root scope (e.g. utils.py and huggingface_auth.py at tanmoyai/sahajbert) to use the same logger. Because of the previous bug, we end up logging the same message multiple times even if we use different names.

  2. It is not obvious how to force other libraries (such as transformers) to use our logging style, so we have inconsistent log line styles in the example. Screenshot:

    Screen Shot 2021-09-04 at 6 35 08 AM
  3. Since hivemind is a library, a developer may want to use it but keep the existing logging style in their application (i.e. force hivemind to follow it). Currently, there is no way to do it since get_logger() does not use message propagation to the application loggers.

Solution

  1. First, we fix the bugs above, making get_logger() idempotent and avoiding trimming the actual logger name.

  2. Next, we note that there are 3 possible use cases:

    • (default) A user tries importing hivemind. It uses its own logging style among the package.
    • A user likes the hivemind logging style and wants to use it among their application (e.g. force other libraries to follow it).
    • A user does not like the hivemind logging style and wants to force it to follow another existing style.

    We give a user a straightforward way to switch between these use cases via a special function:

    use_hivemind_log_handler("in_hivemind")     # Option 1 (default)
    use_hivemind_log_handler("in_root_logger")  # Option 2, make the root logger to use hivemind style
    use_hivemind_log_handler("nowhere")         # Option 3, propagate hivemind logs to the existing root logger

    Note: This approach is inspired by the transformers.logging module (docs, source). The module allows to enable/disable the propagation to the root logger and enable/disable the transformers default log style. However, our API is even higher-level.

  3. We enable the in_root_logger mode in examples/albert, so that all messages (from __main__, transformers, and hivemind itself) consistently follow the hivemind style. Screenshot:

    Screen Shot 2021-09-04 at 6 33 03 AM
  4. We change some log messages to improve their presentation.

@codecov
Copy link

codecov bot commented Sep 4, 2021

Codecov Report

Merging #378 (3c6c551) into master (fb3f57b) will decrease coverage by 0.17%.
The diff coverage is 55.76%.

@@            Coverage Diff             @@
##           master     #378      +/-   ##
==========================================
- Coverage   84.23%   84.05%   -0.18%     
==========================================
  Files          70       70              
  Lines        6348     6383      +35     
==========================================
+ Hits         5347     5365      +18     
- Misses       1001     1018      +17     
Impacted Files Coverage Δ
hivemind/optim/collaborative.py 25.49% <14.28%> (ø)
hivemind/utils/logging.py 72.82% <62.22%> (-13.14%) ⬇️

logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is removed since training_args are already logged.

hivemind/utils/logging.py Outdated Show resolved Hide resolved
hivemind/utils/logging.py Outdated Show resolved Hide resolved
Comment on lines 130 to 150
elif _current_mode == StyleMode.EVERYWHERE:
_disable_default_handler(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is going to be a frequent enough use case, aside from our examples. Hivemind is first and foremost a library, and from my understanding, no other library implements this format hijacking mechanism: thus, I am not sure that this feature is actually required and expected to be present in a library about distributed DL. Probably, the reason for this is that it's usually expected that logs from the given library are consistent in their format in all applications for ease of support and issue reporting, and thus one has no direct incentive to implicitly adopt the logging format of other library

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a blocker, I'm just afraid that adding unnecessary/rarely used features should not be our focus regarding logging

hivemind/utils/logging.py Outdated Show resolved Hide resolved
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO if is_main_process(training_args.local_rank) else logging.WARN,
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basicConfig() setting had no effect:

> This function does nothing if the root logger already has handlers configured, unless the keyword argument force is set to True.

@borzunov borzunov changed the title Rework hivemind.utils.logging Improve hivemind.utils.logging and log messages Sep 6, 2021
@borzunov
Copy link
Member Author

borzunov commented Sep 6, 2021

@justheuristic has verbally approved this.

@borzunov borzunov changed the title Improve hivemind.utils.logging and log messages Improve hivemind.utils.logging and some log messages Sep 6, 2021
@mryab mryab changed the title Improve hivemind.utils.logging and some log messages Make log handlers configurable, shorten entries Sep 6, 2021
@borzunov borzunov merged commit b84f62b into master Sep 6, 2021
@borzunov borzunov deleted the fix-get-logger branch September 6, 2021 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants