Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataCatalog]: Disable catalog.load() logging #3918

Closed
ElenaKhaustova opened this issue Jun 3, 2024 · 8 comments · Fixed by #3968
Closed

[DataCatalog]: Disable catalog.load() logging #3918

ElenaKhaustova opened this issue Jun 3, 2024 · 8 comments · Fixed by #3968
Assignees
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@ElenaKhaustova
Copy link
Contributor

ElenaKhaustova commented Jun 3, 2024

Description

catalog.load() produces extensive logging information when loading multiple datasets in the same notebook cell. These logger messages annoy users and cannot be disabled with the current implementation.

Context

"It’s very convenient to load my dataset when debugging locally. The only annoying thing is the multiple logger.info messages I get when I load multiple datasets in the same notebook cell."

Possible Implementation

  1. Add a boolean argument to catalog.load() to disable logging prints for cleaner outputs.
  2. Consider configuring this at logger level and suggest an approach.
  3. Explore if the amount of logging information during the catalog.load() can be reduced.
@ElenaKhaustova ElenaKhaustova added the Issue: Feature Request New feature or improvement to existing feature label Jun 3, 2024
@astrojuanlu
Copy link
Member

Relaying internal discussion:

@astrojuanlu: Rather than a boolean argument, could this be configured at the logger level instead?
@ElenaKhaustova: I think the problem is that users want to silent this particular method rather that disable logging at all.

@merelcht
Copy link
Member

merelcht commented Jun 7, 2024

I’m not sure we should have a custom approach for dealing with logging for dataset loading compared to other log messages.

@ElenaKhaustova ElenaKhaustova self-assigned this Jun 25, 2024
@ElenaKhaustova
Copy link
Contributor Author

The amount of logging information during the catalog.load() does not look extensive and should not be changed IMO. However, I can imagine cases with tens of datasets where this INFO output will pollute a cell.

Screenshot 2024-06-25 at 13 54 35

I also agree that adding a custom flag just for one method - catalog.load() is not the best solution.

Based on the above we can consider the following:

  1. Suggest using a custom logging config and setting a desired logging level. The drawback is that it changes global logger settings.
  2. Suggest changing a logger level at runtime and then resetting it to ensure it doesn't affect other modules. This might look a little bit complicated from the user's perspective.
import logging

# Change logging level
logger_level = logging.getLogger("kedro").level
logging.getLogger("kedro").setLevel(logging.WARNING)

for ds in catalog.list():
    _ = catalog.load(ds)

# Reset the level
logging.getLogger("kedro").setLevel(logger_level)
  1. Make the _logger property public so that users can modify it and set the desired logging level just for the catalog component rather than globally. In that case they would not need to reset it.
    @property

@astrojuanlu
Copy link
Member

@property
def _logger(self) -> logging.Logger:
return logging.getLogger(__name__)

This is a pattern I've never seen. And yet, can't the user do

logging.getLogger("kedro.io.data_catalog").setLevel(logging.WARNING)

? (Haven't tested this, unsure if it works)

@ElenaKhaustova
Copy link
Contributor Author

ElenaKhaustova commented Jun 26, 2024

@astrojuanlu

Yes, that works and doesn't affect the top-level loggers! So I believe that should be the solution and no changes are required as all of them lead to the same amount of effort from the user side.

Closing the issue.

@datajoely
Copy link
Contributor

So I still think we should make this easier for the user:

  • Docs at the very least
  • logging.yml setting?

@ElenaKhaustova
Copy link
Contributor Author

So I still think we should make this easier for the user:

  • Docs at the very least
  • logging.yml setting?

Good point, I will extend the section on how to customise logging then. But I would not change the default logging.yml as having INFO level seems reasonable at the start.

@merelcht
Copy link
Member

Completed in #3968

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants