Implement logging #257

Open · wants to merge 53 commits into master
Conversation

@alexlyttle (Collaborator) commented Jan 16, 2021

Implement logging

Relates to Issue #175. Despite a lot of commits and information below, this PR is relatively straightforward. I am happy to discuss it over Zoom or similar!

  • Configures a base package logger
  • Adds a @debug decorator that logs entering and exiting functions
    • Adds the @debug(logger) decorator to all functions where useful
    • Logs extra function information on entering and exiting, thoughts welcome?
  • Adds a file_logging context manager
  • Logs exceptions (with or without tracebacks) as warnings, errors or criticals
  • Adds info and debug logs throughout to track normal code operation
  • Considers when to use logging:
    • E.g. we could replace some UserWarnings with logger.warning when the warning does not require action by the user. Remember, a Python warning should be treated as something to act on; it would be great to have all unit tests pass with no warnings!
    • logger.warning, logger.error and logger.critical should be used when warnings/exceptions are caught and ignored by the code (e.g. in try/except)
    • logger.debug and logger.info give general runtime information which is not integral to using the code but tracks its progress
    • Still use print for the intended output of the code.
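To make the guideline above concrete, here is a small illustrative sketch (the function and messages are hypothetical, not pbjam code): a failure the code handles itself goes to logger.warning, while a condition that needs user action uses warnings.warn.

```python
import logging
import warnings

logger = logging.getLogger(__name__)

def load_data(path):
    """Hypothetical example illustrating the logging-vs-warning guideline."""
    try:
        data = open(path).read()
    except FileNotFoundError:
        # Handled internally by the code: log it rather than warning the user.
        logger.warning("File %s not found, falling back to an empty dataset.", path)
        data = ""
    if not data:
        # Requires action from the user: use warnings.warn.
        warnings.warn("No data loaded; check the input path.")
    logger.info("Loaded %d characters from %s.", len(data), path)
    return data
```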

Logger

Following best practices and suggestions in the Python docs, I add a logger to each module, named after the module (__name__). The base logger for the package is called 'pbjam' and is configured in the package __init__.py. All module loggers inherit from this, as is the beauty of logging!
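A minimal sketch of this layout (the handler details here are illustrative, not the exact pbjam configuration):

```python
# pbjam/__init__.py (sketch)
import logging

logger = logging.getLogger('pbjam')        # base package logger
logger.setLevel(logging.DEBUG)             # let the handlers filter by level

console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)     # console shows INFO and above
logger.addHandler(console_handler)

# pbjam/priors.py (sketch): each module does this at the top
module_logger = logging.getLogger('pbjam.priors')
# records sent to module_logger propagate up to the 'pbjam' handlers
```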

I have added one console stream handler at level 'INFO' and above to the base logger. It is worth noting that pbjam is a package, not an application or script, so adding a console stream handler like this goes against the advice in the Python docs. I would be perfectly happy not having console logging. File logging, when calling session and star as implemented below, is more appropriate in my opinion. If we need to "talk" to the user via the console then print is perfectly fine; logging is just for tracking the progress of the code. Happy to discuss this over Zoom or in the comments!

Debug decorator

The @debug(logger) decorator can be added to any function in the pbjam modules by importing it with from .jar import debug. Then simply add the decorator to your function like so:

import logging
from .jar import debug

logger = logging.getLogger(__name__)  # logger is defined at the top of each module
debugger = debug(logger)  # sets up a debugger decorator to send records to logger around a function

@debugger
def example_function():
    ...

This wraps the function, logging on entry and exit via the functions _entering_function and _exiting_function in jar.py. We may not want to wrap every function, especially ones that get called repeatedly: logging takes time, because each call reads and writes to the handlers. We should therefore avoid it in functions where speed is important!
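For reference, a minimal version of such a decorator could look like the following. This is an illustrative sketch only, not the actual jar.py implementation (which delegates to _entering_function and _exiting_function); example_function is hypothetical.

```python
import functools
import logging

def debug(logger):
    """Return a decorator that logs entry to and exit from a function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.debug("Entering %s", func.__qualname__)
            try:
                return func(*args, **kwargs)
            finally:
                logger.debug("Exiting %s", func.__qualname__)
        return wrapper
    return decorator

logger = logging.getLogger(__name__)
debugger = debug(logger)

@debugger
def example_function():
    return 42
```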

File logging

For file logging, I had to consider that a session will run multiple instances of star, and simply adding a logger handler each time one is initialised would keep sending logs to each previous star's log file. Also, the user can create many instances of star whenever they like, which could lead to unexpected results. Instead, I created a context manager which closes the file handler for each star after use. It is used in the decorator @file_logger.listen, which can wrap methods in subclasses of file_logger to record their logs to the log file.
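One possible sketch of this pattern is below. It is illustrative only: the class and attribute names match the usage shown next, but the internal _LogFile helper and its depth counter are my assumptions, not necessarily how pbjam implements it.

```python
import functools
import logging

class file_logger:
    """Mixin that attaches a log file to an instance via a context manager."""

    def __init__(self, filename='star.log', level='DEBUG'):
        self.log_file = _LogFile(filename, level)

    @staticmethod
    def listen(method):
        """Wrap a method so everything logged inside it goes to self.log_file."""
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            with self.log_file:
                return method(self, *args, **kwargs)
        return wrapper


class _LogFile:
    """Context manager adding/removing a FileHandler on the package logger."""

    def __init__(self, filename, level):
        self.filename = filename
        self.level = level
        self._handler = None
        self._depth = 0  # tracks nesting so 'listen'-wrapped calls can nest safely

    def __enter__(self):
        if self._depth == 0:
            self._handler = logging.FileHandler(self.filename)
            self._handler.setLevel(self.level)
            logging.getLogger('pbjam').addHandler(self._handler)
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._depth -= 1
        if self._depth == 0:
            logging.getLogger('pbjam').removeHandler(self._handler)
            self._handler.close()
```

The depth counter is what allows the nested wrapping shown below (__call__ calling run_kde) without the inner exit closing the file early.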

class star(plotting, file_logger):
    """ ... """

    def __init__(self, ...):
        ...
        
        super(star, self).__init__(filename="...", level="...")  # introduces log file attribute
        
        with self.log_file:
            # every log in this context is recorded to `self.log_file` (i.e. the log file is 'listening')
            ...

    @file_logger.listen  # wraps the function in the `self.log_file` context (i.e. 'listens' to function logs)
    @debugger
    def run_kde(self, ...):
        """ ... """
        ...

    @file_logger.listen  # nested wrapping is allowed (i.e. `__call__` and `run_kde` both wrapped by `file_logger.listen`)
    @debugger
    def __call__(self, ...):
        """ ... """

        self.run_kde(...)
        ...

Everything that goes on inside the with block is then logged to 'star.log' under the star's path. For session it is logged to 'session.log' under the session's path.

@alexlyttle (Collaborator, Author)

The build is expected to fail in the same way as on the main branch (this is fixed in PR #255). However, I still need to adapt the tests to work with logging, mainly the directory cleanup at the end of tests, where the log file also needs to be removed.

alexlyttle requested a review from nielsenmb, February 3, 2021 11:02
@alexlyttle (Collaborator, Author)

Attached is an example of a log file for a session with session_ID = 'test_session'. The user can set session_ID if, for example, they are running multiple sessions in the same directory and want to keep the log files separate. I have updated the docs with this info. This only records pbjam.session-level logs, as the more detailed logs are found in the star log files.

test_session.log

Here is an example log file for a star. This logs everything related to running the star, i.e. all pbjam level logs and below.

KIC4448777.log

alexlyttle changed the title from 'WIP: Implement logging' to 'Implement logging', Feb 3, 2021
pbjam/priors.py Outdated
@@ -149,8 +158,13 @@ def _prior_size_check(self, pdata, numax, KDEsize):
idx = np.abs(pdata.numax.values - numax[0]) < nsigma * numax[1]

if not flag_warn:
# If this is a use warning, must give user instructions.
# Otherwise, make this a logger.warning
# Maybe user warning if len(pdata[idx]) == 0?
warnings.warn(f'Only {len(pdata[idx])} star(s) near provided numax. ' +
Collaborator:

I think this can be a logger warning.

Collaborator (Author):

There is a unit test which checks for this warning when no stars are found near numax; it will fail if we change this to logger.warning. Should we remove that check from the unit test?

pbjam/priors.py Outdated
warnings.warn(f'Sample for estimating KDE is less than the requested {KDEsize}.')
# warnings.warn recommend user change their code but logger.warning does not. Which is best here? I think the former - A. Lyttle
# warnings.warn(f'Sample for estimating KDE is less than the requested {KDEsize}.')
warnings.warn(f'Sample size for estimating KDE is {ntgts}, less than the requested {KDEsize}.') # Add user instruction here, e.g. increase numax uncertainty?
Collaborator:

In hindsight it's actually a bit misleading to say "less than the requested {KDEsize}", since when running at anything but the lowest level KDEsize can't be changed by the user.

Maybe replace with
'...less than the suggested {KDEsize}. The prior may not have similar stars, or if your uncertainty on numax is <~1% it may be too small.'

Collaborator (Author):

I'll make a change to that. Also, should this be a logger.warning rather than warnings.warn?

@nielsenmb commented Feb 3, 2021 via email

pbjam/session.py Outdated
self.references._addRef(['python', 'pandas', 'numpy', 'astropy',
'lightkurve'])
quarter=None, mission=None, path=None, download_dir=None,
session_ID=None, level='DEBUG'):
Collaborator:

'level' as a variable name is ambiguous, I think.

Can we replace this with verbosity, loggingLevel or something along those lines?

Collaborator (Author):

I'm happy to change the variable name to logging_level, since verbosity could be confused with console output, but this setting only determines the level of the file logging handler.

As an aside: If a user did want to change the console logging level (e.g. to 'DEBUG' level) they could just do:

from pbjam import console_handler

console_handler.setLevel('DEBUG')

Collaborator:

The var name can be more appropriate, up to you.

'level' is just ambiguous is all.

@@ -298,7 +300,7 @@ def _format_col(vardf, col, key):
np.array([_arr_to_lk(x, y, vardf.loc[i, 'ID'], key)]))
vardf[key] = temp
else:
print('Unhandled exception')
logger.critical('Unhandled exception.')
Collaborator:

This is the best message ever. I should have changed it a long time ago.

Perhaps 'Unhandled exception when formatting input columns' or something like that.

Now that it goes in the logger it's perhaps not that important, though.

Collaborator (Author):

That is true, the extra detail may still be useful in the logger. Alternatively, we could raise an error, especially if the unhandled exception causes issues later on?

Collaborator:

Yeah it's probably best to raise an error.

This may actually just be a leftover from before I knew how to raise errors properly.

@nielsenmb (Collaborator)

I noticed in the example session log that functions that, for example, query MAST or Gaia don't appear in the log.

Would we just have to add the @debugger decorators to these functions?

I just think it would be useful to have a record of I/O events.

@nielsenmb commented Feb 3, 2021 via email

@alexlyttle (Collaborator, Author)

@nielsenmb

I noticed in the example session log that functions that, for example, query MAST or Gaia don't appear in the log.

That's because the session example gives it a teff and bp_rp input. It should log such functions, so I will try running without providing colour and teff and attach the log here!

Would we just have to add the @debugger decorators to these functions?

The debugger decorator logs info on entering and exiting the decorated function, so yes, if we want to track that info then we should add it. I thought I had added it, but maybe not to all of them, so I'll check :)

@alexlyttle commented Feb 3, 2021

I think since we're providing suggestions for the user it should be warnings.warn? (If I understand the protocol correctly). On a related note, are warnings.warn sent to the log file as well?

We could do this with logging.captureWarnings(True), and I'd have to add some code to the file_handler to also capture from the warnings logger (note that this would capture warnings from all our dependencies too!). Alternatively, we could change all our warnings.warn to logger.warning; we can still provide suggestions to the user in these. I think PyMC3 does this: when e.g. your model's rhat is bad, it gives you a logger warning with suggestions rather than a warnings.warn. I prefer the latter, to be honest, as it involves less of a janky hack.

Edit: to answer your first question, that is the protocol, but in reality it's fine for loggers to provide suggestions to the user. It matters more the other way round: warnings.warn should always prompt the user to make a change (think of numpy's RuntimeWarnings for dividing by 0, or FutureWarnings which say "change your code as it won't work in the future").
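For reference, routing warnings through logging is done with logging.captureWarnings, which redirects warnings.warn to the standard library's 'py.warnings' logger. A minimal sketch (the handler choice here is illustrative; we would attach the file handler instead):

```python
import logging
import warnings

logging.captureWarnings(True)   # warnings.warn is now routed to the 'py.warnings' logger

py_warnings_logger = logging.getLogger('py.warnings')
py_warnings_logger.addHandler(logging.StreamHandler())  # or a FileHandler for the log file

warnings.warn("this goes through logging, not the normal warnings machinery")

logging.captureWarnings(False)  # restore normal warning behaviour
```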

@nielsenmb (Collaborator)

note this will capture warnings from all our dependencies too!

This would just get things like the junk that's printed when loading CPnest for example? I think that's fine if we need to do it that way.

I think warnings.warn -> logger.warning is probably fine?

@alexlyttle (Collaborator, Author)

This would just get things like the junk that's printed when loading CPnest for example? I think that's fine if we need to do it that way.

Yeah. All that cpnest stuff does get sent to a log file called cpnest.log in the star directory anyway if we need it.

I think warnings.warn -> logger.warning is probably fine?

I am happy doing this everywhere to be honest. It keeps things consistent.

@alexlyttle (Collaborator, Author)

I noticed in the example session log that functions that, for example, query MAST or Gaia don't appear in the log.

I've added the debugger wrapper everywhere now! The Lightkurve stuff appears in the session log:

test_session.log

And the query MAST stuff appears in the star log:

KIC4448777.log

@nielsenmb (Collaborator)

I am happy doing this everywhere to be honest. It keeps things consistent.

Agreed

Logs look great!

alexlyttle requested a review from nielsenmb, February 3, 2021 16:05
@alexlyttle commented Feb 3, 2021

@nielsenmb for some reason my latest commits aren't showing up. I thought requesting another review might fix it, but it hasn't yet (Edit: I have removed the review request now). I'll let you know when it's all ready. I've made the suggested changes and it's better now. I added a few TODO comments about handling exceptions, but I think that should be the subject of another PR!

@alexlyttle (Collaborator, Author)

Commits have come through. It should be ready to merge now if everyone is happy; if you want me to do more, though, let me know. I think it's at a pretty good stage now and we can always add more in the future!

alexlyttle removed the request for review from nielsenmb, February 3, 2021 16:15
@nielsenmb (Collaborator)

Agreed. It's ready to merge I think.
