Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding feature to intercept errors and save information to files #45

Merged
merged 12 commits into from
Feb 21, 2024

Conversation

dchandan
Copy link
Collaborator

Adding feature to intercept errors encountered while (i) generating STAC items, (ii) posting STAC items to the server. Information about the dataset and the error it encountered are saved to separate log files. An example log file from a run of the stac-populator is included below.

stac-populator_CMIP6populator_errors_20240119-185007.txt

@huard
Copy link
Collaborator

huard commented Jan 22, 2024

I think this PR branches off the other one about the crawler. I'll wait for you to merge the first one before reviewing.

@dchandan
Copy link
Collaborator Author

I've merged it now.

Copy link
Collaborator

@huard huard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe dumb question, but why are we using an "error file" in the first place and not just using logger to log errors to file ?

@dchandan
Copy link
Collaborator Author

No, that's a good question. (i) I didn't know how to add a common file handler to the multiple loggers that are generated in the get_logger function, (ii) the log files can get very busy with other outputs and I thought it might be useful to have a separate clean files with error data that I can then work on fixing (I could always just ack or sed the log files, but for the moment this just felt more useful for the next reason), (iii) I am hoping that we can write some logic for the ErrorLoader that allows us to feed this separate error file (whose format can evolve to include more information) into a new run of stac-populator so that the populator only works on these error files rather than all files in a catalog (even if data is not posted for existing items, working on an entire whole catalog is still slow because all items have to be crawled through).

Thoughts, comments, suggestions?

@huard
Copy link
Collaborator

huard commented Jan 22, 2024

It's possible to configure the LOGGER object to write both to the terminal and to a file on disk. So if that meets the requirements, it could be an option.

In another project I've used this:

def init_log():
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.ERROR)

    fh = logging.FileHandler('log/run_MAGICC.log')
    fh.setLevel(logging.INFO)
    logger.addHandler(fh)
    return logger 

which logs ERRORs to the terminal, but stores all INFO messages to a file.

Copy link
Collaborator

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @huard mentioned, I think it would work better if the logger was reused.
One way to implement it would be to add multiple log handlers.
This way, you only need to call log.<level>() methods once, and each handler performs whatever it needs to do with the record respectively.

logger = get_logger("stac-populator")
log_handler = logging.StreamHandler(sys.stderr)
log_handler.setFormatter(logging._defaultFormatter)  # or whatever
err_handler = logging.FileHandler("<file-path>")
err_handler.setFormatter(logging.Formatter(fmt="{stac_item_name}: {stac_item_loc}"))
logger.addHandler(log_handler)
logger.addHandler(err_handler)

logger.debug("Custom message: {stac_item_name}: {stac_item_loc}", stac_item_name, stac_item_loc)

A custom formatter could be defined to ignore the records when required arguments (stac_item_name, stac_item_loc, etc.) are missing. Just need to define the emit method that handles the log record.

STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py Outdated Show resolved Hide resolved
STACpopulator/api_requests.py Outdated Show resolved Hide resolved
STACpopulator/populator_base.py Outdated Show resolved Hide resolved
STACpopulator/populator_base.py Outdated Show resolved Hide resolved
@fmigneault
Copy link
Collaborator

@dchandan
A timely video that came up in my watch list today: https://www.youtube.com/watch?v=9L77QExPmI0
See JSON logs and extra parameter timestamps for some inspiration.

@dchandan
Copy link
Collaborator Author

Thanks for the comments and the useful video on the log file. I do like the JSON lines output format shown in the video as it would make parsing the log files for issues much easier. I've now revamped the whole app logging to use the setup in the video.

STACpopulator/populator_base.py Outdated Show resolved Hide resolved
STACpopulator/populator_base.py Outdated Show resolved Hide resolved
STACpopulator/populator_base.py Outdated Show resolved Hide resolved
STACpopulator/populator_base.py Outdated Show resolved Hide resolved
STACpopulator/logging.py Outdated Show resolved Hide resolved
@dchandan
Copy link
Collaborator Author

Is it good to merge now?

)
parser.add_argument("--debug", action="store_true", help="Set logger level to debug")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be in the parent argparser (generic one before specific implementation is called).
ns.debug will not exist if using other implementations than CMIP6

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I've moved it. I'm not sure I like the current position much. Right now, one has to do:

stac-populator --debug run CMIP6_UofT stac-host catalog-url --auth-handler cookie --auth-identity ~/daccs_cookie.txt

I'd prefer if one could instead just do:

stac-populator run CMIP6_UofT stac-host catalog-url --auth-handler cookie --auth-identity ~/daccs_cookie.txt --debug

By the way, this might be related to #46. I noticed that there is something off with the CLI parser system you've put into place. I'll let you look into it since you put it in and hence you probably understand it better than me.

Copy link
Collaborator

@fmigneault fmigneault Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow stac-populator run CMIP6_UofT --debug ..., you could define the logging options as their own log_parser = argparse.ArgumentParser(...) and pass that log_parser in parents:

parents=[populator_parser],

Need to pass it down also to parser_maker so it can add log_parser to its own parents as well.

populator_parser = parser_maker()

Copy link
Collaborator

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dchandan
Good to merge, but I let you decide if you want to try out #45 (comment) before or not.

@dchandan
Copy link
Collaborator Author

@fmigneault
Ok, we can do that later. Kinda pressed for time rn, so we can circle back to this later. I'll make an issue of this, so it is documented.

@dchandan dchandan merged commit ba179dc into master Feb 21, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants