Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue all messages using the logging library #250

Merged
merged 11 commits into from
Jan 26, 2022
Merged

Issue all messages using the logging library #250

merged 11 commits into from
Jan 26, 2022

Conversation

H0R5E
Copy link
Contributor

@H0R5E H0R5E commented Jan 17, 2022

This commit changes the issuing of messages to use the Python logging library. Logging in the module now follows this process:

  1. The module logger is initiated on import (without a handler)
  2. The user then has the opportunity to attach a handler to the logger before the pypandoc functions are called.
  3. When a function that issues messages is called, check if the logger has a handler and if not add a console handler, formatted in the same manner as pandoc's.
  4. When log messages are detected on Pandoc's stderr stream, parse them to determine the level and then reissue the log messages at the appropriate level.

Checking and setting up the logging handler is done by the _check_log_handler function (in the handler submodule), while determining the level of the messages issued by Pandoc on stderr is done by the _classify_pandoc_logging function.

Note, all public "quiet" flags are now deprecated, as this can now be handled by configuring the logger.

Closes #249

H0R5E added 4 commits January 17, 2022 15:17
This commit modifies the _convert_input function to print messages issued by pandoc on stderr that have not triggered a RuntimeError. These messages may be warnings, for instance or verbose output (see jgm/pandoc#6628).

This behaviour can be suppressed by passing True to the quite argument of _convert_input. As passthrough argument has also been added to convert, convert_text and convert_file.
At some point the warning issued by the pandoc docx writer for a missing image changes from:

[WARNING] Could not fetch resource 'missing.png': PandocResourceNotFound "missing.png"

to:

[WARNING] Could not fetch resource missing.png: PandocResourceNotFound "missing.png"
@JessicaTegner
Copy link
Owner

JessicaTegner commented Jan 17, 2022

amazing work @H0R5E

I was wondering, if it would be best behavior to initialize a logger instead of just using print. Or would that be overkill.
An alternative could be to use the already imported warnings module.

What do you think?

@H0R5E
Copy link
Contributor Author

H0R5E commented Jan 18, 2022

Thanks @NicklasTegner! I've been thinking about this PR and your suggestions.

Firstly, it struck me that the quiet flag is wrong! You can pass --quiet to pandoc in the extra_args to achieve the same thing, so I think the quiet flag I added should be removed.

Also, I don't think using the warnings module is the right approach because pandoc will put log messages with both INFO and WARNING level onto stderr.

Probably using the Python logging module is the most robust way to handle the stderr stream, but setting it up is a slightly complex process. Basically, I was considering this approach:

  1. Initiate the module logger on import (without a handler)
  2. The user then has the opportunity to attach a handler to the logger before the pypandoc functions are called. The logger could be muted by adding a null handler to it, for instance, or some other handler of choice.
  3. When _convert_input is called check if the logger has a handler and if not add a console handler, formatted in the same manner as pandoc's.
  4. When log messages are detected on stderr, parse them to determine if they are at INFO or WARNING level and then reissue the log messages at the appropriate level.
  5. EDIT: If a normal user wants to mute the pandoc output, they just pass --quiet in the extra_args, as expected.

The questions I have with this approach are "should the other functions that print (namely _ensure_pandoc_path) also use the logger, in which case also have responsibility for initializing it (and sharing the format?)?" and "is it just confusing to have this logging system working alongside the pandoc one?". My attempt at answers to these questions are that _ensure_pandoc_path should just keep printing and we use logging just to handle the pandoc output, and that if we do this right, the normal user won't notice that there is logging going on, while the power user has the opportunity to configure it.

So, in conclusion, I think the quiet flag should be removed either way as it is redundant. I think using the logging module is a good idea (to allow customization), but there are many ways to set it up and I wanted to check with you whether my idea above is sound.

Cheers,

Mat

@JessicaTegner
Copy link
Owner

JessicaTegner commented Jan 18, 2022

@H0R5E
In that case, yes the quiet flag should be removed.

For logging, I would say that your list is a good solution, I would however put all printing (including from _ensure_pandoc_path and others) into the logging as well, at the appropriate levels.

@H0R5E
Copy link
Contributor Author

H0R5E commented Jan 20, 2022

OK @NicklasTegner, I'll give it a try.

H0R5E added 5 commits January 24, 2022 14:15
This commit changes the issuing of messages to use the Python logging library. Logging in the module now follows this process:

1. The module logger is initiated on import (without a handler)
2. The user then has the opportunity to attach a handler to the logger before the pypandoc functions are called.
3. When a function that issues messages is called, check if the logger has a handler and if not add a console handler, formatted in the same manner as pandoc's.
4. When log messages are detected on Pandoc's stderr stream, parse them to determine the level and then reissue the log messages at the appropriate level.

Checking and setting up the logging handler is done by the `_check_log_handler` function, while determining the level of the messages issued by Pandoc on stderr is done by the `_classify_pandoc_logging` function.
This commit moves the remaining print statements, associated to the `download_pandoc` function to log messages instead.

This change required moving the `_check_log_handler` function to its own submodule, `handler`, to avoid circular import.

Also removed all "quiet" flags, as this can now be handled by configuring the logger.
@H0R5E
Copy link
Contributor Author

H0R5E commented Jan 24, 2022

@NicklasTegner, I've made the changes. Everything that printed or wrote to stdout or stderr now adds messages through the pypandoc logger. I've also removed ALL the quiet flags, including the existing ones. These didn't seem to make any sense anymore. For instance, download_pandoc simply muted stdout when the quiet flag was set, but I don't want to manipulate the logging level, because that is something under the user's control. Note that because I have removed the quiet flags this is a breaking change.

@H0R5E H0R5E changed the title Print pandoc output issued on stderr Issue all messages using the logging library Jan 24, 2022
@JessicaTegner
Copy link
Owner

@H0R5E can you explain what exactly is a "breaking change" in this?

@sthagen
Copy link

sthagen commented Jan 25, 2022

Usually any change in the public command line interface (removing a flag that users may have used and now they have to adapt) is a breaking change. But then, people usually embrace change because they want the beneficial parts that should come along with it - maybe just consider when bumping the semantic version.

@JessicaTegner
Copy link
Owner

mmhmm. What do you guys think.

Either just remove the flags outright in this pr, bumping the version to 1.8 when relizing.
Or issuing a warning message, stating that the quiet flag is Deprecated, and will be removed in 1.8, where we will then be able to include this pr in the next minor release, like 1.7.x?

@H0R5E
Copy link
Contributor Author

H0R5E commented Jan 25, 2022

So, to be precise, the breaking change is an API change to the ensure_pandoc_installed function:

-def ensure_pandoc_installed(url=None, targetfolder=None, version="latest", quiet=False, delete_installer=False):
+def ensure_pandoc_installed(url=None, 
+                            targetfolder=None,
+                            version="latest",
+                            delete_installer=False):

And an API change to the download_pandoc function:

-def download_pandoc(url=None, targetfolder=None, version="latest", quiet=False, delete_installer=False, download_folder=None):
+def download_pandoc(url=None,
+                    targetfolder=None,
+                    version="latest",
+                    delete_installer=False,
+                    download_folder=None):

My understanding is that semantic versioning would require that this then be a major version change (i.e. to version 2).

Here are the options I see for finishing this:

  1. Merge the PR as it is and bump to version 2. A major version change is indicative to users of breaking API changes.
  2. Replace the existing quiet flags, but just with deprecation warnings and no functionality. Release this change as a minor release (i.e. 1.8) and then the deprecation warnings would be removed on the next major release (i.e. version 2) - along with the quiet arguments.
  3. Replace the existing quiet flags and re-implement their functionality by adding if not quiet: guards in front of all the relevant logging messages.
  4. EDIT: You could do 2 and 3 together, I guess.

Happy to finish this off whichever way you want.

@H0R5E
Copy link
Contributor Author

H0R5E commented Jan 25, 2022

I think option 2 in my list is my preference, BTW.

@JessicaTegner
Copy link
Owner

I would say go ahead with option 2. Adding a message like:

deprecation Warming: The quiet flag in PyPandoc has been deprecated in favor of logging. See LINK for more information

Just essentially a message, stating that we have removed the functionality in favor of logging with the logging module, and where to find more information e.g the readme.

@H0R5E
Copy link
Contributor Author

H0R5E commented Jan 26, 2022

@NicklasTegner, I think this is ready now.

@JessicaTegner
Copy link
Owner

fantastic work. Good job :) @H0R5E

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warnings issued on stderr are not reported
3 participants