Review and document conventions that are in use #419
This sounds similar to our experience too! We have a documentation guide which includes (or will likely soon include) some special cases that aren't covered in the numpydoc standard. Having more detail in the community standard would also greatly help with linters and auto-formatters.
Good point — I think I'm going to add this to our documentation guide as well. When we do a nitpicky doc build, enclosing variables in single backticks leads to a warning which then will cause our doc build to fail.
We do
It would be incredibly helpful for the numpydoc standard to include how to describe shape information. I was actually about to raise an issue on this very topic when I saw this issue, and will probably need to add this to our doc guide as well.
Agreed again: more examples for rare cases would be great! I wonder if it would be helpful to create a separate issue for each of the topics here, since it'd make it easier to work on them individually. Thank you for bringing all of this up!
Yes, I think that would be quite useful. And this repo is the right place for it I'd say.
That is correct.
I think that the person who wrote that used code as the default. Single backticks in numpydoc are first of all used for cross-linking; you'd also use it to link to other functions in the namespace or with intersphinx. I believe that was the intent here, but the machinery was never completed so we ended up with italics in numpy and scipy, which is not great. Improvements welcome; ideally without large-scale reformatting in numpy and scipy though.
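To make the role question concrete, here is a minimal `conf.py` sketch of the settings this thread keeps coming back to. It is only an illustration: the commented-out values are alternatives mentioned in the discussion, not a recommendation, and the exact rendering depends on the theme and the extensions a project enables.

```python
# conf.py (sketch): choose ONE value for default_role; the others are alternatives.

# Sphinx default: the default role is not reassigned, so `x` falls back to the
# docutils title reference, which most themes render in italics.
default_role = None

# Alternatives mentioned in this thread (assumed to be configured per project):
# default_role = "code"      # `x` renders as inline code
# default_role = "obj"       # `x` becomes a cross-reference to a Python object
# default_role = "autolink"  # from sphinx.ext.autosummary: link if found, else emphasis

# With a cross-referencing default role such as "obj", a nitpicky build warns
# about every single-backtick name whose target cannot be found.
nitpicky = True
```

Whichever value a project picks, the "enclose variables in single backticks" advice then renders differently from project to project, which is the inconsistency being discussed here.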
Problems here include:
Yes, this would be super helpful.
The mental model that I've developed from both working directly on numpydoc and with numpydoc across multiple projects is that the numpydoc style guide comprises both a docstring standard and a style guide. Some of the information in the style guide defines a standard - for example, the docstring sections must have the exact names as prescribed, or else the docstring will not be parsed/rendered properly. However there are other suggestions that are more related to style and have nothing to do with the successful parsing/rendering of a docstring (e.g. "The length of docstring lines should be kept to 75 characters to facilitate reading the docstrings in text terminals." - n.b. this advice is not really followed in any project IME, including numpy and scipy themselves).
To this I would add a third category, which is advice that has become the de facto standard in large scientific Python libraries, but which the numpydoc extension itself does not implement correctly. The most obvious example here is the "Enclose variables in single backticks" advice. As noted in the above discussion and in documentation issues throughout the ecosystem (see e.g. numpy/numpy#17714), this directly collides with the default role in sphinx which leads to excessive broken links among other things.
To summarize, the numpydoc style guide is really (IMO) a confusing mix of standard and style guide and I agree that this is a hindrance to docstring standardization across libraries. The situation gets even more confusing when you consider the
One proposal for moving forward
I think it would be very useful to review the numpydoc style guide, especially with input from consumer libraries that are outside of numpy/scipy where the standard is largely applied by default, even parts that are not necessarily documented. I'd advocate for:
This sounds great, could you @rossbar or someone who is already familiar with numpydoc internals and validation get started on the first bullet point? I am not sure I'd be of much use here. In the meantime I'll continue gathering info on the styles used, where I think I'll be able to help. For example, regarding single backticks, double backticks (code formatting) or asterisks (italics): both Python and matplotlib use asterisks, and this is documented (not sure how strictly it is enforced, though). Therefore, if there aren't significant parts of the numpy codebase using double backticks (maybe from expecting code formatting as @rgommers mentioned but seeing it wasn't working), it might be possible to start using * on all new docstrings and "broken" ones like the changes in the PR you linked, and progressively move away from single backticks in this role. I can open specific issues to discuss each of the potential changes/things not being followed, and then PRs once there is some consensus.
I'd be happy to review things - @Carreau's insight would also be very relevant here, as I know he has been running into (and fixing) corner cases in many libraries.
Given that the "single backticks for parameters" rule has been extensively adopted, I'd be much more in favor of fixing this in numpydoc rather than moving away from it.
I agree that using single backticks is quite common, but if that were to become part of numpydoc then we'd need to fix the autolink or any
NumPy, SciPy and scikit-image use
which doesn't seem like the expected output, as it differs from the previous case and in addition links to things that are potentially unrelated, such as function names (linking to itself when referencing the return value is common, for example) or glossary terms (which can even be in an intersphinx glossary). As I mentioned above using the
Single backticks are also used to reference the output values. The description of
So in addition to forcing a value for the
obj
NetworkX uses
Note: matplotlib also uses obj as default role while at the same time using
None
Pandas doesn't set the
code or literal
ipython and scikit-learn (since 2019, before they used
Note: ArviZ also uses
The thing is, it is already recommended by
I think there's a way to add the necessary functionality to the numpydoc extension without clashing too severely with the default role. This is something I've been thinking about for a while and I think is a nice alternative to #303 that fixes all the issues and is flexible enough to not cause any major problems/breaks. I haven't had the time to actually implement/test the idea though.
This sounds amazing! I am guessing it would then also be possible to customize the formatting so that, for example, projects that want to can continue to use code formatting for parameters. Would that be in use throughout the whole docstring or only for parameter descriptions? And would it be possible for this to work on other pages? (context: I was very confused when first reading https://numpy.org/doc/stable/reference/ufuncs.html and I think it would benefit from something like this even if not technically a docstring)
True, sorry. I am a bit too focused on the edge cases.
I have to take more time to re-read all of that, but I have been working on trying to auto-fix some of the numpydoc syntax (https://github.com/Carreau/velin), and in my experience, numpydoc allows a bit too much flexibility in its syntax, which leads to a lot of confusion.
Numpydoc is not only used by Sphinx; https://github.com/Jupyter/papyri understands it as well, and things like backticks to link variables work. So I'd love for this discussion to not take only Sphinx into consideration. IMHO this should be decomposed into two steps: numpydoc to some semantic repr, and then how Sphinx renders this repr. But we should not take into consideration that
One other thing to keep in mind with any change is that currently there are still a lot of tools that display the raw docstring when users ask for help on a function.
I personally would be happy for the "core" of the numpydoc style guide to be moved to a SPEC, so that each project can link to it.
One thing I would like to change is the convention that the short summary end in a period (I see this in Examples:
The above read like headings more than complete sentences. To make them complete, one would have to add qualifiers such as
and
both of which feel redundant (or may get you over the dreaded 80-character line limit). Since it's required to separate the summary from the body with a new line anyway, do we have to use a period? Or should the convention be that the short summary is a complete sentence? I'm fine with either option, but I would prefer the guidelines recommend something to establish consistency.
Agreed. AFAICT this isn't recommended in the style guide, but there is a validation check for it. However I would treat the validation check as something users can adopt if they want, but not necessarily recommended.
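For anyone who wants to see what such a check amounts to, here is a minimal sketch. It is not numpydoc's own validation code; `summary_line` and `summary_ends_with_period` are hypothetical helper names, and the two example functions simply reuse summaries quoted elsewhere in this thread.

```python
import inspect


def summary_line(obj):
    """Return the first line of the docstring of ``obj``, or an empty string."""
    doc = inspect.getdoc(obj) or ""
    return doc.split("\n", 1)[0].strip()


def summary_ends_with_period(obj):
    """Return True if the short summary of ``obj`` ends with a period."""
    return summary_line(obj).endswith(".")


def add(a, b):
    """The sum of two numbers."""
    return a + b


def load(fh):
    """Load a pickle, with a provided encoding,"""


# The first summary passes the check, the second one does not.
print(summary_ends_with_period(add))
print(summary_ends_with_period(load))
```

A project that prefers heading-like summaries without a period would simply not enable a check like this, which fits treating it as opt-in rather than part of the standard.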
A second thing I would like to gain consensus on is whether docstring guides should use

    """
    Parameters
    ------------
    param: description instead of type
    """

e.g. pandas/compat/pickle_compat.py

    def load(fh, encoding: str | None = None, is_verbose: bool = False):
        """
        Load a pickle, with a provided encoding,

        Parameters
        ----------
        fh : a filelike object
        encoding : an optional encoding
        is_verbose : show exception output
        """

should instead be

    def load(fh, encoding: str | None = None, is_verbose: bool = False):
        """
        Load a pickle, with a provided encoding,

        Parameters
        ----------
        fh : Pathlike
            a filelike object
        encoding : str or None, default=None
            an optional encoding
        is_verbose : bool, default=False
            If ``True``, show exception output
        """

the above is still not fully type hinted (e.g.
Inconsistent uses like this make linting challenging (see
I also think we should allow using:

    param1: list[str]
    param2: int | bool | None
    param3: tuple[int, int, int]

in addition to

    param1: list of str
    param2: int, bool, or None
    param3: tuple of int

In the last case (
I am not sure if this is a good place or if it would be better to open a topic on https://discuss.scientific-python.org/. Happy to move it somewhere else if this isn't the best venue.
I am under the impression that most projects that are part of the scientific-python and/or pydata ecosystem follow numpydoc without significant deviations. What varies, I think, is more the level of success, especially if you move outside the core projects. I think one of the reasons for that is that the numpydoc style guide is not a comprehensive collection of all the rules and conventions that are actually in use throughout the ecosystem (or even within numpy itself). I think missing conventions are more common, but some of the guidance is also not used or not realistic. I am also not sure how widespread some of these conventions are, or whether they are a de facto part of numpydoc or of the numpy/scipy codebase, which makes it hard to know if these are indeed conventions that should be in numpydoc or things that look the same by chance/contributor overlap. I'll add some examples below, but the main question is:
Would an effort to check the current style guide and find common ground with other projects be welcome?
Context: I have been trying for a while to improve the documentation of the ArviZ and PyMC libraries while following numpydoc and the rest of the conventions used in the pydata ecosystem. I would love to simply link to numpydoc, rather than write a whole new doc extending/repeating numpydoc (with the exception of 2-3 paragraphs at most about project-specific things like aliases defined in `numpydoc_xref_aliases`) and forcing contributors to navigate multiple docs to write docstrings. I also think that is numpydoc's goal, not to be only numpy/scipy specific, but I am not sure about this, and if so it would be great to review it and make sure it is up to date and as comprehensive as possible. I have added examples that I have gathered during that process (some of them below), but as I said in the beginning, the goal is not so much about the specific examples but about making sure I am on the right page about numpydoc's goals and scope, and if so we can find the best way to go about it.
Some examples
Short summary
Its description is:
which is followed by an example using the function `add(a, b)`, described as "The sum of two numbers.".
The advice to not use variable names is generally followed; the advice to not use the function name, however, is not followed, and I don't think it is realistic. E.g. most of the short summary sections in the linear algebra module do use the function name, and virtually all methods in the random generator class have a short summary that is "Draw samples from the distribution.". Not using the function/method name would probably be less clear than using it: these names refer to technical terms without any synonym available, and they can't be defined in the short summary either.
Application/scope of the doc
I was able to find project specific docstring convention pages for pandas or matplotlib but not for numpy nor scipy, they only had links to numpydoc. Am I right to assume that numpydoc alone is the official convention description for all of numpy and scipy?
There are a couple places where only numpy is mentioned such as the deprecation warning section, but most importantly, there also seems to be numpy infrastructure information (I think) in the style guide like the last couple sentences in the examples section. If I understand correctly, omitting the numpy import is only an option in numpy but not scipy docstrings for example, but I don't really know what to make of the auto-use of the plot directive when matplotlib is imported, is that part of numpydoc?
The parameter section also says "Enclose variables in single backticks" but the behaviour of single backtick enclosure in sphinx is defined by the value of `default_role` in `conf.py`. Is that intended? In general this will default to italics, but it would become code formatting for projects using "code" as default role and, what might be even worse, could easily be rendered as links to the glossary if using "autolink" as default role. E.g. `axis` is a common parameter name and a term in numpy's glossary which is probably added to intersphinx by most projects and therefore a valid autolink key; that would mean that in the descriptions, all parameters would be rendered as italics except the axis one, which would be a link (I have no background, but the fact that all parameters in https://numpy.org/devdocs/reference/generated/numpy.average.html are correctly enclosed within single backticks except for `axis` might point to this).
Parameters
The `array_like` alias is used in passing only in the parameters section and explained at the very bottom of the page in the "other points to keep in mind". However, this "type" is key to many docstrings and it seems to have multiple extra conventions associated with it (shape, dtype) which are not documented in the parameters nor other points sections. The page in matplotlib's docs explains some of these conventions, which seem to be the ones used by numpy (in the linalg module for example, the input of the cholesky function is described as `a : (…, M, M) array_like`). Is this something that should be part of numpydoc, or something that is technically matplotlib specific and happened due to maintainer overlap or something like this? Other modules like the polynomial one use a slightly different convention: `array_like, shape (M,)`. Are/should both be part of numpydoc, a bit like the multiple `default` options allowed?
It might also help to add further type examples inside numpydoc itself (maybe they can go inside a dropdown to not take too much space). numpydoc says:
but there are cases I am not sure what should be the way to go when writing a docstring or don't know what to answer when someone else asks me about it. pandas for example has some extra examples that have been helpful in such situations:
and so does matplotlib, with some extra conventions too, such as "Use `(float, float)` to [...] the parentheses should be included to make the tuple-ness more obvious.", which is also used for example in numpy.histogram to describe the range parameter. Do you know if this parentheses -> tuple-ness link is more widespread in numpy/scipy docs? Should it be part of numpydoc or a point of divergence by matplotlib? Is the `tuple of (float, float)` more common?
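To illustrate why documenting a single shape convention would help linters, here is a rough sketch of what a tool has to do today to accept both `array_like` shape spellings quoted in the Parameters section above. This is purely hypothetical: `parse_array_like_shape` is not part of numpydoc, and the two regular expressions only cover the two spellings from this issue.

```python
import re

# Hypothetical helper, not part of numpydoc: recognise the two shape
# spellings quoted above on the type part of a parameter line.
_PREFIX = re.compile(r"^\((?P<shape>[^)]*)\)\s+array_like$")
_SUFFIX = re.compile(r"^array_like,\s+shape\s+\((?P<shape>[^)]*)\)$")


def parse_array_like_shape(type_spec):
    """Return the shape as a tuple of dimension names, or None.

    Parameters
    ----------
    type_spec : str
        Text after the colon on a parameter line, e.g.
        ``"(..., M, M) array_like"`` or ``"array_like, shape (M,)"``.

    Returns
    -------
    tuple of str or None
        Dimension names in order, or None if no shape is given.
    """
    # Treat the typographic ellipsis like three ASCII dots.
    spec = type_spec.strip().replace("…", "...")
    for pattern in (_PREFIX, _SUFFIX):
        match = pattern.match(spec)
        if match:
            dims = [d.strip() for d in match.group("shape").split(",")]
            return tuple(d for d in dims if d)
    return None


print(parse_array_like_shape("(..., M, M) array_like"))
print(parse_array_like_shape("array_like, shape (M,)"))
```

Every additional undocumented spelling means another pattern like these in every linter or auto-formatter, which is one practical argument for settling the convention in numpydoc itself.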