Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize Python Documentation styles #3003

Closed
datumbox opened this issue Nov 13, 2020 · 9 comments · Fixed by #3268
Closed

Standardize Python Documentation styles #3003

datumbox opened this issue Nov 13, 2020 · 9 comments · Fixed by #3268

Comments

@datumbox
Copy link
Contributor

datumbox commented Nov 13, 2020

📚 Documentation

Currently TorchVision uses different styles for documenting functions. Here are a few:

def write_png(input: torch.Tensor, filename: str, compression_level: int = 6):
"""
Takes an input tensor in CHW layout (or HW in the case of grayscale images)
and saves it in a PNG file.
Parameters
----------
input: Tensor[channels, image_height, image_width]
int8 image tensor of `c` channels, where `c` must be 1 or 3.
filename: str
Path to save the image.
compression_level: int
Compression factor for the resulting file, it must be a number
between 0 and 9. Default: 6
"""

def decode_jpeg(input: torch.Tensor) -> torch.Tensor:
"""
Decodes a JPEG image into a 3 dimensional RGB Tensor.
The values of the output tensor are uint8 between 0 and 255.
Arguments:
input (Tensor[1]): a one dimensional uint8 tensor containing
the raw bytes of the JPEG image.
Returns:
output (Tensor[3, image_height, image_width])
"""

def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) -> int:
"""
This function is taken from the original tf repo.
It ensures that all layers have a channel number that is divisible by 8
It can be seen here:
https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""

class Normalize(torch.nn.Module):
"""Normalize a tensor image with mean and standard deviation.
Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
channels, this transform will normalize each channel of the input
``torch.*Tensor`` i.e.,
``output[channel] = (input[channel] - mean[channel]) / std[channel]``
.. note::
This transform acts out of place, i.e., it does not mutate the input tensor.
Args:
mean (sequence): Sequence of means for each channel.
std (sequence): Sequence of standard deviations for each channel.
inplace(bool,optional): Bool to make this operation in-place.
"""

It's worth selecting one style, standardizing all pydocs and putting a CI test to ensure new PR follows the standard.


Edit: A series of PRs have manually resolved many of the inconsistencies described here. What's missing is adding a CI test to verify that new PRs follow the same standard.

PyTorch seems to have done work towards this direction, so it's worth investigating if we should adopt their solution. See comment: #3259 (comment)

Another alternative is to adopt an open-source docstring linter such as darglint. It seems that it is capable of validating google-style docstrings, it's configurable via setup.cfg and available via pip.

pip install darglint

# setup.cfg
[darglint]
strictness=short
ignore=DAR003,DAR004,DAR005,DAR103,DAR104,DAR105,DAR203,DAR301,DAR302,DAR401,DAR402,DAR501
enable=DAR001,DAR002,DAR101,DAR102,DAR201,DAR202

Unfortunately it's quite slow at the moment:

Not parallelised run:
darglint torchvision

325.77 sec

Parallelised run:
SECONDS=0 ; find torchvision -name "*.py" | xargs -n 1 -P 8 darglint ; echo $SECONDS

119 sec

cc @seemethere @vfdev-5 @fmassa

@fmassa
Copy link
Member

fmassa commented Nov 16, 2020

Good catch!

We should follow the documentation style from PyTorch, which uses Args: most of the time, although even PyTorch is not consistent there

@cpuhrsch
Copy link
Contributor

cc @brianjo

@brianjo
Copy link
Contributor

brianjo commented Nov 16, 2020

I'll see if I can find any guidance we have for this and I'll write some up, if I can't find anything. Thanks!

@NicolasHug
Copy link
Member

I've been investigating this a bit and here's what I'm thinking. We want to:

  1. make sure that the Args: format is used (i.e. google style)
  2. make sure that a function's signature is in sync with its documented parameters (i.e. that all parameters are properly documented)
  1. can be done by adding
napoleon_numpy_docstring = False
napoleon_google_docstring = True

in docs/conf.py. I made some experiments in #3278 and sphinx will properly issue a warning if we use the following syntax:

Parameters
----------

We can turn sphinx warnings into error as suggested in #3259 (comment) so that the CI can enforce such checks. Note that turning sphinx warnings into errors means that the CI will start failing with other kinds of doc-related issues: this isn't a big deal IMHO because there are few of these: only 37 sphinx warnings are raised as-of now, so they don't appear too often.

  1. can be solved by using pydocstyle and by adding the following line in setup.cfg:
[pydocstyle]
ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414,D415,D416

This deactivate all checks but this one: D417 | Missing argument descriptions in the docstring. Shoudl we want to, we may want to stop ignoring other stuff in the future (?)

pydocstyle takes <= 5 seconds to run on torchvision and if it ever becomes too slow we can always just run it on the diff on each PR.

May I suggest to proceed with the following:

  • Fix current sphinx warnings and update conf.py as above
  • Use sphinx picky mode and start raising errors on warnings
  • Fix all of the pydocstyle D417 issues (there are just 20 of them so one PR should be enough) and introduce a new CI check using pydocstyle that would then run on all PRs

Thoughts @datumbox @fmassa ? I'm happy to follow up with PRs if you're happy with this solution

@datumbox
Copy link
Contributor Author

@NicolasHug Great analysis and proposals, thanks for looking into it.

@mattip Any thoughts on the above?

@mattip
Copy link
Contributor

mattip commented Jan 23, 2021

+1 for changing the conf.py and fixing the sphinx errors, then tun on the nitpicky mode. What is the overlap between the warnings from sphinx and the warnings from pydocstyle? If sphinx makes the pydocstyle tool redundant, then maybe it is not needed?

@NicolasHug
Copy link
Member

There are some redundancy (like checking for unexpected indentation), but as far as I can tell sphinx doesn't issue a warning when a parameter is missing from a function's docstring (item 2 above), which is what pydocstyle can be useful for

@pmeier
Copy link
Collaborator

pmeier commented Jan 26, 2021

flake8 (which we are already using) has an max-doc-length option, which works similar as max-line-length but for docstrings.

@fmassa
Copy link
Member

fmassa commented Jan 27, 2021

This all sounds great to me

@datumbox datumbox closed this as completed May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants