Standardize Python Documentation styles #3003

datumbox · 2020-11-13T16:49:27Z

📚 Documentation

~~Currently~~ TorchVision uses different styles for documenting functions. Here are a few:

Lines 116 to 130 in 973db14

    
           def write_png(input: torch.Tensor, filename: str, compression_level: int = 6): 
        
               """ 
        
               Takes an input tensor in CHW layout (or HW in the case of grayscale images) 
        
               and saves it in a PNG file. 
        
               Parameters 
        
               ---------- 
        
               input: Tensor[channels, image_height, image_width] 
        
                   int8 image tensor of `c` channels, where `c` must be 1 or 3. 
        
               filename: str 
        
                   Path to save the image. 
        
               compression_level: int 
        
                   Compression factor for the resulting file, it must be a number 
        
                   between 0 and 9. Default: 6 
        
               """

vision/torchvision/io/image.py

Lines 135 to 144 in 973db14

    
           def decode_jpeg(input: torch.Tensor) -> torch.Tensor: 
        
               """ 
        
               Decodes a JPEG image into a 3 dimensional RGB Tensor. 
        
               The values of the output tensor are uint8 between 0 and 255. 
        
               Arguments: 
        
                   input (Tensor[1]): a one dimensional uint8 tensor containing 
        
               the raw bytes of the JPEG image. 
        
               Returns: 
        
                   output (Tensor[3, image_height, image_width]) 
        
               """

vision/torchvision/models/mobilenet.py

Lines 15 to 25 in 973db14

    
           def _make_divisible(v: float, divisor: int, min_value: Optional[int] = None) -> int: 
        
               """ 
        
               This function is taken from the original tf repo. 
        
               It ensures that all layers have a channel number that is divisible by 8 
        
               It can be seen here: 
        
               https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 
        
               :param v: 
        
               :param divisor: 
        
               :param min_value: 
        
               :return: 
        
               """

vision/torchvision/transforms/transforms.py

Lines 195 to 210 in 973db14

    
           class Normalize(torch.nn.Module): 
        
               """Normalize a tensor image with mean and standard deviation. 
        
               Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n`` 
        
               channels, this transform will normalize each channel of the input 
        
               ``torch.*Tensor`` i.e., 
        
               ``output[channel] = (input[channel] - mean[channel]) / std[channel]`` 
        
               .. note:: 
        
                   This transform acts out of place, i.e., it does not mutate the input tensor. 
        
               Args: 
        
                   mean (sequence): Sequence of means for each channel. 
        
                   std (sequence): Sequence of standard deviations for each channel. 
        
                   inplace(bool,optional): Bool to make this operation in-place. 
        
               """

It's worth selecting one style, standardizing all pydocs and putting a CI test to ensure new PR follows the standard.

Edit: A series of PRs have manually resolved many of the inconsistencies described here. What's missing is adding a CI test to verify that new PRs follow the same standard.

PyTorch seems to have done work towards this direction, so it's worth investigating if we should adopt their solution. See comment: #3259 (comment)

Another alternative is to adopt an open-source docstring linter such as darglint. It seems that it is capable of validating google-style docstrings, it's configurable via setup.cfg and available via pip.

pip install darglint

# setup.cfg
[darglint]
strictness=short
ignore=DAR003,DAR004,DAR005,DAR103,DAR104,DAR105,DAR203,DAR301,DAR302,DAR401,DAR402,DAR501
enable=DAR001,DAR002,DAR101,DAR102,DAR201,DAR202

Unfortunately it's quite slow at the moment:

Not parallelised run:
darglint torchvision

325.77 sec

Parallelised run:
SECONDS=0 ; find torchvision -name "*.py" | xargs -n 1 -P 8 darglint ; echo $SECONDS

119 sec

cc @seemethere @vfdev-5 @fmassa

The text was updated successfully, but these errors were encountered:

fmassa · 2020-11-16T13:51:57Z

Good catch!

We should follow the documentation style from PyTorch, which uses Args: most of the time, although even PyTorch is not consistent there

cpuhrsch · 2020-11-16T17:11:14Z

cc @brianjo

brianjo · 2020-11-16T17:18:09Z

I'll see if I can find any guidance we have for this and I'll write some up, if I can't find anything. Thanks!

NicolasHug · 2021-01-22T17:31:10Z

I've been investigating this a bit and here's what I'm thinking. We want to:

make sure that the Args: format is used (i.e. google style)
make sure that a function's signature is in sync with its documented parameters (i.e. that all parameters are properly documented)

can be done by adding

napoleon_numpy_docstring = False
napoleon_google_docstring = True

in docs/conf.py. I made some experiments in #3278 and sphinx will properly issue a warning if we use the following syntax:

Parameters
----------

We can turn sphinx warnings into error as suggested in #3259 (comment) so that the CI can enforce such checks. Note that turning sphinx warnings into errors means that the CI will start failing with other kinds of doc-related issues: this isn't a big deal IMHO because there are few of these: only 37 sphinx warnings are raised as-of now, so they don't appear too often.

can be solved by using pydocstyle and by adding the following line in setup.cfg:

[pydocstyle]
ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414,D415,D416

This deactivate all checks but this one: D417 | Missing argument descriptions in the docstring. Shoudl we want to, we may want to stop ignoring other stuff in the future (?)

pydocstyle takes <= 5 seconds to run on torchvision and if it ever becomes too slow we can always just run it on the diff on each PR.

May I suggest to proceed with the following:

Fix current sphinx warnings and update conf.py as above
Use sphinx picky mode and start raising errors on warnings
Fix all of the pydocstyle D417 issues (there are just 20 of them so one PR should be enough) and introduce a new CI check using pydocstyle that would then run on all PRs

Thoughts @datumbox @fmassa ? I'm happy to follow up with PRs if you're happy with this solution

datumbox · 2021-01-22T18:15:36Z

@NicolasHug Great analysis and proposals, thanks for looking into it.

@mattip Any thoughts on the above?

mattip · 2021-01-23T18:36:59Z

+1 for changing the conf.py and fixing the sphinx errors, then tun on the nitpicky mode. What is the overlap between the warnings from sphinx and the warnings from pydocstyle? If sphinx makes the pydocstyle tool redundant, then maybe it is not needed?

NicolasHug · 2021-01-24T11:43:26Z

There are some redundancy (like checking for unexpected indentation), but as far as I can tell sphinx doesn't issue a warning when a parameter is missing from a function's docstring (item 2 above), which is what pydocstyle can be useful for

pmeier · 2021-01-26T15:49:24Z

flake8 (which we are already using) has an max-doc-length option, which works similar as max-line-length but for docstrings.

fmassa · 2021-01-27T14:03:05Z

This all sounds great to me

datumbox mentioned this issue Nov 13, 2020

[BC-breaking] ColorJitter gets its random params by calling get_params() #3001

Merged

vfdev-5 added the module: documentation label Nov 13, 2020

datumbox added enhancement module: ci labels Dec 1, 2020

datumbox mentioned this issue Dec 22, 2020

[*.py] Rename "Arguments:" to "Args:" #3203

Merged

NicolasHug mentioned this issue Jan 20, 2021

Unify parameters formatting in docstrings #3268

Merged

datumbox mentioned this issue Jan 22, 2021

build and upload docs from circleci #3259

Merged

datumbox closed this as completed in #3268 Jan 22, 2021

datumbox reopened this Jan 22, 2021

NicolasHug mentioned this issue Jan 25, 2021

Fix sphinx warnings and turn warnings into errors #3290

Merged

datumbox closed this as completed May 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardize Python Documentation styles #3003

Standardize Python Documentation styles #3003

datumbox commented Nov 13, 2020 •

edited

Loading

fmassa commented Nov 16, 2020 •

edited

Loading

cpuhrsch commented Nov 16, 2020

brianjo commented Nov 16, 2020

NicolasHug commented Jan 22, 2021

datumbox commented Jan 22, 2021

mattip commented Jan 23, 2021

NicolasHug commented Jan 24, 2021

pmeier commented Jan 26, 2021

fmassa commented Jan 27, 2021

Standardize Python Documentation styles #3003

Standardize Python Documentation styles #3003

Comments

datumbox commented Nov 13, 2020 • edited Loading

📚 Documentation

fmassa commented Nov 16, 2020 • edited Loading

cpuhrsch commented Nov 16, 2020

brianjo commented Nov 16, 2020

NicolasHug commented Jan 22, 2021

datumbox commented Jan 22, 2021

mattip commented Jan 23, 2021

NicolasHug commented Jan 24, 2021

pmeier commented Jan 26, 2021

fmassa commented Jan 27, 2021

datumbox commented Nov 13, 2020 •

edited

Loading

fmassa commented Nov 16, 2020 •

edited

Loading