Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc] Update docstrings/documentations of all the datasets #931

Merged
merged 10 commits into from
Oct 2, 2020

Conversation

mthrok
Copy link
Collaborator

@mthrok mthrok commented Oct 1, 2020

Currently the documentation page does not list constructor arguments or overwrote __getitem__ methods. This PR fixes it.

(Note: the annotation is broken form the beginning and I cannot fix it.)

Example:
Screen Shot 2020-10-01 at 18 10 45

@@ -76,9 +76,15 @@ def load_cmuarctic_item(line: str,


class CMUARCTIC(Dataset):
"""
Create a Dataset for CMU_arctic. Each item is a tuple of the form:
Copy link
Contributor

@mruberry mruberry Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to remove the information about the structure of these datasets here? I understand it's going in the getitem documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's moved to __getitem__ method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think describing it in __getitem__ has better locality, which is better for the developer experience. If something is changed on the implementation side, they only need to change the corresponding docstring and there are less chances that there are discrepancy between the docstring description and the actual implementation, compared to the case where the same information are repeated in class description.
Of course, we are having the current docathon to catch such discrepancies, so one can argue that it's okay to have such discrepancies, but it will confuse users on master branch still.

@@ -140,7 +116,7 @@ def _load_tedlium_item(self, fileid: str, line: int, path: str) -> Tuple[Tensor,
path (str): Dataset root path

Returns:
Tedlium_item: A namedTuple containing [waveform, sample_rate, transcript, talk_id, speaker_id, identifier]
tuple: ``(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix here.

Copy link
Contributor

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes look like a significant improvement.

In the future it may be interesting to enumerate all supported inputs for each param along with a brief description of what they do.

Comment on lines +32 to +33
:members:
:special-members: __getitem__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: out of curiosity, does this change the output?


Args:
root (str): Path to the directory where the dataset is found or downloaded.
url (str, optional): Type of the dataset to dowload. This is **NOT** the actual URL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same about code path for url

Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating this, lgtm overall :)

@mthrok mthrok merged commit e3d1d74 into pytorch:master Oct 2, 2020
@mthrok mthrok deleted the doc-datasets branch October 2, 2020 16:50
@mthrok mthrok changed the title Update docstrings/documentations of all the datasets [doc] Update docstrings/documentations of all the datasets Oct 14, 2020
mpc001 pushed a commit to mpc001/audio that referenced this pull request Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants