-
Notifications
You must be signed in to change notification settings - Fork 657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[doc] Update docstrings/documentations of all the datasets #931
Conversation
@@ -76,9 +76,15 @@ def load_cmuarctic_item(line: str, | |||
|
|||
|
|||
class CMUARCTIC(Dataset): | |||
""" | |||
Create a Dataset for CMU_arctic. Each item is a tuple of the form: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to remove the information about the structure of these datasets here? I understand it's going in the getitem documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's moved to __getitem__
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think describing it in __getitem__
has better locality, which is better for the developer experience. If something is changed on the implementation side, they only need to change the corresponding docstring and there are less chances that there are discrepancy between the docstring description and the actual implementation, compared to the case where the same information are repeated in class description.
Of course, we are having the current docathon to catch such discrepancies, so one can argue that it's okay to have such discrepancies, but it will confuse users on master branch still.
@@ -140,7 +116,7 @@ def _load_tedlium_item(self, fileid: str, line: int, path: str) -> Tuple[Tensor, | |||
path (str): Dataset root path | |||
|
|||
Returns: | |||
Tedlium_item: A namedTuple containing [waveform, sample_rate, transcript, talk_id, speaker_id, identifier] | |||
tuple: ``(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes look like a significant improvement.
In the future it may be interesting to enumerate all supported inputs for each param along with a brief description of what they do.
:members: | ||
:special-members: __getitem__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: out of curiosity, does this change the output?
torchaudio/datasets/libritts.py
Outdated
|
||
Args: | ||
root (str): Path to the directory where the dataset is found or downloaded. | ||
url (str, optional): Type of the dataset to dowload. This is **NOT** the actual URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same about code path for url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating this, lgtm overall :)
Add profiling tracer example
Currently the documentation page does not list constructor arguments or overwrote
__getitem__
methods. This PR fixes it.(Note: the annotation is broken form the beginning and I cannot fix it.)
Example: