Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supervised Reconstruction Dataset + bug fixes and formatting #25

Merged
merged 5 commits into from
Feb 13, 2024

Conversation

GabrielBG0
Copy link
Collaborator

No description provided.

@GabrielBG0 GabrielBG0 linked an issue Feb 12, 2024 that may be closed by this pull request
@GabrielBG0 GabrielBG0 requested a review from otavioon February 12, 2024 18:47
Copy link
Collaborator

@otavioon otavioon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove the __getitem__ implementation, as it is equivalent to its parent class implementaion. If you wont want, I would suggest to change the return type from __getitem__ to Tuple[Any,Any], as inner type could not be infered yet (depends on readers' return type).

"""
return len(self.readers[0])

def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SimpleDataset already implements the equivaltent __getitem__code to this one.
Thus, you can omit this getitem implementation.
It worth noticing that we cannnot infer the return type (inside the tuple) yet, as it depends of the reader. The second element could be an int instead of np.ndarray, that represents the label, for instance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal with this dataset is to be as specific as possible, hence the Transform Pipeline and the (exactly) two readers. The implementation is almost the same but it uses only one transform for both data points and it returns a known type as its output (the numpy array tuple), which is better for code downstream in the full pipeline process. Sure, as it is implemented, I can’t be sure what types are returned by the readers but if that’s a problem I would rather put a check to ensure it then to return Any.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @GabrielBG0. I did not see the class name, my bad.

  1. I agree with a more specific implementation. However, I think this class could be a bit more generic. The name suggests that this dataset would only be used for Semantic Segmentation classes, which is not true. In fact, the same implementation here would be used for any other reconstruction task (predicting seismic attributes, seismic facies classification/segmentation, etc). Thus, any task that takes an input and has a target with the same shape as input should subclass this one. Therefore, maybe we can change its name to something like SupervisedReconstructionDataset. What do you think?
  2. I agree with the typing hint, as it is a specific class. However, it is worth notice that, yes, this is the same behavior as in the base class. If you have only one Transform and several readers, the same transform will be applied to all data fetched from the readers (equivalent to the codehere). We can still rewrite this whole __getitem__ impementation to a simple super().__getitem__(index). This would reduce code duplication and unit test cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, what do you think about it now?

sslt/data/datasets/supervised_dataset.py Outdated Show resolved Hide resolved
"""
return len(self.readers[0])

def __getitem__(self, index: int) -> Tuple[np.ndarray, np.ndarray]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @GabrielBG0. I did not see the class name, my bad.

  1. I agree with a more specific implementation. However, I think this class could be a bit more generic. The name suggests that this dataset would only be used for Semantic Segmentation classes, which is not true. In fact, the same implementation here would be used for any other reconstruction task (predicting seismic attributes, seismic facies classification/segmentation, etc). Thus, any task that takes an input and has a target with the same shape as input should subclass this one. Therefore, maybe we can change its name to something like SupervisedReconstructionDataset. What do you think?
  2. I agree with the typing hint, as it is a specific class. However, it is worth notice that, yes, this is the same behavior as in the base class. If you have only one Transform and several readers, the same transform will be applied to all data fetched from the readers (equivalent to the codehere). We can still rewrite this whole __getitem__ impementation to a simple super().__getitem__(index). This would reduce code duplication and unit test cases.

sslt/data/datasets/supervised_dataset.py Outdated Show resolved Hide resolved
sslt/data/datasets/supervised_dataset.py Outdated Show resolved Hide resolved
@GabrielBG0 GabrielBG0 requested a review from otavioon February 13, 2024 15:11
@GabrielBG0 GabrielBG0 changed the title Supervised Semantic Segmentation Dataset + bug fixes and formatting Supervised Reconstruction Dataset + bug fixes and formatting Feb 13, 2024
Signed-off-by: Otavio Napoli <otavio.napoli@gmail.com>
@otavioon otavioon requested review from otavioon and removed request for otavioon February 13, 2024 16:14
@otavioon
Copy link
Collaborator

LGTM! @GabrielBG0 I've fixed some typos in class' documentation and add some examples. Please, check if it is OK.

@GabrielBG0 GabrielBG0 merged commit 1aa1f34 into main Feb 13, 2024
1 check passed
@GabrielBG0 GabrielBG0 deleted the 14-supervised-dataset branch February 13, 2024 16:29
@GabrielBG0 GabrielBG0 self-assigned this Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Supervised Reconstruction Dataset
2 participants