Improve testing of datasets #3375

Closed
pmeier opened this issue Feb 11, 2021 · 2 comments

Comments

@pmeier
Collaborator

pmeier commented Feb 11, 2021

One reason we lag behind with the datasets tests (see #963 (comment)) is that right now writing a test is lengthy and complicated. To overcome this, I'm proposing to add a well-documented DatasetTestCase superclass. With minimal effort from the contributor, it should be able to automatically test

  1. if the dataset inherits from VisionDataset,
  2. if the dataset raises an error in case it does not exist or is corrupted,
  3. if the transform, target_transform, and transforms are applied properly (where applicable),
  4. if repr(dataset) can be built, and
  5. if len(dataset) matches the number of fake examples (see below).

Of course the contributor is free to test more things that go beyond the above list, but in general this should cover most cases for most datasets. The only thing the contributor needs to implement manually is the generation of fake data in the respective structure of the dataset. This can be aided by providing utility functions to create an image, a folder of images, and so on.
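To make the idea more concrete, here is a rough sketch of what such a superclass could look like. It uses only the standard library so that it runs on its own: `ToyDataset` is a hypothetical stand-in for a real dataset, and `DATASET_CLASS`, `inject_fake_data`, and `create_dataset` are names made up for illustration, not an existing torchvision API. The VisionDataset inheritance check is left out because the stand-in does not depend on torchvision.

```python
import contextlib
import os
import tempfile
import unittest
from unittest import mock


class ToyDataset:
    """Hypothetical stand-in for a dataset: one sample per *.txt file in root."""

    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        self.files = sorted(f for f in os.listdir(root) if f.endswith(".txt"))
        if not self.files:
            raise RuntimeError("Dataset not found or corrupted.")

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        with open(os.path.join(self.root, self.files[index])) as fh:
            sample = fh.read()
        return self.transform(sample) if self.transform else sample

    def __repr__(self):
        return f"{type(self).__name__}(num_samples={len(self)})"


class DatasetTestCase(unittest.TestCase):
    """Hypothetical base class; subclasses set DATASET_CLASS and inject_fake_data."""

    DATASET_CLASS = None

    def inject_fake_data(self, root):
        """Write fake data below root and return the number of fake examples."""
        raise NotImplementedError

    def setUp(self):
        # The base class itself has nothing to test.
        if self.DATASET_CLASS is None:
            self.skipTest("abstract base class")

    @contextlib.contextmanager
    def create_dataset(self, **kwargs):
        with tempfile.TemporaryDirectory() as root:
            num_examples = self.inject_fake_data(root)
            yield self.DATASET_CLASS(root, **kwargs), num_examples

    def test_not_found(self):
        # An empty root directory should be reported as an error.
        with tempfile.TemporaryDirectory() as root:
            with self.assertRaises(RuntimeError):
                self.DATASET_CLASS(root)

    def test_transform(self):
        # Assumes the dataset accepts a `transform` keyword argument.
        transform = mock.Mock(side_effect=lambda sample: sample)
        with self.create_dataset(transform=transform) as (dataset, _):
            dataset[0]
        transform.assert_called_once()

    def test_repr(self):
        with self.create_dataset() as (dataset, _):
            self.assertIsInstance(repr(dataset), str)

    def test_len(self):
        with self.create_dataset() as (dataset, num_examples):
            self.assertEqual(len(dataset), num_examples)


class ToyDatasetTestCase(DatasetTestCase):
    DATASET_CLASS = ToyDataset

    def inject_fake_data(self, root):
        num_examples = 3
        for idx in range(num_examples):
            with open(os.path.join(root, f"{idx}.txt"), "w") as fh:
                fh.write(f"sample {idx}")
        return num_examples
```

In this sketch the contributor only writes the last class: the dataset under test plus the fake data generation. Everything else is inherited.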

Within the test we then simply mock out the download and extraction functions and thus make the dataset accept the fake data. Given that the download and extraction logic is rarely in excess of 10 LOC and mostly uses functions from torchvision.datasets.utils, IMO leaving it untested is a good tradeoff towards simpler tests.
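A minimal sketch of the mocking step, assuming a dataset that fetches its files in a `download()` method. `DownloadableToyDataset` and `make_fake_data` are hypothetical names used only for illustration; a real dataset would call helpers from torchvision.datasets.utils inside `download()`.

```python
import os
import tempfile
from unittest import mock


class DownloadableToyDataset:
    """Hypothetical dataset that would normally fetch its files in download()."""

    def __init__(self, root, download=False):
        self.root = root
        if download:
            self.download()
        self.files = sorted(os.listdir(root))
        if not self.files:
            raise RuntimeError("Dataset not found. Use download=True to download it.")

    def download(self):
        # Stand-in for real download and extraction logic.
        raise ConnectionError("network access is not available in tests")

    def __len__(self):
        return len(self.files)


def make_fake_data(root, num_examples=2):
    """Create tiny placeholder files in the layout the dataset expects."""
    for idx in range(num_examples):
        with open(os.path.join(root, f"{idx}.bin"), "wb") as fh:
            fh.write(b"\x00")
    return num_examples


with tempfile.TemporaryDirectory() as root:
    num_examples = make_fake_data(root)
    # Replace download() with a mock so the dataset picks up the fake data.
    with mock.patch.object(DownloadableToyDataset, "download") as download:
        dataset = DownloadableToyDataset(root, download=True)
    download.assert_called_once_with()
    num_loaded = len(dataset)
```

Patching the method on the class keeps the test independent of how the dataset is imported. For a real dataset the patch target would instead be wherever the download and extraction functions are looked up.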

cc @pmeier

@fmassa
Member

fmassa commented Feb 11, 2021

I think this is great!

One thing we should always enforce for testing as well is that __getitem__ works and gives the expected result for mocked data.
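As a rough illustration of such a check, the sketch below builds fake data from a list of expected items and compares every `dataset[index]` against it. `ToyLabelDataset` and `check_getitem` are hypothetical names, and the file layout is an assumption made up for the example.

```python
import os
import tempfile


class ToyLabelDataset:
    """Hypothetical dataset: each '<label>_<idx>.txt' file holds one sample."""

    def __init__(self, root):
        self.root = root
        self.files = sorted(os.listdir(root))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        name = self.files[index]
        with open(os.path.join(self.root, name)) as fh:
            return fh.read(), name.split("_")[0]


def check_getitem(dataset, expected):
    """Compare every item against the values the fake data was built from."""
    assert len(dataset) == len(expected)
    for index, item in enumerate(expected):
        assert dataset[index] == item, f"mismatch at index {index}"


with tempfile.TemporaryDirectory() as root:
    expected = [("meow", "cat"), ("woof", "dog")]
    for idx, (content, label) in enumerate(expected):
        with open(os.path.join(root, f"{label}_{idx}.txt"), "w") as fh:
            fh.write(content)
    dataset = ToyLabelDataset(root)
    check_getitem(dataset, expected)
    first_item = dataset[0]
```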

@pmeier
Collaborator Author

pmeier commented Feb 21, 2021

Closed in #3402

pmeier closed this as completed Feb 21, 2021