One reason we lag behind with the datasets tests (see #963 (comment)) is that right now writing a test is lengthy and complicated. To overcome this, I'm proposing to add a well-documented `DatasetTestCase` superclass (see the sketch after the list below). With minimal user intervention it should be able to automatically test
- if the dataset inherits from `VisionDataset`,
- if the dataset raises an error in case it does not exist or is corrupted,
- if applicable, if `transform`, `target_transform`, and `transforms` are applied properly,
- if `repr(dataset)` can be built, and
- if `len(dataset)` matches the number of fake examples (see below).
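To make this more concrete, here is a rough sketch of what such a superclass could look like. It is not a definitive implementation; the hook names (`DATASET_CLASS`, `inject_fake_data`, `create_dataset`) are hypothetical and only illustrate the idea:

```python
import tempfile
import unittest

from torchvision.datasets.vision import VisionDataset


class DatasetTestCase(unittest.TestCase):
    # Set by the concrete test case, e.g. DATASET_CLASS = datasets.MNIST.
    DATASET_CLASS = None

    def inject_fake_data(self, root):
        """Create fake data below ``root`` and return the number of examples."""
        raise NotImplementedError

    def create_dataset(self, **kwargs):
        # In the real implementation the download / extraction functions
        # would be mocked out here (see below).
        root = tempfile.mkdtemp()
        num_examples = self.inject_fake_data(root)
        return self.DATASET_CLASS(root, **kwargs), num_examples

    def test_inherits_from_vision_dataset(self):
        self.assertTrue(issubclass(self.DATASET_CLASS, VisionDataset))

    def test_repr(self):
        dataset, _ = self.create_dataset()
        repr(dataset)  # only checks that building the repr does not raise

    def test_len(self):
        dataset, num_examples = self.create_dataset()
        self.assertEqual(len(dataset), num_examples)
```

A concrete test case would then only need to set `DATASET_CLASS` and implement `inject_fake_data`.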
Of course the contributor is free to test more things that go beyond the above list, but in general this should cover most cases for most datasets. The only thing that needs to be implemented manually is the generation of fake data in the structure the respective dataset expects. This can be aided by providing utility functions to create an image, a folder of images, and so on.
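Such helpers could look roughly like this; `create_image_file` and `create_image_folder` are hypothetical names used for illustration only:

```python
import pathlib

import numpy as np
import PIL.Image


def create_image_file(root, name, size=(32, 32)):
    """Write a single random RGB image to root/name and return its path."""
    path = pathlib.Path(root) / name
    image = np.random.randint(0, 256, (*size, 3), dtype=np.uint8)
    PIL.Image.fromarray(image).save(path)
    return path


def create_image_folder(root, name, num_examples=3):
    """Create a folder of fake images, e.g. in the layout ImageFolder expects."""
    folder = pathlib.Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    return [create_image_file(folder, f"{idx}.png") for idx in range(num_examples)]
```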
Within the test we then simply mock out the download and extraction functions and thus make the dataset accept the fake data. Given that this logic rarely exceeds 10 LOC and mostly uses functions from `torchvision.datasets.utils`, IMO this is a good tradeoff for simpler tests.
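As a minimal sketch, assuming the dataset under test downloads its data through `torchvision.datasets.utils.download_and_extract_archive`, the mocking could look like this (`create_dataset_without_download` is a hypothetical helper; the exact patch target depends on how the concrete dataset module imports the function):

```python
import unittest.mock


def create_dataset_without_download(dataset_class, root, **kwargs):
    # With the download helper mocked, download=True becomes a no-op and the
    # dataset simply picks up the fake data that was injected into `root`.
    with unittest.mock.patch(
        "torchvision.datasets.utils.download_and_extract_archive"
    ):
        return dataset_class(root, download=True, **kwargs)
```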
cc @pmeier