Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add test for ImageNet #976

Merged
merged 11 commits into from
Jun 3, 2019
Merged

[WIP] Add test for ImageNet #976

merged 11 commits into from
Jun 3, 2019

Conversation

pmeier
Copy link
Collaborator

@pmeier pmeier commented May 31, 2019

This primarily adds a test for the ImageNet dataset based on #966. To achieve this this adds some fake data, which is structured as the real one. During the test only ~ 23 MB will be downloaded.

Furthermore, this does the following:

  • rename extract_file and download_and_extract to extract_archive and download_and_extract_archive to better reflect what they are doing
  • Add extract_root parameter to download_and_extract_archive in order to separate the download and extract location

ToDo:

  • This breaks the test_folder.py. I'll wait for feedback on the other points before I correct this.
  • Currently the the download URLs for the fake data are set to my fork. They need to be updated if this gets merged.
  • For whatever reason the fake data archives get corrupted somewhere in the up- / download process. If I download them via the GitHub GUI, ILSVRC2012_img_train.tar is about 10 KB and can be extracted. If I do the same with download_url or wget the archive is about 66 KB. Can someone reproduce this? Does someone know what I'm doing wrong here?

@pmeier pmeier changed the title Add test for ImageNet [WIP] Add test for ImageNet May 31, 2019
Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks!

I think we don't need to download the file if it's already present in the repo. This would avoid the issue that you are facing, where the dataset is not downloading properly (I think we are taking the wrong download path, and we are downloading html information as well).

test/test_datasets.py Outdated Show resolved Hide resolved
@pmeier
Copy link
Collaborator Author

pmeier commented Jun 3, 2019

The tests fail because the package mock is not available. If I understood it correctly mock is part of unittest as of python 3.3. Thus, if we want the tests to be BC, we need to add it to the packages that the CI installs. How can I add it?

@fmassa
Copy link
Member

fmassa commented Jun 3, 2019

we need to add it to the packages that the CI installs. How can I add it?

You can install it on CI by adding an entry in

- pip install future

@fmassa
Copy link
Member

fmassa commented Jun 3, 2019

BTW, this is looking great!

@pmeier
Copy link
Collaborator Author

pmeier commented Jun 3, 2019

Now only the test_folder.py is failing. One one hand it tests the functionality of the ImageFolder dataset but on the other hand also the transform and target_transform. I would rename it to test_dataset_transforms.py and move the test of ImageFolder to test_datasets.py. Is that OK?

@fmassa
Copy link
Member

fmassa commented Jun 3, 2019

I would rename it to test_dataset_transforms.py and move the test of ImageFolder to test_datasets.py. Is that OK?

Sounds good to me

@pmeier
Copy link
Collaborator Author

pmeier commented Jun 3, 2019

Everything should work as expected now. I've added a contextmanager for easier handling of the tmp_dir.


The test_datasets_transforms only tests the ImageFolder dataset. For the future we need some generalisation of the tests in order to avoid creating the same tests over and over gain for every dataset. I'll try to come up with something.

@codecov-io
Copy link

codecov-io commented Jun 3, 2019

Codecov Report

Merging #976 into master will increase coverage by 1.53%.
The diff coverage is 71.79%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #976      +/-   ##
==========================================
+ Coverage   61.16%   62.69%   +1.53%     
==========================================
  Files          65       65              
  Lines        5091     5080      -11     
  Branches      764      761       -3     
==========================================
+ Hits         3114     3185      +71     
+ Misses       1767     1678      -89     
- Partials      210      217       +7
Impacted Files Coverage Δ
torchvision/datasets/imagenet.py 87.62% <100%> (+66.07%) ⬆️
torchvision/datasets/caltech.py 20.65% <25%> (ø) ⬆️
torchvision/datasets/cifar.py 38.2% <50%> (ø) ⬆️
torchvision/datasets/mnist.py 63.23% <50%> (ø) ⬆️
torchvision/datasets/omniglot.py 32% <50%> (ø) ⬆️
torchvision/datasets/stl10.py 26.59% <50%> (ø) ⬆️
torchvision/datasets/utils.py 61.48% <82.35%> (+2.42%) ⬆️
torchvision/transforms/transforms.py 81.89% <0%> (-0.65%) ⬇️
torchvision/models/detection/faster_rcnn.py 74.39% <0%> (ø) ⬆️
torchvision/__init__.py 68.75% <0%> (+6.25%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 220b69b...8496947. Read the comment docs.

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, thanks a lot for the great PR!

@fmassa fmassa merged commit 7716aba into pytorch:master Jun 3, 2019
@pmeier pmeier deleted the imagenet_test branch June 4, 2019 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants