[WIP] Add test for ImageNet #976
Conversation
This looks great, thanks!
I think we don't need to download the file if it's already present in the repo. This would avoid the issue that you are facing, where the dataset is not downloading properly (I think we are taking the wrong download path and are downloading HTML information as well).
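As a rough sketch of that idea (the helpers exist in `torchvision.datasets.utils`, but the file name and md5 below are just placeholders), the download could be skipped whenever the archive already ships with the repo:

```python
import os

from torchvision.datasets.utils import check_integrity, download_url

# Placeholder archive name and checksum for illustration only.
ARCHIVE = "ILSVRC2012_img_train.tar"
MD5 = None  # pass the real checksum here to enable integrity checks


def get_archive(root, url):
    fpath = os.path.join(root, ARCHIVE)
    # check_integrity returns True if the file exists (and matches the md5,
    # when one is given), so an archive bundled with the repo is reused as-is.
    if check_integrity(fpath, MD5):
        return fpath
    download_url(url, root, filename=ARCHIVE, md5=MD5)
    return fpath
```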
The tests fail because a required package is not installed.
You can install it on CI by adding an entry at line 35 in 47db963.
BTW, this is looking great!
Now only the …
Sounds good to me.
Everything should work as expected now. I've added a … The …
Codecov Report
@@            Coverage Diff             @@
##           master     #976      +/-   ##
==========================================
+ Coverage   61.16%   62.69%    +1.53%
==========================================
  Files          65       65
  Lines        5091     5080       -11
  Branches      764      761        -3
==========================================
+ Hits         3114     3185       +71
+ Misses       1767     1678       -89
- Partials      210      217        +7
Continue to review full report at Codecov.
Looks awesome, thanks a lot for the great PR!
This primarily adds a test for the `ImageNet` dataset based on #966. To achieve this, it adds some fake data that is structured like the real data. During the test only ~23 MB will be downloaded.

Furthermore, this does the following:
- Renames `extract_file` and `download_and_extract` to `extract_archive` and `download_and_extract_archive` to better reflect what they are doing
- Adds an `extract_root` parameter to `download_and_extract_archive` in order to separate the download and extract locations (a usage sketch follows this list)
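As a usage sketch only (the URL and directories are placeholders, and the keyword names assume the signature proposed in this PR rather than a final API):

```python
from torchvision.datasets.utils import download_and_extract_archive

# Hypothetical example: separate the location of the downloaded archive from
# the location its contents are extracted to.
url = "https://example.com/ILSVRC2012_img_train.tar"  # placeholder URL

download_and_extract_archive(
    url,
    download_root="data/downloads",  # where the archive itself is stored
    extract_root="data/imagenet",    # where the contents are unpacked
    md5=None,                        # pass a checksum to enable integrity checks
)
```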
ToDo:
- This breaks the `test_folder.py`. I'll wait for feedback on the other points before I correct this.
- Currently the download URLs for the fake data are set to my fork. They need to be updated if this gets merged.
- For whatever reason the fake data archives get corrupted somewhere in the up-/download process. If I download them via the GitHub GUI, `ILSVRC2012_img_train.tar` is about 10 KB and can be extracted. If I do the same with `download_url` or `wget`, the archive is about 66 KB. Can someone reproduce this? Does someone know what I'm doing wrong here?
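Regarding the last point, and tying in with the earlier comment about HTML being downloaded: a quick way to check whether the oversized file is actually a saved web page rather than the tar archive is to inspect its first bytes. This is only a debugging sketch; the file name is a placeholder.

```python
# Debugging sketch (file name is a placeholder): if the downloaded "archive"
# is actually an HTML page (e.g. the GitHub blob page instead of the raw
# file), its first bytes will look like markup rather than a tar header.
with open("ILSVRC2012_img_train.tar", "rb") as f:
    head = f.read(512)

if head.lstrip().startswith((b"<!DOCTYPE", b"<html")):
    print("This is an HTML page, not a tar archive - the download URL is wrong.")
else:
    print("Looks like binary data; the archive may be intact.")
```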