Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix redirect behavior of datasets.utils.download_url #3564

Merged
merged 2 commits into from
Mar 15, 2021

Conversation

pmeier
Copy link
Collaborator

@pmeier pmeier commented Mar 15, 2021

Fixes #3560. #3236 introduced redirect resolving in datasets.utils.download_url(). For this requests.get() is used:

def _get_redirect_url(url: str, max_hops: int = 10) -> str:
import requests
for hop in range(max_hops + 1):
response = requests.get(url)
if response.url == url or response.url is None:
return url
url = response.url
else:
raise RecursionError(f"Too many redirects: {max_hops + 1})")

This has two downsides:

  1. requests.get() already downloads the data although we only want to check if a redirect happens. Meaning, we download the data twice. This never stood out in our tests, since we download a small amount of data which leads to a minimal increase in testing time. For large datasets, say STL10 STL10 does not download #3560, this is not acceptable. We should switch to a head requests, e.g. requests.head(), that also carries the redirect information without downloading any data.
  2. Every dataset that has a the download flag is now indirectly dependent on requests without any mention of this in the documentation.

This PR solves both.

@codecov
Copy link

codecov bot commented Mar 15, 2021

Codecov Report

Merging #3564 (6e10f3a) into master (c808d16) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3564      +/-   ##
==========================================
- Coverage   78.84%   78.79%   -0.05%     
==========================================
  Files         105      105              
  Lines        9755     9756       +1     
  Branches     1568     1568              
==========================================
- Hits         7691     7687       -4     
- Misses       1577     1583       +6     
+ Partials      487      486       -1     
Impacted Files Coverage Δ
torchvision/datasets/utils.py 66.83% <100.00%> (-2.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c808d16...6e10f3a. Read the comment docs.

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@fmassa fmassa merged commit 4560556 into pytorch:master Mar 15, 2021
@pmeier pmeier deleted the fix-redirect-url branch March 15, 2021 11:24
pmeier added a commit to pmeier/vision that referenced this pull request Mar 15, 2021
* use head request for redirects

* remove requests dependency
fmassa pushed a commit that referenced this pull request Mar 15, 2021
* use head request for redirects

* remove requests dependency
facebook-github-bot pushed a commit that referenced this pull request Mar 19, 2021
Summary:
* use head request for redirects

* remove requests dependency

Reviewed By: fmassa

Differential Revision: D27127987

fbshipit-source-id: 097cacc7c472dbe3fb52215313bc8549a3e691e5
@datumbox datumbox added bug and removed fix labels Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

STL10 does not download
5 participants