Flaky behavior when downloading Google drive files #2992
Comments
IMO, this could be the offender. If I remember correctly, the @jgbradley1 Right now, https://docs.google.com/uc?export=download&id=0B6eKvaijfFUDQUUwd21EckhUbWs&confirm=o7Z2 (the URL we request from your sample above) does not trigger the daily quota exceeded error. Could you ping me as soon as possible if you see this again, so I can take a look?
@pmeier With MiniImageNet, I can reproduce this issue every time in my environment (the Google Drive daily quota shouldn't be exceeded).
This code downloads the MiniImageNet dataset (the dataset itself is ~1 GB worth of data). If I comment out
@pmeier I can confirm I'm seeing the same behavior as @KhabarlakKonstantin for mini imagenet. At this time, the mini imagenet file has not exceeded the threshold. The real issue here, then, is how we're checking for a quota exceeded error. A string search is just not going to be a viable solution for large files (1+ GB). According to the Google Drive API docs, the HTTP status code should be Modifying
Using my modified function above, I am able to download mini imagenet and see the progress bar.
The mini imagenet file has now exceeded the threshold for today when downloading programmatically. Here are some observations of what I can see.
Output of
I can still manually download the file from here using a browser. Perhaps Google Drive allows more browser-initiated downloads than programmatically initiated downloads. The most interesting behavior about this is the reported status code will switch randomly between Final conclusion: the function
True. I didn't think of this when I implemented the check. We should use
This was the first thing I checked and I always got 200 back. I only went for checking the response content after that.
True again, but since it makes normal operation impossible, this is no solution. I'm going to revert the commit for now and send a proper fix later.
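For illustration, the trade-off discussed in this thread (inspecting the response content without scanning a multi-gigabyte payload) could be handled by looking only at the first chunk of the streamed body. The following is a minimal sketch under stated assumptions, not torchvision's implementation: the helper name peek_for_marker and the marker text are hypothetical, and the chunks are stand-ins for whatever the HTTP client yields.

```python
import itertools
from typing import Iterable, Iterator, Tuple

# Hypothetical marker: the exact wording of the quota page is an assumption.
QUOTA_MARKER = b"Quota exceeded"


def peek_for_marker(
    chunks: Iterable[bytes], marker: bytes = QUOTA_MARKER
) -> Tuple[bool, Iterator[bytes]]:
    """Search only the first non-empty chunk for `marker`.

    Returns (found, chunks). The returned iterator still yields the
    inspected chunk followed by the rest, so a download can continue
    without re-requesting anything.
    """
    iterator = iter(chunks)
    first = b""
    # Skip empty chunks; stop at the first chunk that carries data.
    for first in iterator:
        if first:
            break
    found = marker in first
    # Put the inspected chunk back in front of the remaining ones.
    head = [first] if first else []
    return found, itertools.chain(head, iterator)


if __name__ == "__main__":
    # Stand-in for an HTML error page delivered instead of the file.
    error_page = [b"<html><title>Quota exceeded</title>", b"</html>"]
    found, _ = peek_for_marker(error_page)
    print("quota page detected:", found)
```

With requests, the chunks would typically come from response.iter_content(chunk_size=...) on a response opened with stream=True. The cost of the check is bounded by the chunk size rather than the file size, but it will miss a marker that appears later in the body or straddles a chunk boundary.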
Still suffering from this problem when downloading the CUB_200_2011 dataset from https://drive.google.com/file/d/1hbzc_P1FuxMkcabkgn9ZKinBwW683j45, while downloading from a browser (Chrome) works fine.
@jgbradley1 I see, thanks for your notice!
🐛 Bug
This is a rather difficult bug to diagnose because certain internet activity must be present. The issue is with torchvision.datasets.utils.download_file_from_google_drive(). It does not gracefully handle large files that have exceeded their daily download quota.
To Reproduce
The following two prerequisites must be met in order to detect this issue.
will lead to the python session getting killed.
The Python process hangs on the call to torchvision.datasets.utils._quota_exceeded(...). My best guess is that the code in this function performs a string search that is either inefficient or causes Python to search the entire data payload (resulting in a timeout).
Expected behavior
Calling download_file_from_google_drive(...) should not kill the session when download quota thresholds have been met on large files.
Environment
cc @pmeier