Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: safely resume interrupted downloads #294
feat: safely resume interrupted downloads #294
Changes from 12 commits
3189ac4
4a37b82
2b14f44
b1895f9
c04d8ba
fee2696
1e81eb7
5dea6ab
8b09434
c29efb2
4967a4e
c4591d8
e28d237
d579918
9a04788
188b484
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What causes this case to happen? Transcoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to retried requests being range requests. For range requests, as noted here, there's no way to detect data corruption for that byte range alone.
Therefore, here we retrieve the expected checksum/checksum object only once for the initial download request. Then we calculate and validate the checksum when the download completes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this happen via a user specifying a generation on the object? Were we not respecting this previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this would happen via a user specifying a generation on the object. Previously, we've been respecting that only through
download.media_url
A property
download._object_generation
is added. It records the object generation from either (1) generation query param from the media_url, or (2) the object generation from the initial response header. This specific line of code does (1) and retrieves it from the media_urlP.S. It's tricky in how limited information is passed from python-storage to resumable-media-python. A resumable-media-python
download
instance only knows the specified object generation from itsmedia_url
, and the "object" itself isn't pertained in a download.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this should be highlighted as a shortcoming in the decompressive transcoding docs-- not being able to resume a download may be costly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mentioned in the very bottom section of the decompressive transcoding docs. I agree we can add notes on how retries may be impacted in this sense.