-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: safely resume interrupted downloads #294
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple thoughts, generally looking really good to me!
General comment, can you give some more details about how you tested this out with the emulator?
# data corruption for that byte range alone. | ||
if self._expected_checksum is None and self._checksum_object is None: | ||
# `_get_expected_checksum()` may return None even if a checksum was | ||
# requested, in which case it will emit an info log _MISSING_CHECKSUM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What causes this case to happen? Transcoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to retried requests being range requests. For range requests, as noted here, there's no way to detect data corruption for that byte range alone.
Therefore, here we retrieve the expected checksum/checksum object only once for the initial download request. Then we calculate and validate the checksum when the download completes.
@@ -160,13 +171,48 @@ def consume( | |||
if self._stream is not None: | |||
request_kwargs["stream"] = True | |||
|
|||
# Assign object generation if generation is specified in the media url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this happen via a user specifying a generation on the object? Were we not respecting this previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this would happen via a user specifying a generation on the object. Previously, we've been respecting that only through download.media_url
A property download._object_generation
is added. It records the object generation from either (1) generation query param from the media_url, or (2) the object generation from the initial response header. This specific line of code does (1) and retrieves it from the media_url
P.S. It's tricky in how limited information is passed from python-storage to resumable-media-python. A resumable-media-python download
instance only knows the specified object generation from its media_url
, and the "object" itself isn't pertained in a download.
|
||
self._process_response(result) | ||
|
||
# With decompressive transcoding, GCS serves back the whole file regardless of the range request, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this should be highlighted as a shortcoming in the decompressive transcoding docs-- not being able to resume a download may be costly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mentioned in the very bottom section of the decompressive transcoding docs. I agree we can add notes on how retries may be impacted in this sense.
Thanks for the review! I've added data integrity checks and test cases to the retry conf test (open PR). The changes in this PR are tested against the testbench using above-mentioned tests. Before the changes, conformance tests fail as below. The conf tests pass running locally against the changes made in this PR.
|
This is looking really good in general. Based on offline discussion I would recommend moving the decompressive transcoding feature to a TODO and moving ahead with the rest of this PR. There may be some details that take a while to resolve for transcoding and it's important that we still move ahead with the rest of this PR which is a major fix to retry logic for downloads. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending @tritone comment resolutions. Thank you!
Thanks Chris and Andrew! I've moved the transcoding feature, tracking in #303 |
If a retryable error occurs mid-download, the download starts sending data to the stream from the
offset_of_last_byte_received
rather than starting from the beginning of the file, and resolves data integrity issues.object_generation
andbytes downloaded
Fixes #284