BlobWriter should abort multipart upload during exception handling #1228
Hi @ddelange, thanks for the context; I was able to reproduce the failure. From some initial prototyping, I think we can resolve this by adding an `__exit__` method. Example:

```python
def __exit__(self, exc_type, exc_val, traceback):
    if exc_type is not None:
        self._buffer.close()
        return False
    return True
```

Full repro example:

```python
import http.client

from google.cloud import storage

http.client.HTTPConnection.debuglevel = 5  # used to review outgoing requests

storage_client = storage.Client()
bucket = storage_client.bucket("bucket")
blob = bucket.blob("test-upload-1228")
with blob.open('wb', chunk_size=1024 * 1024 * 2) as fp:
    fp.write(b'first chunk')  # buffered, not yet uploaded
    fp.write(b'b' * (1024 * 1024 * 4))  # exceeds the chunk size, so uploaded
    raise BaseException("error")
```

@andrewsg or @cojenco can one of you work to address this issue in fileio?
nice! why do you wish to return `True` in the no-exception case?
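(For background on why the return value matters: a truthy return from `__exit__` tells Python to suppress the in-flight exception, while the return value is ignored when no exception occurred. A minimal sketch, independent of this library:)

```python
class Demo:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        # A truthy return suppresses the in-flight exception.
        # When no exception occurred, the return value is ignored,
        # so `return True` there is harmless but misleading.
        return True


with Demo():
    raise ValueError("swallowed")

print("still running")  # reached, because __exit__ returned True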
Primarily a prototyping miss, thanks for pointing that out. After an offline discussion: this is a tricky issue, in that the feature has been implemented this way for several years, and other users will have adopted the existing pattern (commit whatever has been written to GCS). @cojenco @andrewsg An alternative would be a flag that controls whether an exception aborts the upload or lets it complete. I naively assumed an exception should cancel the upload, but there are likely consequences to changing this behavior now.
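(To make the flag idea concrete, a minimal sketch; the class, the `abort_on_exception` name, and the helper methods are hypothetical, not part of the library:)

```python
# Hypothetical opt-in flag (name invented for illustration): when set,
# an exception inside the with-block aborts the upload instead of
# committing whatever was already flushed.
class BlobWriterSketch:
    def __init__(self, abort_on_exception=False):
        self._abort_on_exception = abort_on_exception

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        if exc_type is not None and self._abort_on_exception:
            self._abort_upload()   # cancel the in-progress upload
        else:
            self._commit_upload()  # today's behavior in every case
        return False               # never suppress the exception

    def _abort_upload(self):
        print("upload aborted")

    def _commit_upload(self):
        print("upload committed")
```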
Does GCS employ some sort of deprecation warning system to phase in such behaviour changes? To make the old behaviour opt-in, and flip that default in the next major version? From the Zen of Python:

> Errors should never pass silently.
> Unless explicitly silenced.
I would argue that making the proper behaviour of the with-statement opt-in would be the wrong way around. People exploiting bugs can reasonably expect that bug to be fixed at some point.
Hi 👋 I've opened a PR to cancel the resumable upload, analogous to the implementations in smart_open for Azure and S3 (and previously GCS). Please have a look, especially since the feature flag topic above is still open for now.
Hi 👋 Did you happen to discuss #1243 internally in the meantime?
We did discuss it. I would prefer the behavior you suggest. We do have concerns that it may break applications for anyone who has worked around the existing behavior. We will definitely incorporate this fix into the upcoming major version release. Whether we incorporate it into a minor version release before that is still undecided. Thank you very much for your kind attention.
Hey 👋 do you think we can do a first review cycle on the PR?
Hi @ddelange, I'll have a 3.0 (next major version release) branch open next week. Let's retarget this PR into that branch at that time and review then. I'll add a reminder to myself to ping this thread when the branch is open. The docs update we'll need for this change will be part of the 3.0 "possibly breaking changes" list, so getting that list started in the README or elsewhere is the work item that unblocks this PR.
Fixed by #1243.
Hi 👋

When an error occurs during a multipart upload, `BlobWriter.__exit__` is called with exception context. When that is the case, the multipart upload should be aborted. Currently, `BlobWriter.__exit__` blindly calls `BlobWriter.close`, which in turn finishes (commits) the multipart upload. The user then ends up with a partial blob on GCS, whereas the multipart upload should have been aborted.

Steps to reproduce

Raise an exception inside the with-block of a `BlobWriter`.

Code example
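(The original snippet is not preserved in this capture; a minimal sketch reconstructed from the repro above, with the bucket and object names assumed:)

```python
from google.cloud import storage

client = storage.Client()
blob = client.bucket("some-bucket").blob("some-object")  # assumed names

# chunk_size forces a chunked upload; the write exceeds it, so a chunk
# is sent to GCS before the exception fires.
with blob.open("wb", chunk_size=1024 * 1024 * 2) as fp:
    fp.write(b"b" * (1024 * 1024 * 4))
    raise RuntimeError("boom")
# Expected: the upload is aborted and no blob is created.
# Actual: __exit__ commits the upload, leaving a partial blob.
```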
For reference: `smart_open.s3.MultiPartWriter.__exit__`
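(For illustration, the abort-on-exception pattern used there looks roughly like the following; this is a paraphrase, not code copied from smart_open:)

```python
def __exit__(self, exc_type, exc_val, exc_tb):
    if exc_type is not None:
        self.terminate()  # abort the multipart upload, discarding parts
    else:
        self.close()      # complete (commit) the upload
```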