Retry Uploads when one multipart failed #321

Open
planetahuevo opened this issue Sep 20, 2022 · 3 comments
@planetahuevo
Contributor

I am getting some errors with big uploads:

Exception 'phpbu\App\Backup\Sync\Exception' with message 'An exception occurred while uploading parts to a multipart upload. The following parts had errors:
    - Part 249: Error executing "UploadPart" on "https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=249&uploadId=editedc003_v0312018_t0025_u01663705241225"; AWS HTTP error: Server error: `PUT https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=249&uploadId=editedc003_v0312018_t0025_u01663705241225` resulted in a `500 Internal Server Error` response:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal  (truncated...)
     InternalError (server): An internal error occurred.  Please retry your upload. - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal error occurred.  Please retry your upload.</Message>
    </Error>
  
    - Part 253: Error executing "UploadPart" on "https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=253&uploadId=editedc003_v0312018_t0025_u01663705241225"; AWS HTTP error: Server error: `PUT https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=253&uploadId=editedc003_v0312018_t0025_u01663705241225` resulted in a `500 Internal Server Error` response:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal  (truncated...)
     InternalError (server): An internal error occurred.  Please retry your upload. - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal error occurred.  Please retry your upload.</Message>
    </Error>
  
    - Part 271: Error executing "UploadPart" on "https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=271&uploadId=editedc003_v0312018_t0025_u01663705241225"; AWS HTTP error: Server error: `PUT https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=271&uploadId=editedc003_v0312018_t0025_u01663705241225` resulted in a `500 Internal Server Error` response:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal  (truncated...)
     InternalError (server): An internal error occurred.  Please retry your upload. - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal error occurred.  Please retry your upload.</Message>
    </Error>
  
    - Part 297: Error executing "UploadPart" on "https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=297&uploadId=editedc003_v0312018_t0025_u01663705241225"; AWS HTTP error: Server error: `PUT https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=297&uploadId=editedc003_v0312018_t0025_u01663705241225` resulted in a `500 Internal Server Error` response:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal  (truncated...)
     InternalError (server): An internal error occurred.  Please retry your upload. - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal error occurred.  Please retry your upload.</Message>
    </Error>
  
    - Part 333: Error executing "UploadPart" on "https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=333&uploadId=editedc003_v0312018_t0025_u01663705241225"; AWS HTTP error: Server error: `PUT https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=333&uploadId=editedc003_v0312018_t0025_u01663705241225` resulted in a `500 Internal Server Error` response:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal  (truncated...)
     InternalError (server): An internal error occurred.  Please retry your upload. - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>InternalError</Code>
        <Message>An internal error occurred.  Please retry your upload.</Message>
    </Error>
  
    - Part 377: Error executing "UploadPart" on "https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=377&uploadId=editedc003_v0312018_t0025_u01663705241225"; AWS HTTP error: Server error: `PUT https://s3.eu-central-003.backblazeb2.com/filenameedited.gz?partNumber=377&uploadId=editedc003_v0312018_t0025_u01663705241225` resulted in a `503 Service Unavailable` response:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>ServiceUnavailable</Code>
        <Message>no tome (truncated...)
     ServiceUnavailable (server): no tomes available - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Error>
        <Code>ServiceUnavailable</Code>
        <Message>no tomes available</Message>
    </Error>
  
    '
    in phar:///usr/local/bin/phpbu/Backup/Sync/AmazonS3v3.php:72

I think this is not phpbu's fault, but it would be great if we could retry the upload when the error is a 503 or a 500.
At the moment the whole backup fails because of this, while a retry after 5-10 seconds would almost certainly succeed.

I think this is critical for working with medium-to-large backups. In my case this was a 3 GB backup, so not that huge.
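
For what it's worth, the AWS SDK for PHP documents a resume pattern for exactly this case: catch the MultipartUploadException, wait a bit, and restart the uploader from the exception's state so only the failed parts are re-sent. A rough sketch of what that could look like (not phpbu's actual code; the bucket, file path, endpoint and retry limit below are made-up placeholders):

```php
<?php
use Aws\Exception\MultipartUploadException;
use Aws\S3\MultipartUploader;
use Aws\S3\S3Client;

// Placeholder client configuration, pointed at the B2 S3-compatible endpoint.
$client = new S3Client([
    'version'  => 'latest',
    'region'   => 'eu-central-003',
    'endpoint' => 'https://s3.eu-central-003.backblazeb2.com',
]);

$uploader = new MultipartUploader($client, '/tmp/backup.gz', [
    'bucket' => 'my-backup-bucket',
    'key'    => 'backup.gz',
]);

$attempts = 0;
do {
    try {
        $result = $uploader->upload();
    } catch (MultipartUploadException $e) {
        if (++$attempts > 5) {
            throw $e; // give up after a few rounds
        }
        sleep(5 * $attempts); // simple backoff before retrying
        // getState() keeps the parts that already succeeded, so only the
        // failed parts (e.g. 249, 253, ...) are uploaded again.
        $uploader = new MultipartUploader($client, '/tmp/backup.gz', [
            'state' => $e->getState(),
        ]);
    }
} while (!isset($result));
```

Something along these lines inside AmazonS3v3.php would let the backup survive a handful of 500/503 responses instead of failing outright.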

@planetahuevo
Contributor Author

Related question.
Will multipart upload be automatically selected when the file is bigger than 5GB?
What is the size of the parts?
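
For context while waiting for an answer: the trace above shows the 3 GB file already went through the multipart path, since the exception lists individual part numbers, and a single S3 PutObject is capped at 5 GB anyway. If phpbu simply hands the file to the SDK's MultipartUploader, the part size would be the SDK default of 5 MB unless phpbu passes 'part_size'. A sketch with placeholder bucket and key:

```php
<?php
use Aws\S3\MultipartUploader;
use Aws\S3\S3Client;

// Placeholder client; in phpbu this would come from the sync configuration.
$client = new S3Client([
    'version'  => 'latest',
    'region'   => 'eu-central-003',
    'endpoint' => 'https://s3.eu-central-003.backblazeb2.com',
]);

$uploader = new MultipartUploader($client, '/tmp/backup.gz', [
    'bucket'    => 'my-backup-bucket',
    'key'       => 'backup.gz',
    // 5 MB is the SDK's minimum and default part size; bigger parts mean
    // fewer requests, but more data to resend when a single part fails.
    'part_size' => 16 * 1024 * 1024,
]);
$uploader->upload();
```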

@planetahuevo
Contributor Author

planetahuevo commented Sep 20, 2022

This is what b2 recommends: https://www.backblaze.com/blog/b2-503-500-server-error/
And this is S3:
https://aws.amazon.com/premiumsupport/knowledge-center/http-5xx-errors-s3/#:~:text=The%20error%20code%20500%20Internal,high%2C%20exceeding%20the%20request%20rate.
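
Both articles boil down to the same advice: retry 500/503 responses with a backoff. As far as I can tell the AWS SDK for PHP already does a few such retries per request (the 'retries' client option defaults to 3), so maybe just raising that number would already help. A sketch with a placeholder endpoint:

```php
<?php
use Aws\S3\S3Client;

// Placeholder endpoint/region; credentials omitted.
$client = new S3Client([
    'version'  => 'latest',
    'region'   => 'eu-central-003',
    'endpoint' => 'https://s3.eu-central-003.backblazeb2.com',
    'retries'  => 10, // SDK default is 3; retried requests are backed off
]);
```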

I think customers using the S3 API won't have to worry about this discussion. :-)

When using the S3 API, Backblaze will not return that particular error code for that situation (the one that asks customer code to get a new upload URL). Amazon S3 has no notion of a separate upload URL, so the client side could not possibly "fix" anything, and the Backblaze S3 layer therefore doesn't return that particular HTTP response code at all. It is an anomaly of implementing the lower-cost (to Backblaze) B2 API, which doesn't have the load-balancer layer in it.

This is a side note (doesn't affect the answer): I am kind of concerned about the quality of client side customer code out there that cannot handle intermittent failures. The most conservative, best code uploading to Amazon S3 in the Amazon S3 datacenters (Backblaze not involved in any way, shape, or form) has to be able to handle rare failures where it retries the transmission when it gets an HTTP response that isn't a "200 - ok". And the difference between "rare" and "common" escapes me from a code standpoint. Put differently, it is more terrifying to me that the code MIGHT fail or corrupt data if the network experiences over 1% network failure than if the code always fails or always works. I hope that made sense.

I guess what I'm worried about is this scenario: some program out there doesn't handle any errors at all in its upload code, and it is banking on the idea that if tonight's backup is totally corrupted by one network error, tomorrow's backup will work because it will have zero network errors. Then a bad network cable causing 50 network errors per day prevents them from ever getting an uncorrupted backup. I hope that made sense as a concern.

@planetahuevo
Contributor Author

@sebastianfeldmann any ideas?
