
Add ability for S3 commands to increase retry count #1092

Open · jamesls opened this issue Jan 13, 2015 · 15 comments
Labels
automation-exempt · community · configuration · contribution-ready · feature-request · p2 · s3

Comments

@jamesls (Member) commented Jan 13, 2015

We've seen several issues opened where, due to a number of variables, the maximum number of attempts (currently 5) is too low. This can be due to a less reliable WAN link, insufficient resources on the machine running the commands, the parallelism for S3 transfers being set too high, etc.

To help with this, we should provide some mechanism that allows a user to bump up the retry count. The main use case is transferring either a large number of files or very large files. In those scenarios you're more willing to retry as many times as needed to get the request to succeed.

See:

#1065

@ShyneColdchain

Correct me if I am wrong: would this potentially help with the issue of a ~190 GB upload to an S3 bucket in the US Standard region via

aws s3 cp DATA.csv s3://BUCKET_NAME/data.csv ?

This creates about 900 parts. The upload gets through about 15 of the 900 parts before failing with:

upload failed: ./DATA.csv to s3://BUCKET_NAME/data.csv
HTTPSConnectionPool(host='BUCKET_NAME.s3.amazonaws.com', port=443): Max retries exceeded with url: /data.csv?partNumber=9&uploadId=CgfYBQnTUBVMCmrdy_uvMXOk0vqQcsBl570rE6LCC7aNzHO8wBtn_Y1A.gkP9A35VLpOruZXD6k9pPBIUNmXsQ-- (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)

Thank you.

@ShyneColdchain

^ And the --debug output shows well over 5 retries (I meant to include that).

@pingaws commented Feb 17, 2015

Is there any progress on, or a plan for, releasing this feature?

@awsdave removed the accepted label Mar 26, 2015
@spookylukey

For the case of large files, it seems from this line that if any part of an upload fails, the whole thing is cancelled:

https://github.com/aws/aws-cli/blob/develop/awscli/customizations/s3/tasks.py#L259

The problem here is that with an unreliable internet connection (e.g. one that fails every 10 minutes) and a large file, there is a very high chance that at least one part of a multipart upload will fail. That means the whole upload gets cancelled, i.e. there is a very low chance of success.

Could these failed parts be re-queued instead of causing cancellation?
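
A minimal sketch of that idea (purely illustrative: process_parts, part_queue, and upload_part are made-up names, and the real tasks.py is organized quite differently):

```python
# Hypothetical sketch of re-queueing a failed part instead of cancelling
# the whole multipart upload. All names here are made up for illustration;
# part_queue is expected to be a queue.Queue of dicts describing parts.
import queue

MAX_PART_ATTEMPTS = 5

def process_parts(part_queue, upload_part):
    """Drain the queue, giving each failed part its own retry budget."""
    while not part_queue.empty():
        part = part_queue.get()
        try:
            upload_part(part)
        except Exception:
            part['attempts'] = part.get('attempts', 0) + 1
            if part['attempts'] < MAX_PART_ATTEMPTS:
                # Re-queue the failed part rather than aborting the
                # entire multipart upload.
                part_queue.put(part)
            else:
                # Only give up once a single part has exhausted its
                # own retry budget.
                raise
```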

@JordonPhillips changed the title from "Add ability for S3 commands to increase retry account" to "Add ability for S3 commands to increase retry count" Nov 8, 2015
@spookylukey

Also, looking at the code, it seems there are only retries for downloads, not uploads - https://github.com/aws/aws-cli/blob/develop/awscli/customizations/s3/tasks.py. This means that despite the multipart upload feature, large files are very unlikely to succeed if there are issues with the network connection - if any part fails then the whole upload is cancelled.

@spookylukey

I'd be willing to work on this. However, I'd need some guidance:

  1. Uploading needs retry logic added, as it currently has none. Should we just do what DownloadPartTask does (retry in a loop), or something else? Should it default to the same number of attempts as DownloadPartTask?

  2. Should there be separate configuration parameters for download retries/upload retries?

  3. Should it be possible to configure infinite retries, and what value should be used for that?

@spookylukey

@kyleknap I'm offering to work on this - if someone can answer my questions above, I can get going. There are two separate features I guess:

  1. retries for uploading
  2. configuration for the number of retries.

Do you want me to create a new issue for part 1?

@kyleknap (Contributor)

Here are some responses to your previous questions:

  1. So for upload parts we actually do have retry logic; it lives in botocore: https://github.com/boto/botocore/blob/develop/botocore/retryhandler.py. It defaults to 5 attempts: https://github.com/boto/botocore/blob/develop/botocore/data/_retry.json#L48. For the download parts, though, we have additional retry logic on top of botocore's, so the retries can end up being more than 5.

  2. I think one configuration option would be best here. We see retries happen a lot for multipart copies.

  3. No, I do not think infinite retries should be allowed. For uploads we already do exponential backoff, so the time spent waiting between retries would get unreasonably long; it should error out instead.

I think it should be fine to keep tracking this on this issue. No need for a new issue to be opened.

I think the best approach would be to hook into the botocore logic I linked above, with a user-provided value for max retries, and I believe that is what James was referring to when he first opened the issue.
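
For reference, the backoff behaviour described in point 3 looks roughly like this (a simplified sketch, not botocore's actual code, which lives in retryhandler.py):

```python
# Simplified sketch of exponential backoff with a random base and a
# growth factor of 2; not botocore's actual implementation.
import random

def retry_delay(attempt, growth_factor=2):
    """Seconds to wait before retry number `attempt` (1-based)."""
    return random.random() * (growth_factor ** (attempt - 1))

# With 5 attempts the worst-case total wait is small (~1 + 2 + 4 + 8 s),
# but with a very large or unlimited attempt count the delays keep
# doubling, which is why unbounded retries are unattractive here.
for attempt in range(1, 6):
    print(attempt, round(retry_delay(attempt), 2))
```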

@spookylukey

Great, thanks so much for the response, hopefully I'll get time to look at this over the Christmas period.

@spookylukey

Working out how to configure the max_attempts value is proving quite difficult...

There is no documentation for how to do this kind of thing - https://botocore.readthedocs.org/en/latest/index.html - and I generally have the principle of "docs or it doesn't exist".

But digging deeper, here is the chain I followed:

The config for the retries is loaded from _retry.json, via self._loader.load_data('_retry'). This doesn't seem to give any opportunity for passing in other config, except through additional configuration files (via botocore.loaders.Loader.search_paths).

So I can't see any way to configure this programmatically, without changes to botocore.
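
The chain above can be reproduced with a few lines against botocore's internals (these are not public APIs, so the component name and the JSON layout assumed below may change between versions):

```python
# Sketch of the chain described above, poking at botocore internals.
import botocore.session

session = botocore.session.get_session()
loader = session.get_component('data_loader')

retry_config = loader.load_data('_retry')
# Assuming the _retry.json layout at the time of writing, the built-in
# default caps general retries at 5 attempts.
print(retry_config['retry']['__default__']['max_attempts'])
```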

@spookylukey

In case anyone else is looking for a workaround, I've found that the sync command for s3cmd works well.

@thehesiod

@spookylukey Mind opening a botocore issue for us? It sounds like this would fit perfectly in the Config class: http://botocore.readthedocs.org/en/latest/reference/config.html
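
For anyone landing on this issue later: recent botocore releases do expose retry settings on the Config class. A minimal sketch, assuming a botocore new enough for Config to accept a retries mapping:

```python
# Minimal sketch; requires a botocore version where Config accepts a
# 'retries' mapping (it did not when this comment was written).
import botocore.session
from botocore.config import Config

session = botocore.session.get_session()
s3 = session.create_client(
    's3',
    config=Config(retries={'max_attempts': 10, 'mode': 'standard'}),
)
```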

@ASayre (Contributor) commented Feb 6, 2018

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168364-add-ability-for-s3-commands-to-increase-retry-coun

@ASayre closed this as completed Feb 6, 2018
@jamesls (Member, Author) commented Apr 6, 2018

Based on community feedback, we have decided to return feature requests to GitHub issues.

@jamesls reopened this Apr 6, 2018
@madrobby

Hi, is there any movement on this? I have a spotty connection and am literally unable to download any file larger than a few hundred MiB from S3.

@kdaily added the automation-exempt label Jul 29, 2020
@kdaily added the needs-review label Sep 3, 2021
@justindho moved this to Contribution Ready in AWS CLI Community Contributions May 11, 2022
@justindho added the community and contribution-ready labels and removed the needs-review label May 11, 2022
@tim-finnigan added the p2 label Nov 10, 2022