
Add ability for S3 commands to increase retry count #1092

Open · jamesls opened this issue Jan 13, 2015 · 15 comments
Labels
automation-exempt · community · configuration · contribution-ready · feature-request · p2 · s3

Comments

@jamesls (Member) commented Jan 13, 2015

We've seen several issues opened where, due to a number of variables, the maximum number of attempts (currently 5) is too low. This can be due to a less reliable WAN link, insufficient resources on the machine running the commands, the parallelism for S3 transfers being set too high, etc.

To help with this, we should provide some mechanism that allows a user to bump up the retry count. The main use case is transferring either a large number of files or very large files. In those scenarios you're more willing to retry as many times as needed to get the request to succeed.

See:

#1065

@ShyneColdchain

Correct me if I am wrong: would this potentially help with the issue of a ~190 GB upload to an S3 bucket in the US Standard region via

aws s3 cp DATA.csv s3://BUCKET_NAME/data.csv ?

This creates about 900 parts. The upload gets through about 15 of the 900 parts before failing with:

upload failed: ./DATA.csv to s3://BUCKET_NAME/data.csv
HTTPSConnectionPool(host='BUCKET_NAME.s3.amazonaws.com', port=443): Max retries exceeded with url: /data.csv?partNumber=9&uploadId=CgfYBQnTUBVMCmrdy_uvMXOk0vqQcsBl570rE6LCC7aNzHO8wBtn_Y1A.gkP9A35VLpOruZXD6k9pPBIUNmXsQ-- (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)

Thank you.

@ShyneColdchain

^ And the --debug output shows well over 5 retries (I meant to include that).

@pingaws commented Feb 17, 2015

Is there any progress on, or a plan for, releasing this feature?

@awsdave removed the accepted label Mar 26, 2015
@spookylukey

For the case of large files, it seems from this line that if any part of an upload fails, the whole thing is cancelled:

https://github.com/aws/aws-cli/blob/develop/awscli/customizations/s3/tasks.py#L259

The problem here is that with an unreliable internet connection (e.g. one that fails every 10 minutes) and a large file, there is a very high chance that at least one part of a multipart upload will fail. That means the whole upload gets cancelled, i.e. there is a very low chance of success.

Could these failed parts be re-queued instead of causing cancellation?
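
A minimal sketch of that idea (purely illustrative: process_parts, part_queue, and upload_part are made-up names, and the real tasks.py is organized quite differently):

```python
# Hypothetical sketch of re-queueing a failed part instead of cancelling
# the whole multipart upload. All names here are made up for illustration;
# part_queue is expected to be a queue.Queue of dicts describing parts.
import queue

MAX_PART_ATTEMPTS = 5

def process_parts(part_queue, upload_part):
    """Drain the queue, giving each failed part its own retry budget."""
    while not part_queue.empty():
        part = part_queue.get()
        try:
            upload_part(part)
        except Exception:
            part['attempts'] = part.get('attempts', 0) + 1
            if part['attempts'] < MAX_PART_ATTEMPTS:
                # Re-queue the failed part rather than aborting the
                # entire multipart upload.
                part_queue.put(part)
            else:
                # Only give up once a single part has exhausted its
                # own retry budget.
                raise
```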

@JordonPhillips changed the title from "Add ability for S3 commands to increase retry account" to "Add ability for S3 commands to increase retry count" Nov 8, 2015
@spookylukey

Also, looking at the code, it seems there are only retries for downloads, not uploads - https://github.com/aws/aws-cli/blob/develop/awscli/customizations/s3/tasks.py. This means that despite the multipart upload feature, large files are very unlikely to succeed if there are issues with the network connection - if any part fails then the whole upload is cancelled.

@spookylukey

I'd be willing to work on this. However, I'd need some guidance:

  1. Uploading needs retry logic added, as it currently has none. Should we just do what DownloadPartTask does (retry in a loop), or something else? Should it default to the same number of attempts as DownloadPartTask?

  2. Should there be separate configuration parameters for download retries/upload retries?

  3. Should it be possible to configure infinite retries, and what value should be used for that?

@spookylukey

@kyleknap I'm offering to work on this - if someone can answer my questions above, I can get going. There are two separate features I guess:

  1. retries for uploading
  2. configuration for the number of retries.

Do you want me to create a new issue for part 1?

@kyleknap (Contributor)

Here are some responses to your previous questions:

  1. So for upload parts we actually do have retry logic; it lives in botocore: https://github.com/boto/botocore/blob/develop/botocore/retryhandler.py. It defaults to 5 attempts: https://github.com/boto/botocore/blob/develop/botocore/data/_retry.json#L48. For the download parts, though, we have additional retry logic on top of botocore's, so the retries can end up being more than 5.

  2. I think one configuration option would be best here. We see retries happen a lot for multipart copies.

  3. No, I do not think infinite retries should be allowed. For uploads we already do exponential backoff, so the time spent waiting between retries would get unreasonably long; it should error out instead.

I think it should be fine to keep tracking this on this issue. No need for a new issue to be opened.

I think the best approach would be to hook into the botocore logic I linked above, with a user-provided value for max retries, and I believe that is what James was referring to when he first opened the issue.
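
For reference, the backoff behaviour described in point 3 looks roughly like this (a simplified sketch, not botocore's actual code, which lives in retryhandler.py):

```python
# Simplified sketch of exponential backoff with a random base and a
# growth factor of 2; not botocore's actual implementation.
import random

def retry_delay(attempt, growth_factor=2):
    """Seconds to wait before retry number `attempt` (1-based)."""
    return random.random() * (growth_factor ** (attempt - 1))

# With 5 attempts the worst-case total wait is small (~1 + 2 + 4 + 8 s),
# but with a very large or unlimited attempt count the delays keep
# doubling, which is why unbounded retries are unattractive here.
for attempt in range(1, 6):
    print(attempt, round(retry_delay(attempt), 2))
```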

@spookylukey

Great, thanks so much for the response, hopefully I'll get time to look at this over the Christmas period.

@spookylukey

Working out how to configure the max_attempts value is proving quite difficult...

There is no documentation for how to do this kind of thing - https://botocore.readthedocs.org/en/latest/index.html - and I generally have the principle of "docs or it doesn't exist".

But digging deeper, here is the chain I followed:

The config for the retries is loaded from _retry.json, via self._loader.load_data('_retry'). This doesn't seem to give any opportunity for passing in other config, except through additional configuration files (via botocore.loaders.Loader.search_paths).

So I can't see any way to configure this programmatically, without changes to botocore.
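
The chain above can be reproduced with a few lines against botocore's internals (these are not public APIs, so the component name and the JSON layout assumed below may change between versions):

```python
# Sketch of the chain described above, poking at botocore internals.
import botocore.session

session = botocore.session.get_session()
loader = session.get_component('data_loader')

retry_config = loader.load_data('_retry')
# Assuming the _retry.json layout at the time of writing, the built-in
# default caps general retries at 5 attempts.
print(retry_config['retry']['__default__']['max_attempts'])
```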

@spookylukey

In case anyone else is looking for a workaround, I've found that the sync command for s3cmd works well.

@thehesiod

@spookylukey Mind opening a botocore issue for us? It sounds like this would fit perfectly in the Config class: http://botocore.readthedocs.org/en/latest/reference/config.html
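
For anyone landing on this issue later: recent botocore releases do expose retry settings on the Config class. A minimal sketch, assuming a botocore new enough for Config to accept a retries mapping:

```python
# Minimal sketch; requires a botocore version where Config accepts a
# 'retries' mapping (it did not when this comment was written).
import botocore.session
from botocore.config import Config

session = botocore.session.get_session()
s3 = session.create_client(
    's3',
    config=Config(retries={'max_attempts': 10, 'mode': 'standard'}),
)
```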

@ASayre (Contributor) commented Feb 6, 2018

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168364-add-ability-for-s3-commands-to-increase-retry-coun

@ASayre closed this as completed Feb 6, 2018
@jamesls (Member, Author) commented Apr 6, 2018

Based on community feedback, we have decided to return feature requests to GitHub issues.

@jamesls reopened this Apr 6, 2018
@madrobby

Hi, is there any movement on this? I have a spotty connection and am literally unable to download any file larger than a few hundred MiB from S3.

@kdaily added the automation-exempt label Jul 29, 2020
@kdaily added the needs-review label Sep 3, 2021
@justindho moved this to Contribution Ready in AWS CLI Community Contributions May 11, 2022
@justindho added the community and contribution-ready labels and removed the needs-review label May 11, 2022
@tim-finnigan added the p2 label Nov 10, 2022