
Add config variables for changing S3 runtime values #1122

Closed
jamesls wants to merge 12 commits into aws:develop from jamesls:s3-configuration

Conversation

@jamesls (Member) commented Feb 3, 2015

This pull request adds support for the following parameters in the ~/.aws/config file:

  • max_concurrent_requests: The maximum number of concurrent transfer requests to S3 at any given moment.
  • max_queue_size: The maximum number of transfer tasks queued at any given point in time.
  • multipart_threshold: The size at which we switch to multipart uploads/downloads. Example values are shown in the sample config below.
  • multipart_chunk_size: The chunk size to use when using multipart transfers.

A sample config looks like:

[profile development]
aws_access_key_id = foo
aws_secret_access_key = bar
region = us-west-2
s3 =
  max_concurrent_requests = 20
  max_queue_size = 10000
  multipart_threshold = 128MB
  multipart_chunk_size = 64MB
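
To illustrate the size values, here's a minimal sketch (not the CLI's actual implementation; size_to_bytes is a hypothetical helper) of how strings like 128MB could be converted to byte counts:

SIZE_SUFFIXES = {'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3, 'TB': 1024 ** 4}

def size_to_bytes(value):
    # Convert a string like '128MB' (or a plain byte count) to an int.
    # Illustrative only; the CLI's real parser may accept other forms.
    value = str(value).strip().upper()
    for suffix, multiplier in SIZE_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * multiplier
    return int(value)  # No suffix: treat the value as bytes.

For example, size_to_bytes('128MB') returns 134217728.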

This addresses several needs we've heard from users.

This still needs documentation, but the code is ready for review.

Code Overview

While working on this change, I updated some parts of the code before making any functional changes: pep8 formatting, removing code that's no longer called, and improving some of the semantics of the CommandParameters class. None of these commits changes any behavior:

fb51db1 - Cleanup code formatting
60f7377 - Remove unused method
ab7ba1a - Remove session dep from CommandParameters
e9fdaf7 - Remove trailing/leading whitespace
2e3477d - Move part size constants out to their respective modules
9ba4d38 - Move constants to transferconfig

Then come the two commits that add the feature. The first adds a RuntimeConfig object that manages all of the merging and converting of config file values:

697662b - Add class for merging converting s3 runtime config
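
As a rough sketch of the idea (class and method names here are assumptions for illustration, reusing the size_to_bytes helper sketched above):

class RuntimeConfig(object):
    # Hypothetical defaults for illustration; the shipped defaults may differ.
    DEFAULTS = {
        'max_concurrent_requests': 10,
        'max_queue_size': 1000,
        'multipart_threshold': 8 * 1024 ** 2,
        'multipart_chunksize': 8 * 1024 ** 2,
    }
    SIZE_KEYS = ('multipart_threshold', 'multipart_chunksize')

    def build_config(self, **user_values):
        # Overlay whatever the user set in the s3 section of
        # ~/.aws/config on top of the defaults, then normalize types.
        config = dict(self.DEFAULTS)
        config.update(user_values)
        for key in config:
            if key in self.SIZE_KEYS:
                config[key] = size_to_bytes(config[key])
            else:
                config[key] = int(config[key])
        return config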

Then plumbing this newly created class into the S3 command codebase:

db24830 - Plumb through S3 runtime config

I'd suggest taking a look at the last two commits on their own and then reviewing the rest.

cc @kyleknap @danielgtaylor

We no longer verify that the bucket exists.
CommandParameters is moving (but is not quite there) towards being
only concerned with command parameter validation and restructuring.
It should not also be concerned with making S3 requests, _especially_
recursively deleting customer objects.  This is better suited for
the CommandArchitecture or the RbCommand objects.  While it would
take more work to get it there, the S3TransferCommand base class seems
to be a better fit and an incremental improvement.  It makes the
CommandParameters class simpler.
As part of refactoring out the runtime config, the constants
that _should_ not change (because they're based on S3 limits)
are moved to the corresponding modules that use these values.

This makes a clear distinction between values tied to s3 limits,
and values the user can change if needed.

Also, based on the latest S3 docs, the maximum number of parts
is now 10,000 instead of 1,000, so I've updated that accordingly.
This resulted in a few unit test updates.
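
The split might look something like this (module layout and constant names are illustrative; the part limit and size caps are S3's documented limits):

# Fixed S3 limits -- kept next to the code that enforces them,
# because users should never change them:
MAX_PARTS = 10000                          # 10,000 parts max per multipart upload
MAX_SINGLE_TRANSFER_SIZE = 5 * 1024 ** 3   # 5 GiB cap on a single PUT
MAX_UPLOAD_SIZE = 5 * 1024 ** 4            # 5 TiB cap on one object

# User-tunable defaults live in transferconfig instead (see the
# RuntimeConfig sketch above) and can be overridden in ~/.aws/config.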
This plumbs in the RuntimeConfig into the
S3SubCommand, CommandArchitecture, and S3Handler
classes.

The S3Handler __init__ method changed by removing
the multi_threshold and chunksize params as these are
now specified via the newly added runtime_config param.
Given the interface change, several unit tests needed updating.
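
In other words, the constructor change looks roughly like this (signatures are illustrative, not the exact code):

# Before: tuning values passed as individual parameters.
def __init__(self, session, params, result_queue=None,
             multi_threshold=None, chunksize=None):
    ...

# After: a single runtime_config mapping carries them.
def __init__(self, session, params, result_queue=None,
             runtime_config=None):
    self._runtime_config = runtime_config or RuntimeConfig().build_config()
    self.multi_threshold = self._runtime_config['multipart_threshold']
    self.chunksize = self._runtime_config['multipart_chunksize']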
@coveralls

Coverage Status

Coverage decreased (-0.12%) to 91.63% when pulling db24830 on jamesls:s3-configuration into 86f7f52 on aws:develop.

@coveralls

Coverage Status

Coverage increased (+0.02%) to 91.77% when pulling db24830 on jamesls:s3-configuration into 86f7f52 on aws:develop.

@jamesls (Member, Author) commented Feb 3, 2015

Looking at the coveralls coverage drop...

@jamesls (Member, Author) commented Feb 4, 2015

Also, just noticed that there's an integ test failure for streaming. Looks like I missed plumbing this into streaming s3handler as well, which I believe should not use the runtime config settings.

self.executor = Executor(
-    num_threads=self.EXECUTOR_NUM_THREADS,
+    num_threads=self._runtime_config['max_concurrent_requests'],
Contributor:

Shouldn't this be

num_threads=self._runtime_config['max_concurrent_requests'] - 1

because the main thread is acting as a producer by calling the ListObjects operation?

Member Author:

After initially implementing that, I ended up making this just match the number of threads. The main reason is that if we use max - 1, then the minimum value needs to be 2, which I think would confuse customers. It's also not intuitive that you can't set this value to 1: if users want to "turn off" parallelism, they would expect to set max_concurrent_requests to 1, but they would actually need to set it to 2. I could see us getting questions about that.

Another option is to rename this to be more explicit. If we called it max_transfer_requests, maybe that would clear up what this is doing, although I think max_concurrent_requests is more clear than max_transfer_requests. What do you think?

Or another option would be to be clear in our documentation what this means. It's the number of concurrent requests uploading/downloading/copying to s3. It does not include any ListObjects calls.
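
Sketched out, the model being described is (names here are illustrative, not the actual code): the main thread produces transfer tasks by paging through ListObjects, and exactly max_concurrent_requests worker threads consume them, so a value of 1 really does mean one transfer at a time:

import threading
from queue import Queue

def run_transfers(list_object_keys, transfer_one,
                  max_concurrent_requests, max_queue_size):
    queue = Queue(maxsize=max_queue_size)

    def worker():
        while True:
            key = queue.get()
            if key is None:       # Sentinel: no more work for this thread.
                return
            transfer_one(key)     # One concurrent upload/download/copy.

    workers = [threading.Thread(target=worker)
               for _ in range(max_concurrent_requests)]
    for w in workers:
        w.start()
    # The main thread is the producer: its ListObjects paging does not
    # count against max_concurrent_requests.
    for key in list_object_keys():
        queue.put(key)
    for _ in workers:
        queue.put(None)           # One sentinel per worker.
    for w in workers:
        w.join()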

Contributor:

Yeah, I like that last option the best. If I had no knowledge of the internals of the s3 commands, I would also find it confusing that the lowest you can set it to is 2. I do not like max_transfer_requests because it is ambiguous (it seems to imply that we will do at most that many transfers for the one command).

@kyleknap (Contributor) commented Feb 4, 2015

Looks good. I guess my main comment was related to the interpretation of max_concurrent_requests. I just want to be on the same page with you before it gets merged.

We need to use the specified values in this class
instead of the .defaults() value.
@coveralls

Coverage Status

Coverage increased (+0.16%) to 91.9% when pulling 31260e1 on jamesls:s3-configuration into 86f7f52 on aws:develop.

@coveralls

Coverage Status

Coverage increased (+0.15%) to 91.89% when pulling 57f4937 on jamesls:s3-configuration into 86f7f52 on aws:develop.

To be consistent with what the "aws s3 ls" command outputs.
As per review feedback.
@jamesls (Member, Author) commented Feb 4, 2015

@kyleknap I believe I've incorporated all the feedback. Coveralls also looks good now. Let me know if I've missed anything.

cc @danielgtaylor

@coveralls

Coverage Status

Coverage increased (+0.15%) to 91.9% when pulling 4d44ce6 on jamesls:s3-configuration into 86f7f52 on aws:develop.

@kyleknap (Contributor) commented Feb 4, 2015

Thanks! 🚢

@jamesls (Member, Author) commented Feb 9, 2015

Closing, this was merged via: 9911da3

@joelvanvelden commented

@jamesls, the given sample config has an error:

  multipart_chunk_size = 64MB

should instead read:

  multipart_chunksize = 64MB

I found this out the hard way :(

Also, this (very useful) new feature is not documented anywhere apart from this pull request.
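
Applying that correction, the working sample section reads:

[profile development]
s3 =
  max_concurrent_requests = 20
  max_queue_size = 10000
  multipart_threshold = 128MB
  multipart_chunksize = 64MB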

@jamesls (Member, Author) commented Feb 19, 2015

@joelvanvelden Thanks for pointing that out. I wish I had named it multipart_chunk_size to be consistent with max_queue_size, but I suppose it's too late to change.

Regarding the documentation, there's an existing pull request out (#1147) that adds topic guides to the CLI help system (as well as the online html docs). This gives us a place to document advanced topics such as these s3 settings. We'll be adding a topic guide on s3 configuration shortly.

jamesls added a commit to jamesls/aws-cli that referenced this pull request Feb 24, 2015
Documents the config values added in aws#1122.
@TimLudwinski commented

Unfortunately, setting max_concurrent_requests doesn't solve the issue of limiting bandwidth for me. With one connection (I verified that there was only one through netstat), our 100Mb pipe is still saturated (not that using the full bandwidth is necessarily a bad thing). We still need #1090 to be implemented.

thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this pull request Feb 12, 2022
Remove duplicate `sam deploy` in command help text