-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config variables for changing S3 runtime values #1122
Conversation
We no longer verify that the bucket exists.
CommandParameters is moving (but is not quite there) towards being only concerned with command parameter validation and restructuring. It's should not be concerned with also making S3 requests, _especially_ recursively deleting customer objects. This is better suited for the CommandArchitecture or the RbCommand objects. While it would take more work to get it there, the S3TransferCommand base class seems to be a better fit and an incremental improvement. It makes the CommandParameters class simpler.
As part of refactoring out the runtime config, the constants that _should_ not change (because they're based on S3 limits) are moved to the corresponding modules that use these values. This makes a clear distinction between values tied to s3 limits, and values the user can change if needed. Also, based on the latest S3 docs, the maximum number of parts is now 10000 instead of 1000 so I've updated that accordingly. This resulted in a few unit tests updates.
This plumbs in the RuntimeConfig into the S3SubCommand, CommandArchitecture, and S3Handler classes. The S3Handler __init__ method changed by removing the multi_threshold and chunksize params as these are now specified via the newly added runtime_config param. Given the interface change, there were a several unit tests that needed updating.
Looking at the coveralls coverage drop... |
Also, just noticed that there's an integ test failure for streaming. Looks like I missed plumbing this into streaming s3handler as well, which I believe should not use the runtime config settings. |
self.executor = Executor( | ||
num_threads=self.EXECUTOR_NUM_THREADS, | ||
num_threads=self._runtime_config['max_concurrent_requests'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be
num_threads=self._runtime_config['max_concurrent_requests']-1
Because the main thread is acting as a producer by calling the ListObjects
operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After initially implementing that, I ended up making this just match the number of threads. The main reason is that if we do add max - 1
, then the minimum value needs to be 2
, which I think would confuse customers. It's also not intuitive that you can't set this value to 1. If a user wants to "turn off" parallelism, you'd think you'd set max_concurrent_requests
to 1, but they would need to set max_concurrent_requests to 2
. I could see us getting questions about that.
Another option is to rename this to be more explicit. If we called it max_transfer_requests
maybe that would clear up what this is doing. Although I think max_concurrent_requests
is more clear than max_transfer_threads
. What do you think?
Or another option would be to be clear in our documentation what this means. It's the number of concurrent requests uploading/downloading/copying to s3. It does not include any ListObjects calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like that last option the best. If I had no knowledge of the internals of the s3 commands, I would also find it confusing that the lowest that you can make it is 2. I do not like max_transfer_requests
because it is ambiguous (it seems to imply that we will at most do that maximum amount of transfers for the one command).
Looks good. I guess my main comment was related to the interpretation of |
We need to use the specified values in this class instead of the .defaults() value.
To be consistent with what the "aws s3 ls" command outputs. As per review feedback.
@kyleknap I believe I've incorporated all the feedback. Coveralls also looks good now. Let me know if I've missed anything. |
Thanks! 🚢 |
Closing, this was merged via: 9911da3 |
@jamesls, I found this out the hard way :( Also, this (very useful) new feature is not documented anywhere apart from this pull request. |
@joelvanvelden Thanks for pointing that out. I wish I would have named it multipart_chunk_size to be consistent with max_queue_size but I suppose it's too late to change. Regarding the documentation, there's an existing pull request out (#1147) that adds topic guides to the CLI help system (as well as the online html docs). This gives us a place to document advanced topics such as these s3 settings. We'll be adding a topic guide on s3 configuration shortly. |
Documents the config values added in aws#1122.
Unfortunately, setting |
Remove duplicate `sam deploy` in command help text
This pull requests adds support for the following parameters in the
~/.aws/config
file:A sample config looks like:
This addresses several needs we've heard including:
This still needs documentation, but the code is ready for review.
Code Overview
While working on this code, there were parts of the code I updated before making any functional changes. These include things like pep8 formatting, removing code that's no longer called, improving some of the semantics of the
CommandParameters
class. None of these changes have any functional changes:fb51db1 - Cleanup code formatting
60f7377 - Remove unused method
ab7ba1a - Remove session dep from CommandParameters
e9fdaf7 - Remove trailing/leading whitespace
2e3477d - Move part size constants out to their respective modules
9ba4d38 - Move constants to transferconfig
Then the two commits that add in the change. The first is the addition of a
RuntimeConfig
object that manages all of the merging/converting of config file values:697662b - Add class for merging converting s3 runtime config
Then plumbing this newly created class into the S3 command codebase:
db24830 - Plumb through S3 runtime config
I'd suggest taking a look at the last two commits on their own and then reviewing the rest.
cc @kyleknap @danielgtaylor