From 3ab778be3a542d491b4fcdd62606576c2da2f998 Mon Sep 17 00:00:00 2001
From: James Saryerwinnie
Date: Tue, 24 Feb 2015 14:04:33 -0800
Subject: [PATCH 1/2] Add topic guide for s3 configs

Documents the config values added in #1122.
---
 awscli/topics/s3-config.rst | 135 ++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 awscli/topics/s3-config.rst

diff --git a/awscli/topics/s3-config.rst b/awscli/topics/s3-config.rst
new file mode 100644
index 000000000000..4fa8bb8e8e26
--- /dev/null
+++ b/awscli/topics/s3-config.rst
@@ -0,0 +1,135 @@
+:title: AWS CLI S3 Configuration
+:description: Advanced configuration for AWS S3 Commands
+:category: S3
+:related command: s3 cp, s3 sync, s3 mv, s3 rm
+
+The ``aws s3`` transfer commands, which include the ``cp``, ``sync``, ``mv``,
+and ``rm`` commands, have additional configuration you can use to control
+S3 transfers. This topic guide discusses these parameters as well as best
+practices and guidelines for setting these values.
+
+Before discussing the specifics of these values, note that these values are
+entirely optional. You should be able to use the ``aws s3`` transfer commands
+without having to configure any of these values. These configuration values
+are provided in the case where you need to modify one of these values, either
+for performance reasons or to account for the specific environment where these
+``aws s3`` commands are being run.
+
+
+Configuration Values
+====================
+
+These are the configuration values you can set for S3:
+
+* ``max_concurrent_requests`` - The maximum number of concurrent requests.
+* ``max_queue_size`` - The maximum number of tasks in the task queue.
+* ``multipart_threshold`` - The size threshold where the CLI uses multipart
+  transfers.
+* ``multipart_chunksize`` - When using multipart transfers, this is the chunk
+  size that will be used.
+
+These values must be set under the top level ``s3`` key in the AWS Config File,
+which has a default location of ``~/.aws/config``. Below is an example
+configuration::
+
+  [profile development]
+  aws_access_key_id=foo
+  aws_secret_access_key=bar
+  s3 =
+    max_concurrent_requests = 20
+    max_queue_size = 10000
+    multipart_threshold = 64MB
+    multipart_chunksize = 16MB
+
+Note that all the S3 configuration values are indented and nested under the top
+level ``s3`` key.
+
+You can also set these values programmatically using the ``aws configure set``
+command. For example, to set the above values for the default profile, you
+could instead run these commands::
+
+  $ aws configure set default.s3.max_concurrent_requests 20
+  $ aws configure set default.s3.max_queue_size 10000
+  $ aws configure set default.s3.multipart_threshold 64MB
+  $ aws configure set default.s3.multipart_chunksize 16MB
+
+
+max_concurrent_requests
+-----------------------
+
+**Default** - ``10``
+
+The ``aws s3`` transfer commands are multithreaded. At any given time,
+multiple requests to Amazon S3 are in flight. For example, if you are
+uploading a directory via ``aws s3 cp localdir s3://bucket/ --recursive``, the
+AWS CLI could be uploading the local files ``localdir/file1``,
+``localdir/file2``, and ``localdir/file3`` in parallel. The
+``max_concurrent_requests`` specifies the maximum number of transfer commands
+that are allowed at any given time.
+
+You may need to change this value for a few reasons:
+
+* Decreasing this value - On some environments, the default of 10 concurrent
+  requests can overwhelm a system. This may cause connection timeouts or
+  slow the responsiveness of the system. Lowering this value will make the
+  S3 transfer commands less resource intensive. The tradeoff is that
+  S3 transfers may take longer to complete.
+
+* Increasing this value - In some scenarios, you may want the S3 transfers
+  to complete as quickly as possible, using as much network bandwidth
+  as necessary. In this scenario, the default number of concurrent requests
+  may not be sufficient to utilize all the network bandwidth available.
+  Increasing this value may improve the time it takes to complete an
+  S3 transfer.
+
+
+max_queue_size
+--------------
+
+**Default** - ``1000``
+
+The AWS CLI internally uses a producer consumer model, where we queue up S3
+tasks that are then executed by consumers, which in this case utilize a bound
+thread pool, controlled by ``max_concurrent_requests``. The enqueuing rate
+can be much faster than the rate at which consumers are executing tasks.
+To avoid unbounded growth, the task queue size is capped to a specific size.
+This configuration value changes the value of that maximum number.
+
+You generally will not need to change this value. This value also
+corresponds to the number of tasks we are aware of that need to be
+executed. This means that by default we can only see 1000 tasks ahead.
+Until the S3 command knows the total number of tasks executed, the
+progress line will show a total of ``...``. Increasing this value
+means that we will be able to more quickly know the total number of
+tasks needed, assuming that the enqueuing rate is quicker than the
+rate of task consumption. The tradeoff is that a larger max queue
+size will require more memory.
+
+
+multipart_threshold
+-------------------
+
+**Default** - ``8MB``
+
+When uploading, downloading, or copying a file, the S3 commands
+will switch to multipart operations if the file reaches a given
+size threshold. The ``multipart_threshold`` controls this value.
+You can specify this value in one of two ways:
+
+* The file size in bytes. For example, ``1048576``.
+* The file size with a size suffix. You can use ``KB``, ``MB``, ``GB``,
+  ``TB``. For example: ``10MB``, ``1GB``.
+  Note that S3 imposes
+  constraints on valid values that can be used for multipart
+  operations.
+
+
+multipart_chunksize
+-------------------
+
+**Default** - ``8MB``
+
+Once the S3 commands have decided to use multipart operations, the
+file is divided into chunks. This configuration option specifies what
+the chunk size (also referred to as the part size) should be. This
+value can be specified using the same semantics as ``multipart_threshold``,
+that is either as the number of bytes as an integer, or using a size
+suffix.

From 046b756f56514c458a23713dfce27245f591bf1c Mon Sep 17 00:00:00 2001
From: James Saryerwinnie
Date: Tue, 24 Feb 2015 17:47:31 -0800
Subject: [PATCH 2/2] Incorporate review feedback

---
 awscli/topics/s3-config.rst | 39 +++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/awscli/topics/s3-config.rst b/awscli/topics/s3-config.rst
index 4fa8bb8e8e26..0f7e3ce328fb 100644
--- a/awscli/topics/s3-config.rst
+++ b/awscli/topics/s3-config.rst
@@ -4,9 +4,9 @@
 :related command: s3 cp, s3 sync, s3 mv, s3 rm
 
 The ``aws s3`` transfer commands, which include the ``cp``, ``sync``, ``mv``,
-and ``rm`` commands, have additional configuration you can use to control
-S3 transfers. This topic guide discusses these parameters as well as best
-practices and guidelines for setting these values.
+and ``rm`` commands, have additional configuration values you can use to
+control S3 transfers. This topic guide discusses these parameters as well as
+best practices and guidelines for setting these values.
 
 Before discussing the specifics of these values, note that these values are
 entirely optional. You should be able to use the ``aws s3`` transfer commands
@@ -23,10 +23,10 @@ These are the configuration values you can set for S3:
 
 * ``max_concurrent_requests`` - The maximum number of concurrent requests.
 * ``max_queue_size`` - The maximum number of tasks in the task queue.
-* ``multipart_threshold`` - The size threshold where the CLI uses multipart
-  transfers.
+* ``multipart_threshold`` - The size threshold the CLI uses for multipart
+  transfers of individual files.
 * ``multipart_chunksize`` - When using multipart transfers, this is the chunk
-  size that will be used.
+  size that the CLI uses for multipart transfers of individual files.
 
 These values must be set under the top level ``s3`` key in the AWS Config File,
 which has a default location of ``~/.aws/config``. Below is an example
@@ -89,19 +89,20 @@
 The AWS CLI internally uses a producer consumer model, where we queue up S3
 tasks that are then executed by consumers, which in this case utilize a bound
-thread pool, controlled by ``max_concurrent_requests``. The enqueuing rate
-can be much faster than the rate at which consumers are executing tasks.
-To avoid unbounded growth, the task queue size is capped to a specific size.
-This configuration value changes the value of that maximum number.
-
-You generally will not need to change this value. This value also
-corresponds to the number of tasks we are aware of that need to be
-executed. This means that by default we can only see 1000 tasks ahead.
-Until the S3 command knows the total number of tasks executed, the
-progress line will show a total of ``...``. Increasing this value
-means that we will be able to more quickly know the total number of
-tasks needed, assuming that the enqueuing rate is quicker than the
-rate of task consumption. The tradeoff is that a larger max queue
+thread pool, controlled by ``max_concurrent_requests``. A task generally maps
+to a single S3 operation. For example, a task could be a ``PutObjectTask``,
+or a ``GetObjectTask``, or an ``UploadPartTask``. The enqueuing rate can be
+much faster than the rate at which consumers are executing tasks. To avoid
+unbounded growth, the task queue size is capped to a specific size.
+This configuration value changes the value of that maximum number.
+
+You generally will not need to change this value. This value also corresponds
+to the number of tasks we are aware of that need to be executed. This means
+that by default we can only see 1000 tasks ahead. Until the S3 command knows
+the total number of tasks to be executed, the progress line will show a total
+of ``...``. Increasing this value means that we will be able to more quickly
+know the total number of tasks needed, assuming that the enqueuing rate is
+quicker than the rate of task consumption. The tradeoff is that a larger max
+queue size will require more memory.
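As a footnote to the patch series, the interaction between ``multipart_threshold`` and ``multipart_chunksize`` described in the guide can be sketched in a few lines of Python. This is a hypothetical illustration of the documented semantics, not the AWS CLI's actual implementation; the function names and the suffix parser are invented for this example.

```python
# Illustrative sketch (NOT aws-cli source code) of the documented semantics:
# a file at or above multipart_threshold is transferred in multipart_chunksize
# sized parts; anything smaller is a single operation.
import math

SUFFIXES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}

def parse_size(value):
    """Parse either a raw byte count ('1048576') or a suffixed size ('10MB')."""
    value = str(value).strip().upper()
    for suffix, multiplier in SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * multiplier
    return int(value)

def plan_transfer(file_size, threshold="8MB", chunksize="8MB"):
    """Return (uses_multipart, number_of_parts) for a file of file_size bytes."""
    if file_size < parse_size(threshold):
        return False, 1
    # The final part may be smaller than chunksize, hence the ceiling division.
    return True, math.ceil(file_size / parse_size(chunksize))

print(plan_transfer(100 * 1024 ** 2))  # (True, 13): 100MB / 8MB chunks
print(plan_transfer(4 * 1024 ** 2))    # (False, 1): below the 8MB threshold
```

Note that, as the guide says, S3 itself constrains valid multipart values (for example, a cap on the number of parts), so a real planner would also validate the resulting part count.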