
Expose max_bandwidth configuration from s3transfer #1430

Closed
lauhayden opened this issue Jan 25, 2018 · 16 comments
Labels
feature-request (This issue requests a feature.) · response-requested (Waiting on additional information or feedback.)

Comments

@lauhayden

This is a feature request to expose the new max_bandwidth config option from s3transfer. It should be fairly trivial to expose the config option in boto3.s3.transfer.TransferConfig now that s3transfer supports it (as of 0.1.12).
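For illustration, the requested usage would presumably look something like this (a sketch of the desired API, not something boto3 supports yet; max_bandwidth in bytes per second, matching s3transfer's option):

from boto3.s3.transfer import TransferConfig

# Hypothetical once the option is exposed: cap transfers at ~1 MiB/s
config = TransferConfig(max_bandwidth=1024 * 1024)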

Related issues + pull requests:

@joguSD
Contributor

joguSD commented Jan 25, 2018

Marking this as a feature request.

joguSD added the feature-request label Jan 25, 2018
@vincentcabosart

Hello, I understand this feature would make it possible to throttle uploads to S3 while using Boto3. Is this correct? Do you know when it will be implemented? Thank you!

@Dmitry1987

Hi, I was looking for a rate-limit feature for S3 transfers in boto3 too, so this is my +1 for this feature request :)

@vincentcabosart

Yes, it is working with AWS CLI at the moment, but not yet with Boto3.
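For reference, the CLI equivalent being referred to is the s3.max_bandwidth setting in ~/.aws/config (the value below is just an example):

[default]
s3 =
  max_bandwidth = 1MB/s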

@make-ing

+1

@aleliaert

+1

@paopaoactioner

So any updates on this issue?

@ZerepL

ZerepL commented Dec 13, 2019

+1

@jwp763

jwp763 commented Jan 10, 2020

+1

@itsthejoker

Following up on this because boto3 clobbers my network connection while it's uploading. Would love to have this accessible.

@aleliaert

+1
Note that one workaround (or adventure) could be to use Linux/Docker traffic control features:
https://tldp.org/HOWTO/Traffic-Control-HOWTO/software.html#s-iproute2-tc
https://github.com/lukaszlach/docker-tc

@tong-bluehill

tong-bluehill commented Jul 16, 2021

Since max_bandwidth is supported in boto/s3transfer but not yet in boto/boto3, a workaround is to use s3transfer directly to upload with a limit. My code is pasted below.

import boto3
from s3transfer.manager import TransferConfig, TransferManager

s3_client = boto3.client('s3')

# max_bandwidth is in bytes per second, so this caps the upload at ~100 KB/s
tc = TransferConfig(max_bandwidth=100 * 1000)
tm = TransferManager(s3_client, tc)
tf = tm.upload('/my/file', 'mys3bucket', 'mys3key')
print(tf.result())  # blocks until the upload completes

@korbinianbauer

korbinianbauer commented Oct 15, 2021

As an alternative to tong-bluehill's workaround, I'm using my own minimal TransferConfig class, which includes the missing max_bandwidth key (implementation below, based on s3transfer.manager.TransferConfig).

This can then be passed to .upload_fileobj() or .upload_file() as per usual.

The MyTransferConfig class:
from s3transfer.constants import KB, MB

class MyTransferConfig(object):
    def __init__(self,
                 multipart_threshold=8 * MB,
                 multipart_chunksize=8 * MB,
                 max_request_concurrency=10,
                 max_submission_concurrency=5,
                 max_request_queue_size=1000,
                 max_submission_queue_size=1000,
                 max_io_queue_size=1000,
                 io_chunksize=256 * KB,
                 num_download_attempts=5,
                 max_in_memory_upload_chunks=10,
                 max_in_memory_download_chunks=10,
                 max_bandwidth=None, # <--------------------------------------------------
                 use_threads=True):

        self.multipart_threshold = multipart_threshold
        self.multipart_chunksize = multipart_chunksize
        self.max_request_concurrency = max_request_concurrency
        self.max_submission_concurrency = max_submission_concurrency
        self.max_request_queue_size = max_request_queue_size
        self.max_submission_queue_size = max_submission_queue_size
        self.max_io_queue_size = max_io_queue_size
        self.io_chunksize = io_chunksize
        self.num_download_attempts = num_download_attempts
        self.max_in_memory_upload_chunks = max_in_memory_upload_chunks
        self.max_in_memory_download_chunks = max_in_memory_download_chunks
        self.max_bandwidth = max_bandwidth # <--------------------------------------------
        self.use_threads = use_threads
        self._validate_attrs_are_nonzero()

    def _validate_attrs_are_nonzero(self):
        for attr, attr_val, in self.__dict__.items():
            if attr_val is not None and attr_val <= 0:
                raise ValueError(
                    'Provided parameter %s of value %s must be greater than '
                    '0.' % (attr, attr_val))

It's more lines of code up front for the class, but after that it's just a matter of passing the Config keyword to the client's upload call.

Usage:

upload_config = MyTransferConfig(max_bandwidth=1024**2) # 1MiB/s

s3_client.upload_file(path, bucket_name, object_key, Config=upload_config)

# or

with open(path, "rb") as f:
    s3_client.upload_fileobj(f, bucket_name, object_key, Config=upload_config)

Result of uploading ~40MiB worth of files with a 1 MiB/s bandwidth limit:

[Screenshot: upload throughput holding steady at ~1 MiB/s]

I noticed that for low bandwidths like 100 KiB/s, the upload speeds are quite "bursty":
[Screenshot: bursty throughput under a 100 KiB/s limit]

I could stabilize this behaviour by lowering the default value of the bytes_threshold parameter of the BandwidthLimitedStream class in bandwidth.py from 256 * 1024 to something smaller, e.g. 8 * 1024. However, I haven't investigated the side effects of doing so, so feel free to propose a better solution; I don't know whether this causes significant disk I/O or network overhead, for example.
[Screenshot: smoothed throughput at 100 KiB/s after the patch]

I also wrote a small method to apply this patch automatically:

import os

def patch_s3transfer_upload_chunksize():
    import s3transfer
    # Rewrite the default bytes_threshold in the installed bandwidth.py
    bandwidth_module_file = os.path.dirname(s3transfer.__file__) + "/bandwidth.py"
    patch_cmd = (r"sed -i 's/bytes_threshold=256 \* 1024/"
                 r"bytes_threshold=8 \* 1024/g' " + bandwidth_module_file)
    os.system(patch_cmd)
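A runtime alternative that avoids editing the installed file (and breaking on reinstall) would be to override the default in-process. This is only a sketch: it assumes s3transfer constructs BandwidthLimitedStream without passing bytes_threshold explicitly, which is worth verifying against your installed version:

import s3transfer.bandwidth as bandwidth

_original_init = bandwidth.BandwidthLimitedStream.__init__

def _patched_init(self, *args, **kwargs):
    # Lower the smoothing threshold unless a caller sets one explicitly
    kwargs.setdefault('bytes_threshold', 8 * 1024)
    _original_init(self, *args, **kwargs)

bandwidth.BandwidthLimitedStream.__init__ = _patched_init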

@nateprewitt
Contributor

Hey everyone,

I've put up a PR (#3059) to address this gap and hopefully remove the need for these workarounds. Please feel free to leave any comments or concerns on that PR; otherwise we'll look at getting it merged into an upcoming release.

nateprewitt added the response-requested label Oct 27, 2021
@nateprewitt
Contributor

This feature is now available in boto3 1.19.5 and later. We'll close this issue as resolved at this point. Please feel free to open a new issue with any feedback.
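For anyone finding this later, a minimal example of the now-exposed option (file, bucket, and key names below are placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

# Requires boto3 >= 1.19.5; max_bandwidth is in bytes per second
config = TransferConfig(max_bandwidth=500 * 1024)  # cap at ~500 KiB/s

s3 = boto3.client('s3')
s3.upload_file('/my/file', 'mys3bucket', 'mys3key', Config=config)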

@github-actions

⚠️ COMMENT VISIBILITY WARNING ⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
