Using aws s3 cp in multiple process/threading #1018

Closed
delbinwen opened this issue Nov 21, 2014 · 6 comments

@delbinwen

Hi,

First, please let me describe our AWS CLI usage scenario. We store a set of files on S3 under the same prefix, say s3://my_bucket/, and a set of EC2 instances will copy each file, process it, and upload the result to another bucket. We use SWF, so multiple workers on EC2 actually perform the tasks, and of course we use aws s3 cp. I think this scenario is quite common.
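
For concreteness, a minimal sketch of what one of our workers does per task might look like this (the bucket names and the process_file step are placeholders, not our actual code):

    #!/bin/bash
    # Hypothetical per-task worker: download one object, process it locally,
    # and upload the result to a different bucket. process_file is a stand-in
    # for our real processing step.
    set -euo pipefail

    KEY="$1"    # object key handed to this worker by SWF

    aws s3 cp "s3://my_bucket/${KEY}" "/tmp/${KEY}"
    process_file "/tmp/${KEY}" "/tmp/${KEY}.out"
    aws s3 cp "/tmp/${KEY}.out" "s3://my_output_bucket/${KEY}.out"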

It works well with fewer than 5 files to process. However, if we process many files (100+), things can go wrong. There are two observations:

  1. aws s3 cp appears to get stuck; it can take several hours to download a single file (less than 1 GB), or it never returns. We use the basic command syntax, like aws s3 cp s3://my_bucket/file.1 /tmp/
  2. aws s3 cp returns exit code 0 but did not complete all parts. We capture the stdout in our log and it looks like below (see the sketch after this list):
    'stdout': u'Completed 1 of 685 part(s) with 1 file(s) remaining\rCompleted 2 of 685 part(s) with 1 file(s) remaining\rCompleted 3 of 685 part(s) with 1 file(s) remaining\rCompleted 4 of 685 part(s) with 1 file(s) remaining\rCompleted 5 of 685 part(s) with 1 file(s) remaining\rCompleted 6 of 685 part(s) with 1 file(s) remaining\rCompleted 7 of 685 part(s) with 1 file(s) remaining\rCompleted 8 of 685 part(s) with 1 file(s) remaining\rCompleted 9 of 685 part(s) with 1 file(s) remaining\rCompleted 10 of 685 part(s) with 1 file(s) remaining\r\n'
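
The progress updates are separated by carriage returns because the CLI rewrites a single terminal line in place, which is why the captured log collapses into one long string. A small sketch of one way to make such a capture readable, assuming stdout is piped to a file:

    # Translate the CLI's carriage-return progress separators into newlines
    # so each "Completed N of M part(s)" update lands on its own log line.
    aws s3 cp s3://my_bucket/file.1 /tmp/ | tr '\r' '\n' > cp.log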

I think the --debug option might be required to help the investigation. I will try to work on this, but the issue is quite urgent, so I want to know if there is any other information we could provide to assist.
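
A sketch of how we plan to capture it; --debug writes its verbose logging to stderr:

    # Capture the CLI's verbose debug logging, which goes to stderr,
    # so a hung or incomplete transfer leaves a trace we can attach here.
    aws s3 cp s3://my_bucket/file.1 /tmp/ --debug 2> /tmp/cp-debug.log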

Thanks,
Wesley

@kyleknap
Contributor

When do you try to download the files from S3? Is it after the EC2 instances copy the object to the new bucket, or while the EC2 instances are performing the copy?

If you are trying to download objects from a location in a bucket while objects are being copied to that same location at the same time, there could be potential problems in downloading the files. The main issue is that we perform only one ListObjects call (or more if pagination is required) to enumerate all of the objects in the bucket, so if objects are added while that listing is in progress, it may throw off the process.
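
To make the observable behavior concrete (this is an illustration, not the CLI's internals): a prefix-wide copy enumerates the keys once up front and operates on that snapshot, so anything added afterwards is not picked up. You can reproduce the same snapshot semantics explicitly with s3api, e.g.:

    # Snapshot the listing first (the CLI paginates automatically), then copy
    # exactly those keys; objects added after this point are simply not seen.
    # Assumes keys contain no whitespace.
    aws s3api list-objects --bucket my_bucket --query 'Contents[].Key' \
        --output text | tr '\t' '\n' > keys.txt

    while read -r key; do
        aws s3 cp "s3://my_bucket/${key}" /tmp/
    done < keys.txt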

@delbinwen
Author

We didn't upload new files to s3://my_bucket while the EC2 instances were processing. In the recent runs, we had uploaded the files to s3://my_bucket one or two weeks earlier; we only instructed the EC2 instances to process them, and we did not upload the processed files back to the same bucket.

As we observed, the problem occurs when multiple processes use aws s3 cp to copy files from the same bucket. Our current configuration runs 12 (EC2 instances) * 8 (processes per instance) = 96 processes accessing the same S3 bucket with the AWS CLI concurrently.

@kyleknap
Contributor

Interesting. I believe this is because aws s3 cp is already multi-threaded: it uses 10 threads to perform S3-related actions. So on a single EC2 instance you could have 80 threads (10 threads * 8 processes) in total, all trying to make requests to S3. This could cause your threads to starve, in the sense that there is not enough bandwidth to serve all of these requests simultaneously.

This explanation makes sense given your observation:
"It works well with less than 5 files to process. However, if we process many files (100+), things could go wrong."

If there are only 5 S3 objects being downloaded, there would only be 40 threads making requests (5 threads * 8 processes), provided the files are not transferred via multipart. Since this is half of the thread maximum I stated before, the threads starve less. However, if you are downloading 100+ files, you will be hitting that 80-thread maximum.

So I guess my question now is: Is it possible to have fewer processes running the aws s3 cp commands, say 1 or 2 as opposed to 8? I believe if you can cut this down, you will see better results.
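
A sketch of one way to bound the fan-out without changing the worker layout: drive the copies from a single controller process with a small process pool (the pool size of 2 and the keys.txt file are illustrative):

    # Run at most 2 aws s3 cp processes at a time instead of 8 independent
    # workers per instance; keys.txt holds one object key per line.
    xargs -P 2 -I {} aws s3 cp "s3://my_bucket/{}" /tmp/ < keys.txt

Later versions of the CLI also grew a configuration setting for the per-command thread pool (10 by default), e.g. aws configure set default.s3.max_concurrent_requests 2; limiting concurrency is tracked in #907, linked below.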

@delbinwen
Author

Thank you for the reply! It's good to know how aws s3 cp handles requests.

After doing more investigation, we found a network bandwidth issue because all EC2 instances are behind a NAT; it has been resolved.

In the recent runs we did not see the incomplete-download issue. I'm not sure whether the resolved bandwidth issue could explain it. Is there a timeout mechanism implemented that could be related to this issue?

@kyleknap
Contributor

kyleknap commented Dec 1, 2014

S3 usually throws RequestTimeout errors when the socket has not been read from or written to in a while. This could happen if the threads are starving each other, as I explained in my previous comment. However, this error will be displayed to the user. Are you seeing those types of errors, or a different type of error?
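
Until retries are configurable (retry configuration is tracked in #1092, linked below), a blunt workaround is to re-run the command on a non-zero exit; a rough sketch:

    #!/bin/bash
    # Naive retry wrapper: re-run aws s3 cp up to 5 times with a growing
    # backoff. Errors such as RequestTimeout surface as a non-zero exit plus
    # a message on stderr. Note this would NOT catch the exit-code-0 case
    # reported above.
    for attempt in 1 2 3 4 5; do
        if aws s3 cp "s3://my_bucket/file.1" /tmp/; then
            exit 0
        fi
        echo "attempt ${attempt} failed, retrying..." >&2
        sleep $((attempt * 5))
    done
    exit 1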

@jamesls
Member

jamesls commented Jan 15, 2015

Consolidating issues into tracking issues. There are three that are relevant here:

  1. Limit bandwidth: Add ability to limit bandwidth for S3 uploads/downloads #1090
  2. Set retry configuration: Add ability for S3 commands to increase retry count #1092
  3. Limit concurrency: Limit number of parallel s3 transfers #907

Closing in favor of those issues.
