Using aws s3 cp in multiple process/threading #1018
Comments
When do you try to download the files from S3? Is it after the EC2 instances copy the object to the new bucket, or is it while the EC2 instances are performing the copy? If you are trying to download objects from a location in a bucket while objects are being copied to the same location at the same time, there could be potential problems in downloading the files. The main issue being that we only perform one …
We didn't upload new files to s3://my_bucket while the EC2 instances were processing. In the recent runs, we uploaded the files to s3://my_bucket one or two weeks ago, we only instructed the EC2 instances to process them, and we did not upload the processed files back to the same bucket. As we observed, we have problems when multiple processes use aws s3 cp to copy files from the same bucket at the same time. Our current configuration has 12 (EC2 instances) * 8 (processes per instance) = 96 processes accessing the same S3 bucket with the AWS CLI concurrently.
Interesting. I believe this is due to the fact that the … This explanation makes sense given your observation: if there are only 5 S3 objects being downloaded, there would only be 40 threads making requests (5 threads * 8 processes), given the files were not uploaded via multipart. Since this is half of the thread maximum I stated before, the threads starve less. However, if you are downloading 100+ files, you will be hitting that 80-thread maximum. So, I guess my question now is: is it possible to have fewer processes running the …
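To make the arithmetic above concrete, here is a minimal sketch. It assumes each `aws s3 cp` process keeps a pool of roughly 10 transfer threads (inferred from the 80-thread figure above, not stated explicitly in this thread):

```python
# Rough arithmetic behind the thread-starvation explanation above.
# Assumption: each `aws s3 cp` process uses a pool of ~10 transfer threads.
THREADS_PER_PROCESS = 10      # assumed per-process default
PROCESSES_PER_INSTANCE = 8
INSTANCES = 12

# With only 5 objects, each process needs at most 5 threads:
light_load = 5 * PROCESSES_PER_INSTANCE                      # 40 threads per instance
# With 100+ objects, every process saturates its pool:
heavy_load = THREADS_PER_PROCESS * PROCESSES_PER_INSTANCE    # 80 threads per instance

total_processes = INSTANCES * PROCESSES_PER_INSTANCE         # 96 concurrent CLI processes
print(light_load, heavy_load, total_processes)
```

If running fewer processes is not an option, newer CLI versions also expose an `s3.max_concurrent_requests` setting (`aws configure set default.s3.max_concurrent_requests N`) that lowers the per-process thread count; whether that setting is available in the CLI version used here is an assumption.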
Thank you for the reply! It's good to know how aws s3 cp handles requests. After doing more investigation, we found a network bandwidth issue because all EC2 instances are behind a NAT, and it was resolved. In the recent runs we did not see the incomplete-download issue. I'm not sure whether the resolved bandwidth issue could explain this one. Is there a timeout mechanism implemented that could be related to this issue?
S3 usually throws RequestTimeout errors when the socket has not been read from or written to in a while. This could happen if the threads are starving each other, as I explained in my previous comment. However, this error will be displayed to the user. Are you seeing those types of errors, or is it a different type of error?
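For illustration only, here is a minimal sketch (in boto3 terms, not the CLI internals) of what catching and retrying an S3 RequestTimeout might look like; the bucket, key, and local path are hypothetical:

```python
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def download_with_retry(bucket, key, dest, attempts=3):
    """Retry a download when S3 reports RequestTimeout (socket idle too long)."""
    for attempt in range(1, attempts + 1):
        try:
            s3.download_file(bucket, key, dest)
            return
        except ClientError as err:
            code = err.response.get("Error", {}).get("Code")
            if code == "RequestTimeout" and attempt < attempts:
                time.sleep(2 ** attempt)  # simple backoff before retrying
                continue
            raise

# Hypothetical names, for illustration only.
download_with_retry("my_bucket", "some/key", "/tmp/some_key")
```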
Consolidating issues into tracking issues. There's two that are relevant here:
Closing in favor of those issues.
Hi,
First, please let me describe our AWS CLI usage scenario. We store a set of files on S3 with the same prefix, say s3://my_bucket/, and a set of EC2 instances will copy each file, process it, and upload it to another bucket. We use SWF, so multiple workers on the EC2 instances actually do the tasks, and of course we use aws s3 cp. I think this scenario is quite common.
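For reference, a minimal sketch of one worker's download, process, upload step as described above; the bucket names, key, and local path are hypothetical:

```python
import subprocess

# Hypothetical names and paths, for illustration only.
SRC = "s3://my_bucket/input/file-001.dat"
DST = "s3://my_other_bucket/output/file-001.dat"
LOCAL = "/tmp/file-001.dat"

# Download the object with the CLI, exactly as a worker would.
subprocess.run(["aws", "s3", "cp", SRC, LOCAL], check=True)

# ... process LOCAL here ...

# Upload the result to the destination bucket.
subprocess.run(["aws", "s3", "cp", LOCAL, DST], check=True)
```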
It works well with fewer than 5 files to process. However, if we process many files (100+), things can go wrong. There are two observations:
'stdout': u'Completed 1 of 685 part(s) with 1 file(s) remaining\rCompleted 2 of 685 part(s) with 1 file(s) remaining\rCompleted 3 of 685 part(s) with 1 file(s) remaining\rCompleted 4 of 685 part(s) with 1 file(s) remaining\rCompleted 5 of 685 part(s) with 1 file(s) remaining\rCompleted 6 of 685 part(s) with 1 file(s) remaining\rCompleted 7 of 685 part(s) with 1 file(s) remaining\rCompleted 8 of 685 part(s) with 1 file(s) remaining\rCompleted 9 of 685 part(s) with 1 file(s) remaining\rCompleted 10 of 685 part(s) with 1 file(s) remaining\r\n'
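As a rough size estimate only: assuming the CLI's default multipart chunk size of 8 MiB (an assumption, since the configured chunk size is not shown here), 685 parts corresponds to an object of roughly 5.4 GiB:

```python
# Back-of-the-envelope size estimate for the transfer quoted above.
# Assumption: the CLI default multipart chunk size of 8 MiB applies.
CHUNK_SIZE_MIB = 8
parts = 685

approx_size_gib = parts * CHUNK_SIZE_MIB / 1024
print(f"~{approx_size_gib:.1f} GiB object in {parts} parts")  # ~5.4 GiB
```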
I think the --debug option might be required to help the investigation. I will try to work on this, but the issue is quite urgent, so I want to know if there is any other information we could provide to assist.
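If it helps, here is a minimal sketch of capturing the CLI's `--debug` output to a file so it can be attached; the source and local paths are hypothetical:

```python
import subprocess

# Hypothetical paths; --debug output goes to stderr, so redirect it to a log file.
SRC = "s3://my_bucket/input/file-001.dat"
LOCAL = "/tmp/file-001.dat"

with open("/tmp/s3cp-debug.log", "w") as log:
    subprocess.run(
        ["aws", "s3", "cp", SRC, LOCAL, "--debug"],
        stderr=log,
        check=True,
    )
```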
Thanks,
Wesley