aws s3 sync does not work as expected #3651

Closed
blurayne opened this issue Oct 12, 2018 · 3 comments
Labels
closing-soon This issue will automatically close in 4 days unless further comments are made.

blurayne commented Oct 12, 2018

We do an initial sync to an S3 bucket and then try to sync the changed files, but this does not work as expected.

I have tested this behavior with the following AWS CLI versions:

  • python2.7-awscli1.9.7
  • python2.7-awscli1.15.47
  • python3.6-awscli1.15.47

Am I missing something? It seems to me that MD5 checksums are not set after a multipart upload. Also, forcing s3cmd to do multipart uploads with --multipart-chunk-size-mb=5 works just fine.

Yet according to the AWS CLI debug logs, Content-MD5 does get calculated for each part and is sent:

2018-10-12 16:08:03,860 - ThreadPoolExecutor-0_0 - botocore.endpoint - DEBUG - Making request for OperationModel(name=UploadPart) (verify_ssl=True) with params: {'url_path': '/l3testing/multi/part-10.out', 'query_string': {'uploadId': ...

The setup uses the default AWS CLI options.
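On the MD5 question: for a multipart upload, the S3 ETag is not the MD5 of the whole object. A commonly observed (but not officially specified) scheme is the MD5 of the concatenated binary MD5 digests of the parts, followed by `-` and the part count. The bash sketch below reproduces that locally. `multipart_etag` and `hex_to_bin` are hypothetical helpers, and the default part size of 8 MiB (8388608 bytes) is an assumption based on the AWS CLI's default `multipart_chunksize`; under that assumption each 10 MB test file would be uploaded in two parts.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the S3-style multipart ETag of a local, non-empty file.
# ASSUMPTION: every part except the last is exactly CHUNK bytes
# (default 8 MiB, assumed to match the AWS CLI multipart_chunksize).

# Convert a hex string on stdin to raw bytes (bash printf %b emits NULs too).
hex_to_bin() {
  printf '%b' "$(sed 's/../\\x&/g')"
}

# Usage: multipart_etag FILE [CHUNK_BYTES]
multipart_etag() {
  local file=$1 chunk=${2:-8388608} tmp part count=0 digests=""
  tmp=$(mktemp -d)
  # Split FILE into CHUNK-sized parts; split's suffixes (aa, ab, ...) sort in order
  split -b "$chunk" "$file" "$tmp/part."
  for part in "$tmp"/part.*; do
    # Append this part's MD5 as hex and count the part
    digests+=$(md5sum < "$part" | cut -d' ' -f1)
    count=$((count + 1))
  done
  rm -rf "$tmp"
  # ETag = MD5 over the concatenated *binary* part digests, then "-<count>"
  printf '%s-%d\n' \
    "$(printf '%s' "$digests" | hex_to_bin | md5sum | cut -d' ' -f1)" "$count"
}
```

For example, the output of `multipart_etag multi/part-6.out` could be compared by hand with the `ETag` returned by `aws s3api head-object --bucket l3testing --key multi/part-6.out` (S3 wraps the ETag in double quotes).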

Test setup

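# Ten files of random data, 10 MB each (GNU dd: 1MB = 1000*1000 bytes)
mkdir -p multi
for i in {1..10}; do dd if=/dev/urandom of=multi/part-$i.out bs=1MB count=10; done
# Same file names: part-1..5 are copied unchanged, part-6..10 get new random content
mkdir multi-changed
cp multi/part-{1,2,3,4,5}.out multi-changed/
for i in {6..10}; do dd if=/dev/urandom of=multi-changed/part-$i.out bs=1MB count=10; done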
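To confirm what the two fixture directories contain, a small bash check like the following prints, for each file, its size in both directories and whether the content matches. The `compare_dirs` helper is hypothetical and assumes both directories hold the same file names. Running `compare_dirs multi multi-changed` should report 10000000 bytes on both sides for every file, with only part-6 to part-10 marked as changed, so any comparison based on size alone cannot tell the changed files apart.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report size and content equality per file name.
# Usage: compare_dirs OLD_DIR NEW_DIR
compare_dirs() {
  local old=$1 new=$2 f name size_old size_new content
  for f in "$old"/*; do
    name=${f##*/}
    # $(( )) strips any padding that some wc implementations print
    size_old=$(($(wc -c < "$f")))
    size_new=$(($(wc -c < "$new/$name")))
    # cmp -s is silent and exits 0 only if the files are byte-identical
    if cmp -s "$f" "$new/$name"; then content=same; else content=changed; fi
    echo "$name size=$size_old/$size_new content=$content"
  done
}
```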

Testing with aws cli

# Cleanup
$ aws s3 rm s3://l3testing/multi --recursive 

# Initial sync
$ aws s3 sync multi s3://l3testing/multi
upload: multi/part-1.out to s3://l3testing/multi/part-1.out         
upload: multi/part-3.out to s3://l3testing/multi/part-3.out      
upload: multi/part-2.out to s3://l3testing/multi/part-2.out      
upload: multi/part-4.out to s3://l3testing/multi/part-4.out      
upload: multi/part-10.out to s3://l3testing/multi/part-10.out    
upload: multi/part-5.out to s3://l3testing/multi/part-5.out      
upload: multi/part-6.out to s3://l3testing/multi/part-6.out      
upload: multi/part-8.out to s3://l3testing/multi/part-8.out      
upload: multi/part-7.out to s3://l3testing/multi/part-7.out      
upload: multi/part-9.out to s3://l3testing/multi/part-9.out  

# Now only the changed files should be synced
$ aws s3 sync multi-changed/ s3://l3testing/multi/

# ERROR: No files synced!
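This result is consistent with the rule documented for `aws s3 sync`: a local file is uploaded if the S3 object does not exist, if the sizes differ, or if the local file's last-modified time is newer than the object's. File contents are not compared. Here part-6 to part-10 keep the same size and were created before the initial upload, so they are older than the objects in S3. The sketch below models only that decision; `needs_upload` is hypothetical, the remote size and modification time (epoch seconds) are passed in by hand rather than fetched from S3, and `stat -c %Y` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch of the size + last-modified upload rule documented for `aws s3 sync`.
# Content is never compared. Remote values are supplied by the caller.
# Usage: needs_upload LOCAL_FILE REMOTE_SIZE REMOTE_MTIME_EPOCH
#        (pass an empty REMOTE_SIZE when the object does not exist)
needs_upload() {
  local file=$1 remote_size=$2 remote_mtime=$3 local_size local_mtime
  # Missing object: always upload
  if [ -z "$remote_size" ]; then echo upload; return; fi
  local_size=$(($(wc -c < "$file")))
  local_mtime=$(stat -c %Y "$file")   # GNU stat: mtime in epoch seconds
  # Upload on size mismatch or when the local file is strictly newer
  if [ "$local_size" -ne "$remote_size" ] || [ "$local_mtime" -gt "$remote_mtime" ]; then
    echo upload
  else
    echo skip
  fi
}
```

Under this rule, running `touch multi-changed/part-{6..10}.out` before the second `aws s3 sync` would make those files newer than the uploaded objects and should get them uploaded, while part-1 to part-5 would still be skipped.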

Testing with s3cmd

# Cleanup
$ aws s3 rm s3://l3testing/multi --recursive 

# Initial sync
$ s3cmd sync -v --check-md5 multi-changed/  s3://l3testing/multi/
s3cmd sync --delete-removed multi/  s3://l3testing/multi/ 
upload: 'multi/part-1.out' -> 's3://l3testing/multi/part-1.out'  [1 of 10]
 10000000 of 10000000   100% in    1s     5.12 MB/s  done
upload: 'multi/part-10.out' -> 's3://l3testing/multi/part-10.out'  [2 of 10]
 10000000 of 10000000   100% in    1s     7.54 MB/s  done
upload: 'multi/part-2.out' -> 's3://l3testing/multi/part-2.out'  [3 of 10]
 10000000 of 10000000   100% in    1s     8.60 MB/s  done
upload: 'multi/part-3.out' -> 's3://l3testing/multi/part-3.out'  [4 of 10]
 10000000 of 10000000   100% in    1s     7.17 MB/s  done
upload: 'multi/part-4.out' -> 's3://l3testing/multi/part-4.out'  [5 of 10]
 10000000 of 10000000   100% in    1s     7.72 MB/s  done
upload: 'multi/part-5.out' -> 's3://l3testing/multi/part-5.out'  [6 of 10]
 10000000 of 10000000   100% in    1s     8.19 MB/s  done
upload: 'multi/part-6.out' -> 's3://l3testing/multi/part-6.out'  [7 of 10]
 10000000 of 10000000   100% in    1s     7.60 MB/s  done
upload: 'multi/part-7.out' -> 's3://l3testing/multi/part-7.out'  [8 of 10]
 10000000 of 10000000   100% in    1s     7.73 MB/s  done
upload: 'multi/part-8.out' -> 's3://l3testing/multi/part-8.out'  [9 of 10]
 10000000 of 10000000   100% in    1s     7.52 MB/s  done
upload: 'multi/part-9.out' -> 's3://l3testing/multi/part-9.out'  [10 of 10]
 10000000 of 10000000   100% in    1s     8.31 MB/s  done
Done. Uploaded 100000000 bytes in 12.9 seconds, 7.38 MB/s.

# Now only the changed files should be synced
s3cmd sync  --delete-removed multi-changed/  s3://l3testing/multi/ 
upload: 'multi-changed/part-10.out' -> 's3://l3testing/multi/part-10.out'  [1 of 5]
 10000000 of 10000000   100% in    1s     5.97 MB/s  done
upload: 'multi-changed/part-6.out' -> 's3://l3testing/multi/part-6.out'  [2 of 5]
 10000000 of 10000000   100% in    1s     9.45 MB/s  done
upload: 'multi-changed/part-7.out' -> 's3://l3testing/multi/part-7.out'  [3 of 5]
 10000000 of 10000000   100% in    1s     9.18 MB/s  done
upload: 'multi-changed/part-8.out' -> 's3://l3testing/multi/part-8.out'  [4 of 5]
 10000000 of 10000000   100% in    1s     8.81 MB/s  done
upload: 'multi-changed/part-9.out' -> 's3://l3testing/multi/part-9.out'  [5 of 5]
 10000000 of 10000000   100% in    1s     8.79 MB/s  done
Done. Uploaded 50000000 bytes in 5.8 seconds, 8.17 MB/s.

# SUCCESS: Everything done correctly!
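s3cmd, by contrast, compares MD5 checksums when syncing (its help text lists `--check-md5` as the default), which is why it picked up the changed files. A minimal bash sketch of such a checksum-based decision, using a hypothetical `md5_decision` helper: it assumes the ETag equals the content MD5, which holds only for single-part uploads without SSE-KMS, so multipart ETags (`<hex>-<parts>`) are reported as `unknown` instead of being compared. How s3cmd itself handles multipart objects is not modeled here.

```shell
#!/usr/bin/env bash
# Sketch: decide whether to upload by comparing the local MD5 with the ETag.
# ASSUMPTION: ETag == content MD5 (single-part, non-KMS uploads only).
# Usage: md5_decision LOCAL_FILE ETAG   (surrounding quotes are stripped)
md5_decision() {
  local file=$1 etag=${2//\"/} local_md5
  # Multipart ETags carry a "-<parts>" suffix and are not a plain MD5
  case $etag in
    *-*) echo unknown; return ;;
  esac
  local_md5=$(md5sum < "$file" | cut -d' ' -f1)
  if [ "$local_md5" = "$etag" ]; then echo skip; else echo upload; fi
}
```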

Anyway, the workaround for now is to use s4cmd.

blurayne commented Oct 15, 2018

Okay, I get it! So that's the enterprise strategy for file syncs? Really?

IMHO, the Node-based s3-cli and the boto3-based s4cmd do the job much better and are just as fast.

Are there any plans to replace the AWS CLI's s3 sync strategy, which uses low-level botocore, with a boto3-based sync?

@justnance

Thanks for reaching out @blurayne. This issue is being discussed under #1145. I'm going to add a label for closing soon.

@justnance justnance self-assigned this Oct 26, 2018
@justnance justnance added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 26, 2018
no-response bot commented Nov 2, 2018

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
