-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reenables true multithreaded s3 copy #2423
Conversation
Boto3 update removed multithreaded file copy in favor of multithreaded multipart file copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, please merge if you've tested!
|
||
# Wait for the pools to finish scheduling all the copies | ||
management_pool.close() | ||
management_pool.join() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just check that this doesn't have a ContextManager interface maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minutes of Google searching doesn't return valuable resources towards this desire.
Have you considered adding tests as codecov is mostly red? If this is not suitable for testing feel free to skip. |
I just realized that the test for copy_dir has been set to be skipped on Travis. I'll see what i can do in the next couple days to enable some degree of testing for this method - even if not for the specific multiprocessing portion (bc i'm not sure how to test that aside from showing that increasing thread count decreases copy time). |
@dlstadther tests that are skipped on travis are tests that have shown issues with moto (mocking library for boto), it appears when testing multipart functionality, there is a race condition expressed in same tests passing and failing intermittently. |
I've spent some time trying to test the true multithreaded s3 copy, but cannot come up with a reasonable test to identify the behavior.
@Tarrasch or @ouanixi , do either of you have any suggestions? Thanks, |
bump |
I wouldn't bother to much about writing unit-tests for multi-threaded code if it's already working in production for you. I wouldn't let that block me from merging. |
In this case, I'll merge when I return from vacation. Thanks @Tarrasch ! |
Unable to provide unit-tests for multi-threading within method. Code has been running successfully copying millions of files daily in production for ~2 months. Merging... |
Boto3 update removed multithreaded file copy in favor of multithreaded multipart file copy.
Description
Re-enables true multithreaded file copy. (Logic taken directly from pre-boto3 s3.py)
Motivation and Context
Boto3's
TransferConfig
appears to enable multithreaded copy capabilities for multipart files, but not for the copying of many small files.Have you tested this? If so, how?
I've run with test code. Other than that, i'm letting travis test it.