Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reenables true multithreaded s3 copy #2423

Merged
merged 1 commit into from
Jul 9, 2018

Conversation

dlstadther
Copy link
Collaborator

Boto3 update removed multithreaded file copy in favor of multithreaded multipart file copy.

Description

Re-enables true multithreaded file copy. (Logic taken directly from pre-boto3 s3.py)

Motivation and Context

Boto3's TransferConfig appears to enable multithreaded copy capabilities for multipart files, but not for the copying of many small files.

Have you tested this? If so, how?

I've run with test code. Other than that, i'm letting travis test it.

Boto3 update removed multithreaded file copy in favor of multithreaded multipart file copy.
@dlstadther dlstadther requested a review from Tarrasch May 14, 2018 18:31
Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, please merge if you've tested!


# Wait for the pools to finish scheduling all the copies
management_pool.close()
management_pool.join()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just check that this doesn't have a ContextManager interface maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minutes of Google searching doesn't return valuable resources towards this desire.

@dlstadther dlstadther requested review from honnix and Tarrasch May 14, 2018 18:48
@Tarrasch
Copy link
Contributor

Have you considered adding tests as codecov is mostly red? If this is not suitable for testing feel free to skip.

@dlstadther
Copy link
Collaborator Author

dlstadther commented May 15, 2018

I just realized that the test for copy_dir has been set to be skipped on Travis.

I'll see what i can do in the next couple days to enable some degree of testing for this method - even if not for the specific multiprocessing portion (bc i'm not sure how to test that aside from showing that increasing thread count decreases copy time).

@ouanixi
Copy link
Contributor

ouanixi commented May 21, 2018

@dlstadther tests that are skipped on travis are tests that have shown issues with moto (mocking library for boto), it appears when testing multipart functionality, there is a race condition expressed in same tests passing and failing intermittently.

@dlstadther
Copy link
Collaborator Author

I've spent some time trying to test the true multithreaded s3 copy, but cannot come up with a reasonable test to identify the behavior.

  • Confirmed that TestS3Client.test_copy_dir continues to run successfully with this code
  • Tried to add test to show that as threads increase, runtime reduces, but for this to be a consistent speed improvement the number and size of files has to increase such to be unreasonable (runtime)

@Tarrasch or @ouanixi , do either of you have any suggestions?

Thanks,

@dlstadther
Copy link
Collaborator Author

bump

@Tarrasch
Copy link
Contributor

Tarrasch commented Jul 4, 2018

I wouldn't bother to much about writing unit-tests for multi-threaded code if it's already working in production for you. I wouldn't let that block me from merging.

@dlstadther
Copy link
Collaborator Author

In this case, I'll merge when I return from vacation. Thanks @Tarrasch !

@dlstadther
Copy link
Collaborator Author

Unable to provide unit-tests for multi-threading within method. Code has been running successfully copying millions of files daily in production for ~2 months.

Merging...

@dlstadther dlstadther merged commit dda1478 into spotify:master Jul 9, 2018
@dlstadther dlstadther deleted the feature/s3-multi-copy branch July 9, 2018 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants