Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have to choose between deadlocks, unbounded executors, or internal APIs #939

Closed
mackrorysd opened this issue Dec 12, 2016 · 5 comments
Closed
Assignees
Labels
response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days.

Comments

@mackrorysd
Copy link

mackrorysd commented Dec 12, 2016

TransferManager documents the following:

The ExecutorService to use for the TransferManager. It is not recommended to use a single threaded executor or a thread pool with a bounded work queue as control tasks may submit subtasks that can't complete until all sub tasks complete. Using an incorrectly configured thread pool may cause a deadlock (I.E. the work queue is filled with control tasks that can't finish until subtasks complete but subtasks can't execute because the queue is filled).

An unbounded executor is going to be a tough sell in some cases. The alternative I see is an executor that has distinct resource pools for tasks that are dependent on others. However that's also not currently possible without depending on the contents of the transfer.internal.* package, which is not ideal either. I'd like to request a better mechanism be made available for having strict control over resource limits without risking deadlock. For some context on where I'm coming from, see https://issues.apache.org/jira/browse/HADOOP-13826.

@kiiadi
Copy link
Contributor

kiiadi commented Dec 14, 2016

What features are you using in TransferManager? We've been actively trying to resolve any deadlock issues that come up whilst using TransferManager (latest one was #896). In the next major-version of the SDK we'll be looking to make use of Java 7's new ForkJoinPool to better manage tasks/sub-tasks.

@mackrorysd
Copy link
Author

All the code that uses TransferManager can be seen here: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java. We're seeing deadlocks when S3AFileSystem.rename(), which ultimately calls TransferManager.copy(), is called from multiple threads.

@rdifalco
Copy link

rdifalco commented Jan 5, 2017

Does using a bounded queue with a ThreadPoolExecutor.CallerRunsPolicy solve this issue? That BlockingThreadPoolExecutorService is not a great idea and would create deadlock issues in other systems besides just the AWS TransferManager. Deadlocks are one of the main reasons Java does not provide a bounded executor that blocks.

@mackrorysd
Copy link
Author

Using CallerRunsPolicy fixed this specific issue as long as it was used directly and not nested in the BlockingThreadPoolExecutorService, but that would leave us with the original that the back-pressure was meant to fix. For now, we've gone with having a separate unbounded threadpool just for the TransferManager, and the previous design still in use for everything else.

@kiiadi
Copy link
Contributor

kiiadi commented Aug 16, 2017

@mackrorysd I don't see much we can do in the current TransferManager without changing the behaviour to the point that it's backwards incompatible. However - we've recently released a developer preview of v2 of the SDK (see https://github.com/aws/aws-sdk-java-v2), one of the highlights is a fully non-blocking IO pluggable HTTP implementation. We haven't ported any of the high-level APIs yet but we do plan to (here's the tracking issue for TransferManager: aws/aws-sdk-java-v2#37).

I'm going to close this for now because it seems that you found a way forward that allowed you to get unblocked. We'd love to get feedback on v2 of the SDK including ideas for how to make TransferManager better.

@kiiadi kiiadi closed this as completed Aug 16, 2017
@debora-ito debora-ito added response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days. and removed needs-response labels Feb 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days.
Projects
None yet
Development

No branches or pull requests

5 participants