SPARK-2634: Change MapOutputTrackerWorker.mapStatuses to ConcurrentHashMap #1541
QA tests have started for PR 1541. This patch merges cleanly. |
QA results for PR 1541: |
Instead of a ConcurrentHashMap, we should actually move it to a disk-backed Map. Cleaning up this data structure is painful, and it can become extremely large, particularly for iterative algorithms. We have been using mapdb for this in MapOutputTrackerWorker, and it has worked beautifully. |
Agree. Is there a PR ready? I think this is a critical bug and hope it can be fixed soon. |
When is this accessed concurrently? I looked quickly and can only find updates from the (single-threaded) DAGScheduler event loop. Is the issue that it can be read/written concurrently? |
For example, HashShuffleReader.read -> object BlockStoreShuffleFetcher.fetch -> MapOutputTracker.getServerStatuses. Different |
ping @JoshRosen, could you help take a look at this one? |
Thanks for the reminder. @kayousterhout I looked over @zsxwing's example and I agree that there's a thread-safety issue here. We can definitely have multiple concurrent block fetches that could race when accessing mapStatuses. There's a lot of other state in MapOutputTracker that's guarded with Since the synchronization logic here seems kind of messy / confusing and mapStatuses is only accessed from MapOutputTracker, maybe it would be better to just add proper synchronization around reads/writes to mapStatuses rather than converting it to a ConcurrentHashMap. |
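To make the "proper synchronization around reads/writes" option concrete, here is a minimal Java sketch. It is not Spark's code: the class `SynchronizedStatusCache`, its methods, and the use of `String[]` as a stand-in for the per-shuffle status array are all hypothetical, chosen only to show every access to a plain `HashMap` going through one lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tracker discussed above; not Spark's actual class.
class SynchronizedStatusCache {
    // A plain HashMap is not thread-safe, so every read and write below holds `lock`.
    private final Map<Integer, String[]> mapStatuses = new HashMap<>();
    private final Object lock = new Object();

    // Store the statuses for one shuffle (String[] is an illustrative placeholder type).
    void put(int shuffleId, String[] statuses) {
        synchronized (lock) {
            mapStatuses.put(shuffleId, statuses);
        }
    }

    // Look up the statuses for one shuffle; returns null if none were registered.
    String[] get(int shuffleId) {
        synchronized (lock) {
            return mapStatuses.get(shuffleId);
        }
    }

    // Drop all cached statuses.
    void clear() {
        synchronized (lock) {
            mapStatuses.clear();
        }
    }
}
```

The cost of this approach is that a single unguarded access anywhere in the class reintroduces the race, which is the maintenance concern weighed in this thread.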
Actually, it looks like the |
But |
I agree that |
You remind me that even if |
ConcurrentHashMap should fix all these issues I found. |
@JoshRosen do you think it's OK? |
This seems fine to me, actually, so I'm going to re-run the tests then merge it. We might want to do some other cleanup in this file, but that can wait for a separate PR. |
QA tests have started for PR 1541 at commit
|
QA tests have finished for PR 1541 at commit
|
Thank you @JoshRosen |
MapOutputTrackerWorker.mapStatuses is used concurrently, so it should be thread-safe. This bug has already been fixed in #1328. Nevertheless, since #1328 won't be merged soon, I am sending this trivial fix in the hope that the issue can be resolved quickly.
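
As a rough illustration of why a `ConcurrentHashMap` addresses concurrent access, the Java sketch below has several threads write to and read from one map without any external locking. The class `ConcurrentStatusCache`, the helper `fillConcurrently`, and the `String[]` value type are assumptions made for the example; they do not appear in the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a worker-side status cache; not Spark's actual class.
class ConcurrentStatusCache {
    // ConcurrentHashMap allows concurrent reads and writes without external locking.
    private final ConcurrentMap<Integer, String[]> mapStatuses = new ConcurrentHashMap<>();

    void put(int shuffleId, String[] statuses) {
        mapStatuses.put(shuffleId, statuses);
    }

    String[] get(int shuffleId) {
        return mapStatuses.get(shuffleId);
    }

    int size() {
        return mapStatuses.size();
    }

    // Illustrative helper: each thread inserts and reads back its own range of keys.
    // Returns the number of entries present after all threads have finished.
    static int fillConcurrently(ConcurrentStatusCache cache, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    cache.put(base + i, new String[] {"host-" + (base + i)});
                    cache.get(base + i);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and return whatever has been inserted so far.
            Thread.currentThread().interrupt();
        }
        return cache.size();
    }
}
```

For instance, `ConcurrentStatusCache.fillConcurrently(new ConcurrentStatusCache(), 4, 1000)` should leave 4000 entries in the map, since each thread writes a disjoint key range. With an unsynchronized `HashMap`, the same workload could lose entries or corrupt the table.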