Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37060][CORE] Handle driver status response from backup masters #34331

Closed
wants to merge 1 commit into from
Closed

[SPARK-37060][CORE] Handle driver status response from backup masters #34331

wants to merge 1 commit into from

Conversation

testsgmr
Copy link
Contributor

@testsgmr testsgmr commented Oct 19, 2021

What changes were proposed in this pull request?

After an improvement in SPARK-31486, contributor uses 'asyncSendToMasterAndForwardReply' method instead of 'activeMasterEndpoint.askSync' to get the status of driver. Since the driver's status is only available in active master and the 'asyncSendToMasterAndForwardReply' method iterate over all of the masters, we have to handle the response from the backup masters in the client, which the developer did not consider in the SPARK-31486 change. So drivers running in cluster mode and on a cluster with multi masters affected by this bug.

Why are the changes needed?

We need to find if the response received from a backup master client must ignore it.

Does this PR introduce any user-facing change?

No, It's only fixed a bug and brings back the ability to deploy in cluster mode on multi-master clusters.

How was this patch tested?

@github-actions github-actions bot added the CORE label Oct 19, 2021
@testsgmr testsgmr changed the title [SPARK-37060][Core] handle driver status response from backup masters [SPARK-37060][Core] Handle driver status response from backup masters Oct 19, 2021
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@testsgmr testsgmr changed the title [SPARK-37060][Core] Handle driver status response from backup masters [SPARK-37060][CORE] Handle driver status response from backup masters Oct 20, 2021
@Ngone51
Copy link
Member

Ngone51 commented Nov 12, 2021

cc @akshatb1 @srowen who involved in the previous PR: #28258

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough to review this. Can you CC the author of the change before this?

@testsgmr
Copy link
Contributor Author

Could someone please review these changes? CC: @cloud-fan @Ngone51 @HeartSaVioR @jiangxb1987

@Ngone51
Copy link
Member

Ngone51 commented Dec 14, 2021

@mohamadrezarostami you may need to rebase your branch to pass GA.

@testsgmr
Copy link
Contributor Author

@mohamadrezarostami you may need to rebase your branch to pass GA.

Done!

@Ngone51 Ngone51 closed this in 958c797 Dec 15, 2021
@Ngone51
Copy link
Member

Ngone51 commented Dec 15, 2021

Thanks, merged to master.

@mohamadrezarostami Could you create PRs to backport this to branch-3.2/branch-3.1?

@testsgmr testsgmr deleted the fix-bug-in-report-driver-status branch December 15, 2021 13:08
@testsgmr testsgmr restored the fix-bug-in-report-driver-status branch December 15, 2021 13:10
@testsgmr
Copy link
Contributor Author

@Ngone51
Thanks for review.
Yes, I will create PRs for branch-3.2 and branch-3.1, too.

@testsgmr testsgmr deleted the fix-bug-in-report-driver-status branch December 15, 2021 13:21
@testsgmr testsgmr restored the fix-bug-in-report-driver-status branch December 15, 2021 13:21
@testsgmr testsgmr deleted the fix-bug-in-report-driver-status branch December 15, 2021 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants