[FLINK-36451][runtime] Replaces LeaderElection#hasLeadership with LeaderElection#runAsLeader #25679
Conversation
The most recent CI run revealed a timeout in MemoryManagerConcurrentModReleaseTest#testConcurrentModificationWhileReleasing which is independent of our changes (see FLINK-35315).
final TestingLeaderElectionDriver.Builder driverBuilder =
-        TestingLeaderElectionDriver.newNoOpBuilder();
+        TestingLeaderElectionDriver.newBuilder(new AtomicBoolean(true));
That was an interesting one because I wondered why we didn't need to set the leadership pre-FLINK-36451. The reason was a bug in the TestingLeaderContender#grantLeadership method: the old implementation confirmed the leadership and pushed a LeaderInformation event after that call even if the confirmation failed (due to leadership loss). The new implementation is cleaner in this regard.
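To illustrate the guard described above, a minimal sketch (the class and method names below are hypothetical stand-ins, not the actual Flink test classes):

```java
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified contender: the point is that the event is only
// pushed if the confirmation actually succeeded, instead of unconditionally.
final class SketchContender {
    private final Queue<UUID> leaderEvents = new ConcurrentLinkedQueue<>();
    private volatile boolean hasLeadership; // stub state; set elsewhere in a real contender

    void grantLeadership(UUID sessionId) {
        // old (buggy) behavior: confirm, then always push the event;
        // new behavior: push the event only if the confirmation went through
        if (tryConfirmLeadership(sessionId)) {
            leaderEvents.add(sessionId);
        }
    }

    private boolean tryConfirmLeadership(UUID sessionId) {
        // stands in for LeaderElection#confirmLeadership, which can fail
        // if leadership was lost in the meantime
        return hasLeadership;
    }
}
```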
Force-pushed from d9d09b6 to 1d23f3c.
 * acquired from the {@link LeaderContender} and from the {@link DefaultLeaderElectionService}.
 */
@Test
void testNestedDeadlockInLeadershipConfirmation() throws Exception {
That's the test that I added to verify the findings. I kept it in a separate commit to prove that the test runs into a deadlock before applying the changes.
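For illustration, a minimal sketch of the nested-lock pattern such a test targets, assuming two locks acquired in opposite order by the confirmation and revocation paths (all names are hypothetical, not the actual service internals; the program is expected to hang, which is the deadlock):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the nested-lock deadlock: two threads acquire
// the same two locks in opposite order and block each other forever.
public final class DeadlockSketch {
    private static final ReentrantLock contenderLock = new ReentrantLock();
    private static final ReentrantLock serviceLock = new ReentrantLock();

    public static void main(String[] args) {
        Thread confirm = new Thread(() -> {
            contenderLock.lock();   // contender confirms leadership under its own lock ...
            try {
                pause();
                serviceLock.lock(); // ... and then needs the election service's lock
                serviceLock.unlock();
            } finally {
                contenderLock.unlock();
            }
        });
        Thread revoke = new Thread(() -> {
            serviceLock.lock();       // revocation runs under the service's lock ...
            try {
                pause();
                contenderLock.lock(); // ... and calls back into the contender
                contenderLock.unlock();
            } finally {
                serviceLock.unlock();
            }
        });
        confirm.start();
        revoke.start();
    }

    private static void pause() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```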
I'm not sure if this is the right way to go about it.
currentLeaderConfirmed = embeddedLeaderElection;
currentLeaderAddress = leaderAddress;
currentLeaderProposed = null;
Preconditions.checkState(
I'm quite worried about this because it'll just blow up an entire JobManager if leadership is lost at the wrong time, even though HA would be capable of sorting this out on its own.
This Precondition is there to indicate a bug. If the EmbeddedLeaderService isn't properly implemented, the whole setup shouldn't be trusted, no?
    assertThat(callbackTriggered).isTrue();
}

public static void assertRunAsLeaderOmitsCallbackDueToLostLeadership(
Suggested change:
-public static void assertRunAsLeaderOmitsCallbackDueToLostLeadership(
+public static void assertRunAsLeaderSkipsRunnableDueToLostLeadership(
I think this is a bit clearer.
@@ -108,6 +113,16 @@ void testHasLeadership() throws Exception {
    }
}

private static void assertRunAsLeaderSuccessfully(
duplicate of LeaderElectionTestUtils
} catch (Throwable t) {
    fatalError(t);
} else {
    throw new LeadershipLostException(leaderSessionId);
I'm not quite sold on this. It feels like we're using a special exception to do control flow (such that users can ignore certain exceptions).
For me, it was meant as an invalid state: The callback that's passed into the runAsLeader method should succeed in the happy scenario. Losing the leadership in the meantime is an "unusual" state of the system. In this sense, exceptions felt like the most appropriate tool to express this.

You're suggesting that we should make runAsLeader return CompletableFuture<Boolean>, indicating whether the callback was executed or not? That would work as well. But from an interface perspective, runAsLeader is just one way to execute the callback. One could also (theoretically) think of a supplyAsLeader (as it's implemented in JobMasterServiceLeadershipRunner right now). In such a case, utilizing the return value of the CompletableFuture and relying on an exception for leadership loss feels like the more natural thing to do.
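To make the two contracts concrete, a hedged sketch of the variants under discussion (LeadershipLostException comes from the diff above; the interfaces are simplified stand-ins, not the actual Flink API):

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Variant 1: the future completes exceptionally with LeadershipLostException
// if leadership was lost before the callback could run. The same convention
// extends naturally to a value-returning supplyAsLeader, because the value
// channel of the future stays free for the supplier's result.
interface ExceptionVariant {
    CompletableFuture<Void> runAsLeader(UUID leaderSessionId, Runnable callback);

    <T> CompletableFuture<T> supplyAsLeader(UUID leaderSessionId, Supplier<T> supplier);
}

// Variant 2: the future completes with false if the callback was skipped due
// to lost leadership; this works for Runnable callbacks but leaves no obvious
// slot for a supplied value.
interface BooleanVariant {
    CompletableFuture<Boolean> runAsLeader(UUID leaderSessionId, Runnable callback);
}
```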
 */
-boolean hasLeadership(UUID leaderSessionId);
+CompletableFuture<Void> runAsLeader(
yeeeeah, I'm not sold on this.
I struggle to understand the contract as to what should be run via the method. Is it that everything called in the LeaderContender callbacks must use this method?
It feels very wrong that this method is used to access the JRS for example (even if that internally does things asynchronously), when that has little to do with leader election; it runs counter to the idea of running as few things as possible in the leader election executor.
I also don't quite follow yet why such a big API change is necessary. Couldn't we have saved ourselves a lot of work by just changing hasLeadership to return a CompletableFuture<Boolean>?
Now we have this strange API where a sync confirmLeadership method exists but must basically never be called from one side because it risks deadlocks.
Can't we solve the lock acquisition problem by removing the need to call hasLeadership? We already have an API that signals whether we are the leader or not; can't we write that outside of the contender lock into an AtomicBoolean and use that instead (while ensuring that grant/revoke don't do any heavy lifting)?
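A minimal sketch of this alternative, assuming hypothetical names: grant/revoke only flip a flag outside the contender lock, and code that previously called hasLeadership reads the flag instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposal: grant/revoke do no heavy lifting and
// only flip a flag; checks read the flag without taking the contender lock.
final class LeadershipFlag {
    private final AtomicBoolean isLeader = new AtomicBoolean(false);

    void grantLeadership() {
        isLeader.set(true); // cheap, lock-free
    }

    void revokeLeadership() {
        isLeader.set(false); // cheap, lock-free
    }

    boolean checkLeadership() {
        return isLeader.get(); // replaces the synchronous hasLeadership call
    }
}
```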
> Is it that everything called in the LeaderContender callbacks must use this method? It feels very wrong that this method is used to access the JRS for example (even if that internally does things asynchronously), when that has little to do with leader election; it runs counter to the idea of running as few things as possible in the leader election executor.

I guess you're right on the JRS access: That one can run even if the leadership is not acquired because we only read here (i.e. no side effects). Even the JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess does not need to be protected by the leadership because we can start a new JobMasterServiceProcess even without being the leader. The crucial part is that leadership is still held when the JobMasterServiceProcess instantiation is finalized and the future results are forwarded. This still needs to happen under leadership because it triggers side effects. JobMasterServiceLeadershipRunner#handleJobAlreadyDoneAsLeader is another operation that needs to run under leadership because it triggers side effects.
> I also don't quite follow yet why such a big API change is necessary. Couldn't we have saved ourselves a lot of work by just changing hasLeadership to return a CompletableFuture<Boolean>?

That's a good point. I thought about it and I think it wouldn't work: An asynchronous hasLeadership would then run on the leaderOperations thread (in the DefaultLeaderElectionService). But the chained callback that is called after the async call completes can run concurrently to a leadership revocation again. So that's not a viable solution. The operations that need to run under leadership need to be performed on the leaderOperations thread. These operations need to be lightweight (i.e. ideally only future completions). And some of the operations that are currently performed under leadership (like the JRS read trigger mentioned above) can be moved out of the leadership context.
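A sketch of the race being described, with a hypothetical hasLeadershipAsync as a stand-in: the check itself may run on the leaderOperations thread, but the chained stage does not, so a revocation can slip in between the answer and the action.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

final class RaceSketch {
    // hypothetical async check; in the real service the answer would be
    // computed on the leaderOperations thread
    static CompletableFuture<Boolean> hasLeadershipAsync(UUID sessionId) {
        return CompletableFuture.completedFuture(true); // stub
    }

    static void example(UUID sessionId) {
        hasLeadershipAsync(sessionId).thenAccept(isLeader -> {
            // this stage runs concurrently to a possible revocation: the
            // answer may already be stale by the time we act on it
            if (isLeader) {
                markJobAsDone();
            }
        });
    }

    static void markJobAsDone() {
        // stand-in for a side effect that must only happen as leader
    }
}
```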
> Now we have this strange API where a sync confirmLeadership method exists but must basically never be called from one side because it risks deadlocks.

Yeah, I have to admit the synchronous confirmLeadership method in the LeaderElection interface doesn't feel right. 🤔 The problem is that we need to check the running state when executing the leadership confirmation, because JobMasterServiceLeadershipRunner#close closes the LeaderElection outside of the lock. I.e. the confirm leadership call could be executed after the runner was already shut down. But looking at the DefaultDispatcherRunner implementation, I notice that we're not relying on the lock for the leadership confirmation there despite having the same problem. Maybe I have a misconception here. 🤔
> Can't we solve the lock acquisition problem by removing the need to call hasLeadership? We already have an API that signals whether we are the leader or not; can't we write that outside of the contender lock into an AtomicBoolean and use that instead (while ensuring that grant/revoke don't do any heavy lifting)?

No, we need to run certain operations as the leader. Otherwise, it could happen that the job is marked as done despite some other leader picking up the work.
> But the chained callback that is called after the async call completes can run concurrently to a leadership revocation again

Why not? It is a matter of fact that we will occasionally do stuff while we no longer have leadership. Some S3 delete call can be on the wire while we have lost leadership. All the components like Dispatchers/JobManagers/etc. can do stuff while the leadership revocation message sits in the Akka message queue waiting to be scheduled into the main thread.
The only place where leadership must be truly maintained during the operation is access to HA itself, which uses different primitives. Everything else is best-effort.
hm, I guess you're right. That was my misconception here: wanting to ensure leadership beyond the HA backend itself, which, indeed, is not possible. Let me give it another try with the async hasLeadership approach 🤔
Reviewed by Chi on 28/11/24: business as usual, progressing with committer involved.
Thanks @zentol for looking into the issue. I tried to respond to your remarks. PTAL
Hi @davidradl, can you explain what this comment means? 🤔
@XComp I was trying to indicate that the PR was proceeding without any need for the CHI workgroup (which is looking to reduce the Flink technical debt) to do anything. I can use different words if it would be clearer. If we could start using labels rather than comments, this would be better, but I do not have the authority to create/assign labels in Jira. Does this make sense?
Force-pushed from 5f0c169 to 2ca1d6d.
@zentol I pushed the code from the 2nd iteration. This time, …
Force-pushed from 4c12066 to a045cf0.
-        runIfValidLeader(
+    private CompletableFuture<Void> createNewJobMasterServiceProcessIfValidLeader(
+            UUID leaderSessionId) {
+        return runIfValidLeader(
Is it a problem that this runs in the leadership operation thread now? I suppose not, but it's the one thing that stood out to me.
// the heavy lifting of the JobMasterServiceProcess instantiation is still
// done asynchronously (see DefaultJobMasterServiceFactory#createJobMasterService);
// executing the logic on the leaderOperation thread in the DefaultLeaderElectionService
// should be, therefore, fine
Yeah, that should be fine. I added a comment to the code to clarify this.
One could think of making the interfaces for creating the JobMasterService more explicit. Moving the instantiation out of the leadership check should be fine as well. But I was hesitant to touch that now, since having the instantiation trigger protected by the leadership is something that has been part of the code for a while. 🤔
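A sketch of the pattern being described, with hypothetical names: the trigger runs on the leaderOperations thread and only wires up futures, while the expensive instantiation is dispatched to a separate executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

final class TriggerSketch {
    // stand-in for the ioExecutor used for the heavy lifting
    private static final Executor ioExecutor = Executors.newSingleThreadExecutor();

    // would run on the leaderOperations thread: it only triggers the work and
    // wires up the future; the expensive instantiation happens on ioExecutor
    static CompletableFuture<String> createProcessIfLeader() {
        return CompletableFuture.supplyAsync(TriggerSketch::expensiveInstantiation, ioExecutor);
    }

    private static String expensiveInstantiation() {
        return "JobMasterService"; // stand-in for the expensive creation
    }
}
```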
… the confirmLeadership call because the check happens internally
… to be triggered as a leader since it's read access only
… asynchronously. A test case is added that illustrates the concurrent lock acquisition problem in the DefaultLeaderElectionService
Last force-push (diff) squashed the commits and ran …
What is the purpose of the change

See reasoning in FLINK-36451 comment

Brief change log

- Adds a runAsLeader method that executes callbacks on the leadership main thread
- Replaces hasLeadership calls with runAsLeader
- Runs the confirmLeadership method on the leaderOperationExecutor

Verifying this change

- Updates the hasLeadership tests in unit test classes

Does this pull request potentially affect one of the following parts:

- @Public(Evolving): no

Documentation