[ML Data Frame] Refactor stop logic (#42644) #42763
Merged
In `AsyncTwoPhaseIndexer.finishAndSetState`, the `onStop` and `onAbort` methods could be called from inside an atomic update. Both methods are abstract and designed to be overridden, and an implementor who is not aware of the restrictions (i.e. me) may introduce side effects in those overrides that are not safe. I'm not sure what the behaviour is if another thread tries to `get()` the atomic reference during a call to `updateAndGet()`, but the docs warn that the update function should be side-effect-free, since it may be re-applied when attempted updates fail due to contention among threads. I believe this is the cause of the CI failures seen in #42344.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/atomic/AtomicReference.html#updateAndGet(java.util.function.UnaryOperator)
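To illustrate the hazard, here is a minimal Java sketch that keeps the function passed to `AtomicReference` side-effect-free and runs the callback only after the update has succeeded. The `FinishAndSetStateSketch` class, its reduced `IndexerState` enum, and the transition rules are hypothetical and only loosely mirror the names in this PR; this is not the actual `AsyncTwoPhaseIndexer` code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model; not the real AsyncTwoPhaseIndexer.
class FinishAndSetStateSketch {

    enum IndexerState { STARTED, INDEXING, STOPPING, STOPPED }

    final AtomicReference<IndexerState> state = new AtomicReference<>(IndexerState.INDEXING);
    // Counts how many times the overridable callback has run.
    final AtomicInteger onStopCalls = new AtomicInteger();

    // Stand-in for an overridable callback that may have side effects.
    void onStop() {
        onStopCalls.incrementAndGet();
    }

    // Pure transition function: it may be re-applied under contention,
    // so it must not call onStop() or touch any other state.
    static IndexerState next(IndexerState s) {
        switch (s) {
            case INDEXING:
                return IndexerState.STARTED;
            case STOPPING:
                return IndexerState.STOPPED;
            default:
                return s;
        }
    }

    // getAndUpdate returns the value the successful update replaced, so the
    // callback runs once, outside the atomic update.
    IndexerState finishAndSetState() {
        IndexerState prev = state.getAndUpdate(FinishAndSetStateSketch::next);
        if (prev == IndexerState.STOPPING) {
            onStop();
        }
        return next(prev);
    }

    public static void main(String[] args) {
        FinishAndSetStateSketch indexer = new FinishAndSetStateSketch();
        indexer.state.set(IndexerState.STOPPING);
        System.out.println(indexer.finishAndSetState()); // STOPPED
        System.out.println(indexer.onStopCalls.get());   // 1
    }
}
```

Because `getAndUpdate` returns the value replaced by the compare-and-set that actually succeeded, the caller can pick the callback from that single value even if the update function itself was retried.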
The methods `onStop` and `onFinish` must be called before state is saved ((1) to set the correct Data Frame state and (2) to increment the checkpoint), but this ordering means there is a race between `onStop` completing the persistent task and `doSaveState` updating the persistent task parameters. Luckily, after #41942 it is no longer necessary to update the persistent task, as all state is persisted to and restored from the index, so I have removed the persistent task update from `doSaveState`.

`ClientDataFrameIndexer.onStop` was also persisting state, which is not necessary as `doSaveState` is called after `onStop` by the base class.
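The required ordering (callbacks first, then a single save that no longer touches the persistent task) can be sketched as follows. All names here (`StopOrderingSketch`, `finishAndSave`, the `status@checkpoint` snapshot string) are hypothetical, and the sketch assumes the base class runs `onFinish` and then `onStop` before `doSaveState`; that is a simplification for illustration, not the real control flow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the save ordering; not the real classes.
class StopOrderingSketch {

    abstract static class BaseIndexer {
        String status = "INDEXING";
        long checkpoint = 0;
        // Records every state snapshot "saved to the index".
        final List<String> savedSnapshots = new ArrayList<>();

        // Template method run by the base class when an indexing pass ends.
        final void finishAndSave(boolean stopRequested) {
            onFinish();         // e.g. advance the checkpoint
            if (stopRequested) {
                onStop();       // e.g. mark the data frame as stopped
            }
            doSaveState();      // runs last, so the snapshot sees both updates
        }

        abstract void onFinish();

        abstract void onStop();

        // Persists state to the index only; there is no persistent task
        // update here for onStop to race with.
        void doSaveState() {
            savedSnapshots.add(status + "@" + checkpoint);
        }
    }

    static class ClientIndexer extends BaseIndexer {
        @Override
        void onFinish() {
            checkpoint++;
        }

        @Override
        void onStop() {
            // Only updates in-memory state; the base class saves it afterwards.
            status = "STOPPED";
        }
    }

    public static void main(String[] args) {
        ClientIndexer indexer = new ClientIndexer();
        indexer.finishAndSave(true);
        System.out.println(indexer.savedSnapshots); // [STOPPED@1]
    }
}
```

Since `onStop` no longer saves anything itself, each pass produces exactly one snapshot, written after both callbacks have run.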
by the base class.There is also a refactoring change to
AsyncTwoPhaseIndexer.stop
to make it work the same asabort
.abort
does not callonAbort
if the indexer is not running it is left to the client code to handle that,stop
should use the same pattern as it is confusing to use 2 different paradigms in the class interface.Finally this un-mutes tests muted for #42641
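The shared `stop`/`abort` pattern can be sketched with a simplified state machine: neither method invokes its callback when the indexer is idle, and the return value tells the caller what happened. `StopAbortSketch`, `finishRun`, and the exact state transitions are hypothetical stand-ins, not the real `AsyncTwoPhaseIndexer` API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the stop/abort pattern; not the real API.
class StopAbortSketch {

    enum IndexerState { STARTED, INDEXING, STOPPING, STOPPED, ABORTING }

    final AtomicReference<IndexerState> state;
    int onStopCalls = 0;
    int onAbortCalls = 0;

    StopAbortSketch(IndexerState initial) {
        state = new AtomicReference<>(initial);
    }

    void onStop() { onStopCalls++; }

    void onAbort() { onAbortCalls++; }

    // Returns the new state. An idle indexer (STARTED) goes straight to STOPPED
    // and onStop is NOT called; the caller inspects the result instead.
    IndexerState stop() {
        return state.updateAndGet(prev -> {
            switch (prev) {
                case INDEXING:
                    return IndexerState.STOPPING; // the running job calls onStop later
                case STARTED:
                    return IndexerState.STOPPED;
                default:
                    return prev;
            }
        });
    }

    // Returns true if the indexer was not running; onAbort is NOT called in
    // that case and clean-up is left to the caller.
    boolean abort() {
        IndexerState prev = state.getAndUpdate(s -> IndexerState.ABORTING);
        return prev == IndexerState.STOPPED || prev == IndexerState.STARTED;
    }

    // Simulates the running job reaching the end of an indexing pass: the
    // callbacks run here, outside the atomic update.
    void finishRun() {
        IndexerState prev = state.getAndUpdate(s -> s == IndexerState.STOPPING ? IndexerState.STOPPED : s);
        if (prev == IndexerState.STOPPING) {
            onStop();
        } else if (prev == IndexerState.ABORTING) {
            onAbort();
        }
    }

    public static void main(String[] args) {
        StopAbortSketch idle = new StopAbortSketch(IndexerState.STARTED);
        System.out.println(idle.stop() + " " + idle.onStopCalls); // STOPPED 0

        StopAbortSketch running = new StopAbortSketch(IndexerState.INDEXING);
        System.out.println(running.stop()); // STOPPING
        running.finishRun();
        System.out.println(running.state.get() + " " + running.onStopCalls); // STOPPED 1
    }
}
```

Returning the new state (or a boolean) instead of invoking the callback directly keeps both methods limited to the atomic state change, with the callbacks invoked only by the running job.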
Closes #42344