Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML Data Frame] Refactor stop logic (#42644) #42763

Merged
merged 1 commit into from
Jun 3, 2019

Conversation

benwtrent
Copy link
Member

In AsyncTwoPhaseIndexer. finishAndSetState the onStop and onAbort methods could be called from inside an atomic update. Both those methods are abstract designed to be overridden and that overriding, by an implementor who is not aware of the restrictions (i.e. me), may introduce side effects which are not safe.

I'm not sure what the behaviour is if another thread tries to get() the atomic reference during a call to updateAndGet() but the docs warn against it. I believe this is the cause of the CI failures seen in #42344.

The function should be side-effect-free, since it may be re-applied when attempted updates fail due to contention among threads.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/atomic/AtomicReference.html#updateAndGet(java.util.function.UnaryOperator)

The methods onStop and onFinish must be called before state is saved (1 to set the correct Data Frame state and 2 to increment the checkpoint) but this ordering means there is a race between onStop completing the persistent task and doSaveState updating the persistent task parameters. Luckily after #41942 it is no longer necessary to update the persistent task as all state is persisted and restored from the index so I have remove the p. task update from doSaveState.

ClientDataFrameIndexer.onStop was also persisting state which is not necessary as doSaveState is called after onStop by the base class.

There is also a refactoring change to AsyncTwoPhaseIndexer.stop to make it work the same as abort. abort does not call onAbort if the indexer is not running it is left to the client code to handle that, stop should use the same pattern as it is confusing to use 2 different paradigms in the class interface.

Finally this un-mutes tests muted for #42641

Closes #42344

* Revert "invalid test"

This reverts commit 9dd8b52.

* Testing

* mend

* Revert "[ML Data Frame] Mute Data Frame tests"

This reverts commit 5d837fa.

* Call onStop and onAbort outside atomic update

* Don’t update CS

* Tidying up

* Remove invalid test that asserted logic that has been removed

* Add stopped event

* Revert "Add stopped event"

This reverts commit 02ba992.

* Adding check for STOPPED in saveState
@benwtrent benwtrent added >test Issues or PRs that are addressing/adding tests backport :ml/Transform Transform v7.3.0 labels May 31, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@benwtrent benwtrent merged commit 0253927 into elastic:7.x Jun 3, 2019
@benwtrent benwtrent deleted the backport/7.x/pr-42644 branch June 3, 2019 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport :ml/Transform Transform >test Issues or PRs that are addressing/adding tests v7.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants