Fix flaky testFailStuckSplitTasks unit test #13441

Merged 1 commit on Aug 15, 2022

@leetcode-1533 leetcode-1533 commented Aug 1, 2022

Description

Fix the flaky testFailStuckSplitTasks unit test.

  1. Wait for the task executors to pick up tasks.
  2. Close the task executor after the unit test finishes.

There is a race condition between the unit test thread and the task executor thread. The unit test thread now waits on a SettableFuture to ensure that the task executor starts processing the split first.
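
For readers unfamiliar with the pattern, here is a minimal sketch of waiting on a future until a worker thread has picked up a split. It is not the code from this PR: it uses the JDK's CompletableFuture in place of Guava's SettableFuture so that it runs without extra dependencies, and BlockingSplit and StartedSignalSketch are hypothetical names standing in for the mock split runner and the test.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the test's mock split runner; not Trino code.
class BlockingSplit implements Runnable {
    // Completed by the worker thread as soon as it begins processing the split.
    final CompletableFuture<Void> started = new CompletableFuture<>();

    @Override
    public void run() {
        started.complete(null);
        try {
            // Simulate a split that never finishes on its own.
            Thread.sleep(Long.MAX_VALUE);
        }
        catch (InterruptedException e) {
            // Restore the interrupt flag and let the thread exit.
            Thread.currentThread().interrupt();
        }
    }
}

class StartedSignalSketch {
    // Returns true if the worker picked up the split within the timeout.
    static boolean runOnce() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            BlockingSplit split = new BlockingSplit();
            executor.submit(split);
            // Block the test thread until the worker has started the split,
            // instead of racing ahead to the assertions.
            split.started.get(10, TimeUnit.SECONDS);
            return split.started.isDone();
        }
        finally {
            // Interrupt the stuck split so the worker thread does not leak.
            executor.shutdownNow();
            executor.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("split started: " + runOnce());
    }
}
```

Compared with polling a flag in a sleep loop, the future wakes the test thread as soon as the split starts, and the timeout turns a hang into a test failure.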

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Query Engine

How would you describe this change to a non-technical end user or system administrator?

Fix flaky testFailStuckSplitTasks unit test.

Related issues, pull requests, and links

#12392
#13437

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@leetcode-1533 leetcode-1533 force-pushed the FixUnitTest branch 3 times, most recently from c801efc to 97f32a8 Compare August 5, 2022 00:21

@phd3 phd3 left a comment

lgtm

@leetcode-1533 leetcode-1533 force-pushed the FixUnitTest branch 2 times, most recently from bf7f754 to 62bc7ff Compare August 5, 2022 22:45
@@ -264,6 +265,10 @@ public void testFailStuckSplitTasks()
        // Here we explicitly enqueue an indefinite running split runner
        taskExecutor.enqueueSplits(taskHandle, false, ImmutableList.of(mockSplitRunner));
        taskExecutor.start();
        // wait for the task executor to start processing the split
        while (!mockSplitRunner.isStarted()) {
            Thread.sleep(1000);
Contributor

It might also be reasonable to have another future that signals when the split has finished executing.

Contributor Author

The split never finishes and is never interrupted. It is only terminated at taskExecutor.stop().

The unit test only verifies that the sqlTaskManager can identify a task with a long-running split and move that task to the failed state.

The split is not interrupted because the sqlTaskManager calls failTask() on the task, which runs a list of callback functions as part of the task's state transition. In a real Trino cluster, one of those callbacks is responsible for interrupting the DriverSplitRunner (an implementation of SplitRunner). In the unit test, no such callback is registered. A separate unit test checks that registered callbacks are invoked on state changes.
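
To illustrate the callback behavior described above, here is a simplified sketch under stated assumptions. TaskStateSketch and StateCallbackDemo are hypothetical names, not Trino classes, and the listener only flips a flag where a real cluster would interrupt the running split. The sketch is single-threaded and makes no attempt at thread safety.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified state holder; the real task state machine is more involved.
class TaskStateSketch {
    enum State { RUNNING, FAILED }

    private State state = State.RUNNING;
    private final List<Consumer<State>> listeners = new ArrayList<>();

    void addStateChangeListener(Consumer<State> listener) {
        listeners.add(listener);
    }

    // Moves the task to FAILED once and notifies every registered listener.
    void failTask() {
        if (state == State.FAILED) {
            return;
        }
        state = State.FAILED;
        for (Consumer<State> listener : listeners) {
            listener.accept(state);
        }
    }

    State getState() {
        return state;
    }
}

class StateCallbackDemo {
    // Returns {task is failed, split was interrupted}.
    static boolean[] run(boolean registerInterruptingListener) {
        TaskStateSketch task = new TaskStateSketch();
        boolean[] splitInterrupted = {false};
        if (registerInterruptingListener) {
            // Stand-in for the callback that interrupts the split in a real cluster.
            task.addStateChangeListener(newState -> {
                if (newState == TaskStateSketch.State.FAILED) {
                    splitInterrupted[0] = true;
                }
            });
        }
        task.failTask();
        return new boolean[] {task.getState() == TaskStateSketch.State.FAILED, splitInterrupted[0]};
    }
}
```

Calling StateCallbackDemo.run(false) mirrors the unit test: the task reaches the failed state, but nothing interrupts the split because no listener was registered.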

Let me know if you have a suggestion for improvement!

@leetcode-1533 leetcode-1533 force-pushed the FixUnitTest branch 3 times, most recently from 1c267b8 to 7cf70e0 Compare August 12, 2022 00:16

@arhimondr arhimondr left a comment

LGTM % comments

@arhimondr arhimondr merged commit fa0c96b into trinodb:master Aug 15, 2022
@github-actions github-actions bot added this to the 393 milestone Aug 15, 2022
4 participants