Wait for pending ml tasks in docs tests #44123

Merged: 4 commits, Jul 15, 2019

@davidkyle davidkyle commented Jul 9, 2019

#43271 describes a problem where PUTing an ML job or data frame causes a notification document (saying something like "Job X created") to be written to the ml-notifications index. The write happens asynchronously and can occur after the test has finished and the teardown that deletes indices has completed, so the index is recreated and leaks into the next test.

This is a known issue; XPackRestIT handles it by waiting for pending tasks to complete. This change adds the same step to DocsClientYamlTestSuiteIT.

Unmutes the muted ML and data frame tests and closes #43271.

XPackRestIT also has logic to stop datafeeds and close jobs after each test. That isn't necessary here, as none of the tests start a job or data frame, but it may be required in the future.
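To illustrate the idea of waiting for pending tasks before cleanup, here is a minimal, self-contained sketch. It is not the ESRestTestCase implementation: the class `PendingTaskWaitSketch`, the method `waitForNoPendingTasks`, and the task string are all hypothetical, and the "cluster" is simulated with an in-memory list that a background thread drains.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Illustrative stand-in for a "wait for pending tasks" step; not Elasticsearch code. */
final class PendingTaskWaitSketch {

    /**
     * Polls a (hypothetical) listing of pending tasks until it is empty or the
     * timeout expires. Returns true only if no tasks were pending when polling stopped.
     */
    static boolean waitForNoPendingTasks(Supplier<List<String>> pendingTasks, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (pendingTasks.get().isEmpty()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates an async notification write that outlives the test body.
        List<String> tasks = new CopyOnWriteArrayList<>(List.of("write notification document"));
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            tasks.clear(); // the write has completed
        });
        writer.start();

        // Only once this returns true would it be safe to delete indices.
        boolean drained = waitForNoPendingTasks(() -> tasks, 2_000);
        writer.join();
        System.out.println("safe to delete indices: " + drained);
    }
}
```

The point of the sketch is the ordering: index deletion should only happen after the wait reports that nothing is pending, otherwise a late write can recreate the index.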

@elasticmachine
Collaborator

Pinging @elastic/es-docs

@davidkyle davidkyle requested a review from nik9000 July 10, 2019 14:35
    @After
    public void cleanup() throws Exception {
        if (isMachineLearningTest() || isDataFrameTest()) {
            ESRestTestCase.waitForPendingTasks(adminClient());
        }
    }
Member

I wonder how bad it'd be to do this after every test. I don't feel great about relying on stuff in the test name. It just feels a bit too magical.

Member Author

It is a little bit complicated because Rollups do the wait in the base ESRestTestCase.

Additionally, some tests leave tasks running. get-follow-info.asciidoc line 38 is a good example, as it creates various CCR tasks which will be waited on indefinitely unless the test teardown is run. Interestingly, what appears to be happening is that the @After method of this class is called before the test teardown.

Member

> Interestingly what appears to be happening is the @After method of this class is called before the test teardown

Weird!

I'm not a big fan of leaving things running in those tests either. Is there a way you could do something like the rollups here? It looks like it only cares about rollup style jobs. Does ml have something similar?

Member Author

Yeah, rollups filter the waiting tasks with taskName.startsWith("xpack/rollup/job") and we can do something similar with ML jobs, but the action causing the leakage in #43271 is indexing a document, not an ML task. Waiting for all tasks catches unexpected issues and actually helps when debugging tests that have failed due to leakage from a previous test. Experience from using this in XPackRestIT has shown that it is very valuable.

If I remove the if (isMachineLearningTest() || isDataFrameTest()) { check, then the tests that fail with pending tasks are CCR and rollup. I'll look into what's happening there; maybe there is a way of removing the if ml ... conditional.
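The difference between a prefix filter and waiting for everything can be sketched as follows. This is only an illustration: `TaskFilterSketch` and `tasksToWaitFor` are hypothetical names, and apart from the "xpack/rollup/job" prefix quoted above, the task names are made up for the example.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative comparison of filtered vs. unfiltered task waiting; not Elasticsearch code. */
final class TaskFilterSketch {

    /** Returns the subset of running tasks that a cleanup step would wait on. */
    static List<String> tasksToWaitFor(List<String> runningTasks, Predicate<String> filter) {
        return runningTasks.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up task names; only the rollup prefix comes from the discussion.
        List<String> running = List.of(
            "xpack/rollup/job[example]",
            "example/notification-index-write",
            "example/unrelated-task"
        );

        // Rollup-style filter: only tasks whose name starts with the rollup prefix.
        List<String> rollupOnly =
            tasksToWaitFor(running, name -> name.startsWith("xpack/rollup/job"));

        // Wait-for-everything: no filtering, so the index write is also covered.
        List<String> everything = tasksToWaitFor(running, name -> true);

        System.out.println("prefix filter waits on: " + rollupOnly);
        System.out.println("no filter waits on:     " + everything);
    }
}
```

With the prefix filter the simulated index write is never waited on, which is the gap described above: the leaking action is an indexing operation, not a named job task.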

Member Author

I took a look at the Rollup and CCR tests. Unfortunately it is not possible to wait for pending tasks after every test because those tests require special handling. I cannot see a way to simplify the logic, and I think the current code is best because it is explicitly for the ML and data frame tests.

Also, as more xpack feature snippet testing is added, I would expect more usages of the pattern, e.g. if (isSecurityTest()) { // security specific cleanup

Using the test name to determine if the test is an ml test is a valid use. XPackRestIT set the precedent some time ago and it has not caused problems there.

Member

I'm really not a fan of looking at the test name. I know XPackRestIT does it and I think it is sneaky black magic that will cause tests to fail in very difficult ways to trace. One badly named test invoking ml will cause subsequent tests to fail. Sometimes. Randomly.

I'm ok with merging this, but I'd really like a follow up issue to remove it somehow. Because I'm 100% sure somebody is going to lose many hours to debugging errors caused by a funny named test one day.
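A small sketch of the failure mode being described, under stated assumptions: `looksLikeMlTest` is a hypothetical substring check standing in for a name-based helper such as isMachineLearningTest() (whose real implementation may differ), and the test names are invented.

```java
/** Illustrative name-based test detection; the real helper may be implemented differently. */
final class TestNameDetectionSketch {

    /** Hypothetical check: treats a test as an ML test if its name contains "ml/". */
    static boolean looksLikeMlTest(String testName) {
        return testName != null && testName.contains("ml/");
    }

    public static void main(String[] args) {
        // Invented names: one follows the naming convention, one does not.
        String conventional = "reference/ml/apis/put-job/line_10";
        String misfiled = "reference/setup/getting-started/line_42"; // imagine this snippet also creates an ML job

        // The conventional name triggers the extra cleanup.
        System.out.println(conventional + " -> cleanup runs: " + looksLikeMlTest(conventional));

        // The misfiled test touches ML too, but the name gives no hint, so cleanup is skipped.
        System.out.println(misfiled + " -> cleanup runs: " + looksLikeMlTest(misfiled));
    }
}
```

In this sketch the second test skips the wait even though it would need it, so any leak it causes would surface in whichever test happens to run next.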

Member

Can you detect a data frame test or ML test by looking at the public API somehow? Like by looking for jobs or something.....

@davidkyle davidkyle merged commit 4402cf3 into elastic:master Jul 15, 2019
@davidkyle davidkyle deleted the docs-tests-wait-for-pending branch July 15, 2019 10:58
davidkyle added a commit that referenced this pull request Jul 15, 2019
ML and Data Frame tests should wait for pending tasks
davidkyle added a commit that referenced this pull request Jul 15, 2019
ML and Data Frame tests should wait for pending tasks
@jpountz jpountz added the >test Issues or PRs that are addressing/adding tests label Jul 15, 2019
Labels: >test (Issues or PRs that are addressing/adding tests), v7.3.0, v7.4.0, v8.0.0-alpha1
Linked issue: [CI] Internal data frame indices can cause unrelated docs tests to fail