Wait for pending ml tasks in docs tests #44123
Pinging @elastic/es-docs
@After
public void cleanup() throws Exception {
    if (isMachineLearningTest() || isDataFrameTest()) {
        ESRestTestCase.waitForPendingTasks(adminClient());
I wonder how bad it'd be to do this after every test. I don't feel great about relying on stuff in the test name. It just feels a bit too magical.
It is a little bit complicated because Rollups do the wait in the base ESRestTestCase.
Additionally, some tests leave tasks running. get-follow-info.asciidoc line 38 is a good example, as it creates various CCR tasks which will be waited on indefinitely unless the test teardown is run. Interestingly, what appears to be happening is that the @After method of this class is called before the test teardown.
> Interestingly what appears to be happening is the @After method of this class is called before the test teardown
Weird!
I'm not a big fan of leaving things running in those tests either. Is there a way you could do something like the rollups here? It looks like it only cares about rollup style jobs. Does ml have something similar?
Yeah, rollups filter the waiting tasks with taskName.startsWith("xpack/rollup/job") and we can do something similar with ml jobs, but the action causing the leakage in #43271 is indexing a document, not an ml task. Waiting for all tasks catches unexpected issues and actually helps debugging tests that have failed due to leakage from a previous test. Experience from using this in XPackRestIT has shown that it is very valuable.
If I remove the if (isMachineLearningTest() || isDataFrameTest()) { check then the tests that fail with pending tasks are ccr and rollup. I'll look into what's happening there and maybe there is a way of removing the if ml ... conditional.
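To make the filtering idea concrete, here is a minimal, self-contained sketch of prefix-based task filtering. It is not the ESRestTestCase code: the class name PendingTaskFilter, the method pendingTasks, and the assumption that each task line starts with the task's action name followed by whitespace are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Elasticsearch.
class PendingTaskFilter {

    /**
     * Returns the action names of tasks that still count as pending.
     * Assumption: the first whitespace-separated token of each line is the
     * task's action name. Tasks matched by the ignore predicate are skipped.
     */
    static List<String> pendingTasks(List<String> taskLines, Predicate<String> ignore) {
        List<String> pending = new ArrayList<>();
        for (String line : taskLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines carry no task
            }
            String taskName = trimmed.split("\\s+")[0];
            if (ignore.test(taskName) == false) {
                pending.add(taskName);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        // Made-up sample lines; only the first token matters to this sketch.
        List<String> lines = Arrays.asList(
            "xpack/rollup/job node-0:12 persistent",
            "indices:data/write/index node-0:13 transport"
        );
        // Ignore rollup jobs, as in the startsWith("xpack/rollup/job") filter mentioned above.
        List<String> pending = pendingTasks(lines, name -> name.startsWith("xpack/rollup/job"));
        System.out.println(pending); // prints [indices:data/write/index]
    }
}
```

With a filter like this, the indexing action behind #43271 still shows up as pending, which is why a filter limited to ml task names would not catch it.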
I took a look at the Rollup and CCR tests. Unfortunately, it is not possible to wait for pending tasks after every test because those tests require special handling. I cannot see a way to simplify the logic, and I think the current code is best as it is explicitly for the ml & data frame tests.
Also, as more xpack feature snippet testing is added, I would expect more usages of the pattern, e.g. if (isSecurityTest()) { // security specific cleanup
Using the test name to determine if the test is an ml test is a valid use. XPackRestIT set the precedent some time ago and it has not caused problems there.
I'm really not a fan of looking at the test name. I know XPackRestIT does it and I think it is sneaky black magic that will cause tests to fail in very difficult ways to trace. One badly named test invoking ml will cause subsequent tests to fail. Sometimes. Randomly.
I'm ok with merging this, but I'd really like a follow up issue to remove it somehow. Because I'm 100% sure somebody is going to lose many hours to debugging errors caused by a funny named test one day.
Can you detect a data frame test or ML test by looking at the public API somehow? Like by looking for jobs or something.....
ML and Data Frame tests should wait for pending tasks
#43271 describes the problem where PUTing an ml job or data frame causes a notification document (saying something like Job X created) to be written to the ml-notifications index. This is done asynchronously and can occur after the test has finished and the teardown deleting indices has completed, causing the index to be recreated and to leak into the next test. This is a known issue: XPackRestIT handles it by waiting for pending tasks to complete. This change adds the same step to DocsClientYamlTestSuiteIT.
Unmutes the muted ml and data frame tests and closes #43271.
XPackRestIT also has logic to stop datafeeds and close jobs post test. That isn't necessary here as none of the tests start a job or data frame, but it may be required in the future.
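As a rough illustration of the "wait for pending tasks" step, the sketch below polls a source of pending task names until it is empty or a timeout expires. It does not reproduce what ESRestTestCase.waitForPendingTasks does internally: the class PendingTaskWaiter, the method waitUntilIdle, and the Supplier standing in for a call to the cluster are hypothetical names used only to show the pattern.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Elasticsearch.
class PendingTaskWaiter {

    /**
     * Polls pendingTasks until it reports an empty list or the timeout expires.
     * Returns true if the list became empty in time, false on timeout or interrupt.
     */
    static boolean waitUntilIdle(Supplier<List<String>> pendingTasks, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (pendingTasks.get().isEmpty()) {
                return true; // nothing left that could recreate an index after teardown
            }
            if (System.nanoTime() >= deadline) {
                return false; // still busy; a real test would fail here and report the tasks
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

Running such a wait in the @After hook, before indices are deleted, would give the asynchronous notification write time to finish, so it cannot recreate the index once the teardown has completed.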