-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix and improve the reporting of errors from timeouts and crashes #82
Conversation
18e5a5b
to
7c8a050
Compare
bd02082
to
cf72317
Compare
writing tests for timeouts is pretty tricky i added tests to |
<property name="dd_tags[perf.eval_time]" value="30.008791499"></property> | ||
</properties> | ||
</testcase> | ||
<testcase name="Timeout first, pass after" timestamp="2023-06-21T12:12:48.263" time="30.029" tests="1" skipped="0" failures="0" errors="0"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since timeouts are not supported with nworkers=0
(#87), the "timeout tests" both run once and pass once (hence we have the addition here of an entry each for testcase name="Timeout always"
and testcase name="Timeout first, pass after"
)
Timed out after 5s evaluating test item "Timeout always" (run=2) | ||
</error> | ||
</testcase> | ||
<testcase name="Timeout always" timestamp="2023-06-21T13:40:26.634" time="0.0" tests="1" skipped="0" failures="0" errors="1"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 3 runs of Timeout always
should be here, i.e. there should be 3 <testcase name="Timeout always"
entries
Timed out after 5s evaluating test item "Timeout first, pass after" (run=1) | ||
</error> | ||
</testcase> | ||
<testcase name="Timeout first, pass after" timestamp="2023-06-21T13:40:33.616" time="0.062" tests="1" skipped="0" failures="0" errors="0"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be 2 <testcase name="Timeout first, pass after"
(one timeout, then one pass)
src/ReTestItems.jl
Outdated
Recording test error." | ||
record_worker_terminated!(testitem, run_number) | ||
elseif e isa TestSetFailure | ||
# We already printed the error and recorded the testset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we print the error? It seems we previously had a block that was removed above that did println(DEFAULT_STDOUT[])
. were we double printing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we had
if !(e isa TestSetFailure)
println(DEFAULT_STDOUT[])
# Explicitly show captured logs or say there weren't any in case we're about
# to terminte the worker
_print_captured_logs(DEFAULT_STDOUT[], testitem, nretries + 1)
end
but that moved into the if e isa TimeoutException
block (since it doesn't actually apply in the worker crash case i don't think? but maybe it should be in that branch too?)
if e isa TimeoutException
@debugv 1 "Test item $(repr(testitem.name)) timed out. Terminating worker $worker"
# Explicitly show captured logs or say there weren't any before terminating worker
println(DEFAULT_STDOUT[])
_print_captured_logs(DEFAULT_STDOUT[], testitem, run_number)
terminate!(worker)
wait(worker)
...
For the e isa TestSetFailure
case, we only go to this whole catch
block when we call throw_if_failed
, and we have already called print_errors_and_captured_logs
before that
Does this addition to the comment help?
# We already printed the error and recorded the testset. | |
# We already printed the error and recorded the testset before calling `throw_if_failed`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still want to try and print in the worker crash case, right @Drvi? like, ideally we get the segfault backtrace printed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah, makes sense -- i was misled by the in case we're about to terminate the worker
comment, which i thought meant these logs wouldn't exist if the worker had already terminated, but that's not the case (idk what that comment was meaning to say)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, did a little refactor again so that everything related to the simple test failures is handled above, rather than purposefully throwing and then handling some stuff down here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case we're about to terminate the worker
I don't remember, but I think that at some point we just wanted to be extra sure we didn't lose any logs from dead workers, so we explicitly print "No Captured Logs" to signal that the reporting process didn't fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; merge away!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall great cleanup. I left one question that I wasn't sure I followed.
test/reference/
JUnit XML reports"test item \"foo\" didn't succeed"
"Worker aborted evaluating test item \"foo\""
or"Timed out after XmYs evaluating test item \"foo\""