Intermittent test failures #19

digitalcora · 2024-08-01T20:40:11Z

This project's test suite fails about half the time for me locally, and CI seems to have a similar ratio.

When all tests pass, re-running with the same random seed usually (but not always!) results in all tests passing.
When tests fail, re-running with the same random seed almost always results in some tests failing, though not exactly the same ones every time.

This suggests some element of random-seed-dependency in the failures, but it's not perfectly reproducible.

The failures I've observed are:

- "can connect with a path/query": most frequent by far. Usually just says ** (exit) shutdown, with no further information or stack trace, and no warnings/errors in the captured logs. Sometimes instead it's "the process is not alive or there's no process currently associated with the given name". This might be PSPDFKit-labs/bypass#120 ("Crash on shutdown in dispatch_awaiting_callers"), a long-standing issue in Bypass, which hasn't been especially well-maintained lately. (Case for looking into Lasso?)
- "logs a message on invalid status": somewhat frequently, the assertion fails with captured log content being "". This test involves a Process.sleep(100), so it may just be a timing thing where we are not waiting long enough.
- "applies idle timeout": very rarely times out on the final assert_receive.
- "reconnects when it can't make a TCP connection":
  - very rarely EXITs with :badarg, no further information.
  - very rarely times out on the final assert_receive.
  - rarely fails with ** (exit) no process, no further information (only seen on Actions so far). This might be the same issue as "can connect with a path/query".
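
If the Process.sleep(100) in "logs a message on invalid status" really is a timing problem, one common alternative to a fixed sleep is to poll for a condition up to a deadline. The following is only a sketch of that idea: the module name Eventually, the function eventually/3 and its default timings are hypothetical and not part of this project.

```elixir
defmodule Eventually do
  # Hypothetical test helper (not part of this project): polls `fun` until it
  # returns a truthy value or `timeout_ms` elapses.
  def eventually(fun, timeout_ms \\ 1_000, interval_ms \\ 10) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    poll(fun, deadline, interval_ms)
  end

  defp poll(fun, deadline, interval_ms) do
    cond do
      # Condition met: stop polling.
      fun.() ->
        :ok

      # Deadline passed without the condition becoming true.
      System.monotonic_time(:millisecond) >= deadline ->
        {:error, :timeout}

      # Otherwise wait briefly and try again.
      true ->
        Process.sleep(interval_ms)
        poll(fun, deadline, interval_ms)
    end
  end
end
```

A test could then write something like `assert :ok == Eventually.eventually(fn -> some_condition?() end)`, where some_condition?/0 is a placeholder. What to poll in that particular test is left open here: the string returned by capture_log is only available once the captured function has returned, so the wait would have to happen inside it. Where the code under test can send a message to the test process, the built-in assert_receive with a timeout achieves the same thing.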

Additionally, I sometimes see the following error logs, regardless of whether all tests pass or some fail. I include them here in case they're relevant to addressing the test failures.

- [label: {:erl_prim_loader, :file_error}, report: 'File operation error: emfile. Target: /.../erlang/24.3.4.17/lib/snmp-5.12.0.3/ebin/Elixir.Mint.TransportError.beam. Function: get_file. Process: code_server.']: can appear only a few times or hundreds of times. In some cases the Target module is Elixir.Plug.Cowboy.Translator.beam instead. EMFILE is an OS-level error code for "too many open file descriptors".
- {removed_failing_handler,'Elixir.Logger'}: printed many times, partially interleaved with itself (???), followed by a DEBUG REPORT with the error remove_handler_failed and the reason attempting_syncronous_call_to_self. No idea about this one.
