Intermittent test failures #19

Open
digitalcora opened this issue Aug 1, 2024 · 0 comments


digitalcora commented Aug 1, 2024

This project's test suite fails about half the time for me locally, and CI seems to fail at a similar rate.

  • When all tests pass, re-running with the same random seed usually (but not always!) results in all tests passing.
  • When tests fail, re-running with the same random seed almost always results in some tests failing, though not exactly the same ones every time.

This suggests some element of random-seed-dependency in the failures, but it's not perfectly reproducible.
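To put numbers on how reproducible a given seed is, a small script could re-run the suite several times with a fixed seed and tally the outcomes. The following is only a sketch: the module `SeedRerunSketch` and its functions are hypothetical names, not part of this project. It assumes `mix test --seed N` is run from the project root and treats exit status 0 as "all tests passed".

```elixir
defmodule SeedRerunSketch do
  # Hypothetical helper, not part of this project.
  # Runs the suite `runs` times with the same seed and counts passes/failures.
  # `runner` is injectable so the tally logic can be exercised without Mix.
  def tally(seed, runs, runner \\ &__MODULE__.run_mix_test/1) do
    Enum.reduce(1..runs, %{passed: 0, failed: 0}, fn _run, acc ->
      case runner.(seed) do
        0 -> Map.update!(acc, :passed, &(&1 + 1))
        _nonzero -> Map.update!(acc, :failed, &(&1 + 1))
      end
    end)
  end

  # Shells out to `mix test --seed <seed>` and returns the exit status.
  def run_mix_test(seed) do
    {_output, status} =
      System.cmd("mix", ["test", "--seed", Integer.to_string(seed)],
        stderr_to_stdout: true
      )

    status
  end
end
```

For example, `SeedRerunSketch.tally(12345, 20)` would run the suite 20 times with seed 12345 (an arbitrary value here) and return a map of pass and fail counts. A seed that yields a mix of both would confirm that ordering alone does not determine the outcome.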

The failures I've observed are:

  • can connect with a path/query: most frequent by far. Usually the failure just says ** (exit) shutdown, with no further information or stack trace, and no warnings/errors in the captured logs. Sometimes it is instead "the process is not alive or there's no process currently associated with the given name". This might be PSPDFKit-labs/bypass#120 ("Crash on shutdown in dispatch_awaiting_callers"), a long-standing issue in Bypass, which hasn't been especially well-maintained lately. (A case for looking into Lasso?)

  • logs a message on invalid status: fails somewhat frequently, with the assertion seeing captured log content of "". This test relies on a Process.sleep(100), so it may just be a timing issue where the sleep is not long enough for the log message to be emitted.

  • applies idle timeout — very rarely times out on the final assert_receive.

  • reconnects when it can't make a TCP connection

    • very rarely EXITs with :badarg, no further information.
    • very rarely times out on the final assert_receive.
    • rarely fails with ** (exit) no process, no further information (only seen on Actions so far). This might be the same issue as can connect with a path/query.
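If the fixed `Process.sleep(100)` in "logs a message on invalid status" is indeed the culprit, one common alternative is to poll for the expected condition with a deadline instead of sleeping for a fixed interval. The sketch below is an assumption about how that could look, not the project's code: `EventuallySketch.eventually/3` is a hypothetical helper, and the `Agent` merely stands in for whatever the real test observes.

```elixir
defmodule EventuallySketch do
  # Hypothetical helper, not part of ExUnit or of this project.
  # Calls `fun` repeatedly until it returns a truthy value (returns true)
  # or `timeout_ms` elapses (returns false).
  def eventually(fun, timeout_ms \\ 1_000, interval_ms \\ 10) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    poll(fun, deadline, interval_ms)
  end

  defp poll(fun, deadline, interval_ms) do
    cond do
      fun.() ->
        true

      System.monotonic_time(:millisecond) >= deadline ->
        false

      true ->
        Process.sleep(interval_ms)
        poll(fun, deadline, interval_ms)
    end
  end
end

# Stand-in for a log message that shows up after an unpredictable delay.
{:ok, agent} = Agent.start_link(fn -> "" end)

spawn(fn ->
  Process.sleep(50)
  Agent.update(agent, fn _ -> "invalid status" end)
end)

# Passes as soon as the content appears, rather than after a fixed sleep.
true =
  EventuallySketch.eventually(fn ->
    Agent.get(agent, & &1) =~ "invalid status"
  end)
```

The trade-off is that a passing run finishes as soon as the condition holds, while a genuinely failing run waits out the full timeout before the assertion fails.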

Additionally, I sometimes see the following error logs, regardless of whether all tests pass or some fail. I include them here in case they're relevant to addressing the test failures.

  • [label: {:erl_prim_loader, :file_error}, report: 'File operation error: emfile. Target: /.../erlang/24.3.4.17/lib/snmp-5.12.0.3/ebin/Elixir.Mint.TransportError.beam. Function: get_file. Process: code_server.']: can appear anywhere from a few times to hundreds of times in a run. In some cases the Target module is Elixir.Plug.Cowboy.Translator.beam instead. EMFILE is the OS-level error code for "too many open files", meaning the process has hit its limit on open file descriptors.

  • {removed_failing_handler,'Elixir.Logger'}: printed many times, partially interleaved with itself (???), followed by a DEBUG REPORT with the error remove_handler_failed and the reason attempting_syncronous_call_to_self. I have no idea what causes this one.
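For the emfile reports, a first diagnostic step might be to compare the descriptor limit the VM runs under with how many ports it currently has open. This is a rough sketch for a Unix-like system, with the hypothetical module name `FdLimitSketch`. Note the assumption that port count is only a proxy: not every open file descriptor is a port, so this would hint at leaked sockets but would not give an exact descriptor count.

```elixir
defmodule FdLimitSketch do
  # Hypothetical diagnostic helper, not part of this project.
  # The child shell inherits the VM's resource limits, so `ulimit -n`
  # reports the soft open-file limit that applies to the VM itself.
  def os_fd_limit do
    {out, 0} = System.cmd("sh", ["-c", "ulimit -n"])
    String.trim(out)
  end

  # Ports currently open in this VM versus the VM's configured port limit.
  def port_usage do
    %{
      open: :erlang.system_info(:port_count),
      limit: :erlang.system_info(:port_limit)
    }
  end
end

IO.inspect(FdLimitSketch.os_fd_limit(), label: "ulimit -n")
IO.inspect(FdLimitSketch.port_usage(), label: "ports")
```

Calling `FdLimitSketch.port_usage/0` at the start and end of a test run and seeing the `open` count climb steadily would suggest that connections are not being closed between tests.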
