Intermittent testbed failure in CI on Linux #2648

Closed
freakboy3742 opened this issue Jun 13, 2024 · 10 comments · Fixed by #2658
Labels
bug A crash or error in behavior. linux The issue relates Linux support.

Comments

@freakboy3742
Member

Describe the bug

We have an intermittent build failure on the GTK testbed test, with a segfault from the PyGObject layer.

Steps to reproduce

Run the Linux testbed test. The failure isn't especially reproducible; re-running the test suite almost always passes. This is the most recent example.

Expected behavior

Test suite should pass without error.

Screenshots

No response

Environment

  • Operating System: Ubuntu 22.04
  • Python version: 3.10
  • Software versions:
    • Toga: 0.4.5+

Logs

tests/widgets/test_selection.py::test_flex_horizontal_widget_size <- tests/widgets/properties.py PASSED [ 64%]
tests/widgets/test_selection.py::test_font <- tests/widgets/properties.py PASSED [ 64%]
tests/widgets/test_selection.py::test_font_attrs <- tests/widgets/properties.py PASSED [ 65%]
tests/widgets/test_selection.py::test_item_titles Fatal Python error: Aborted

Current thread 0x00007f2d05dfb640 (most recent call first):
  Garbage-collecting
  File "/usr/lib/python3.10/inspect.py", line 2969 in __init__
  File "/usr/lib/python3.10/inspect.py", line 2370 in _signature_from_function
  File "/usr/lib/python3.10/inspect.py", line 2463 in _signature_from_callable
  File "/usr/lib/python3.10/inspect.py", line 3002 in from_callable
  File "/usr/lib/python3.10/inspect.py", line 3254 in signature
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pytest_asyncio/plugin.py", line 240 in _add_kwargs
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pytest_asyncio/plugin.py", line 278 in _asyncgen_fixture_wrapper
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 907 in call_fixture_func
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 1128 in pytest_fixture_setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 1074 in execute
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 676 in _compute_fixture_value
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 590 in _get_active_fixturedef
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 568 in getfixturevalue
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/fixtures.py", line 549 in _fillfixtures
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/python.py", line 1792 in setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 492 in setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 155 in pytest_runtest_setup
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 260 in <lambda>
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 339 in from_call
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 259 in call_runtest_hook
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 220 in call_and_report
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 125 in runtestprotocol
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/runner.py", line 112 in pytest_runtest_protocol
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 349 in pytest_runtestloop
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 324 in _main
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 270 in wrap_session
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/_pytest/config/__init__.py", line 167 in main
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app/tests/testbed.py", line 29 in run_tests
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f2d17eb5480 (most recent call first):
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gi/overrides/Gio.py", line 42 in run
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gbulb/glib_events.py", line 839 in run
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gbulb/gtk.py", line 39 in run
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/gbulb/glib_events.py", line 886 in run_forever
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/toga_gtk/app.py", line 198 in main_loop
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app_packages/toga/app.py", line 632 in main_loop
  File "/home/runner/work/toga/toga/testbed/build/testbed/ubuntu/jammy/testbed-0.0.1/usr/lib/testbed/app/tests/testbed.py", line 163 in <module>
  File "/usr/lib/python3.10/runpy.py", line 86 in _run_code
  File "/usr/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: gi._gi, cairo._cairo, gi._gi_cairo, PIL._imaging, PIL._imagingft (total: 5)

Test suite didn't report a result.

Additional context

No response

@freakboy3742 freakboy3742 added bug A crash or error in behavior. linux The issue relates Linux support. labels Jun 13, 2024
@rmartin16
Member

rmartin16 commented Jun 14, 2024

I was finally able to get a stacktrace for this:

#10 0x00007ba90fcd93fd in WTFCrashWithInfo () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/build-soup3/WTF/Headers/wtf/Assertions.h:780
#11 WebKit::WebProcessPool::pageEndUsingWebsiteDataStore () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/WebProcessPool.cpp:1314
#12 0x00007ba9101fe050 in WebKit::WebProcessProxy::removeWebPage () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/WebProcessProxy.cpp:843
#13 0x00007ba910200555 in WebKit::WebPageProxy::close () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/WebPageProxy.cpp:1558
#14 0x00007ba9102dea7b in webkitWebViewBaseDispose () at /usr/src/webkit2gtk-2.44.2-0ubuntu0.24.04.1/Source/WebKit/UIProcess/API/gtk/WebKitWebViewBase.cpp:858
#15 0x00007ba94c2593fe in g_object_unref (_object=0x58ca47818e30) at ../../../gobject/gobject.c:4381

The first assert is apparently failing...

void WebProcessPool::pageEndUsingWebsiteDataStore(WebPageProxy& page, WebsiteDataStore& dataStore)
{
    RELEASE_ASSERT(RunLoop::isMain());
    auto sessionID = dataStore.sessionID();
    RELEASE_ASSERT(m_sessionToPageIDsMap.isValidKey(dataStore.sessionID()));
    auto iterator = m_sessionToPageIDsMap.find(sessionID);
    RELEASE_ASSERT(iterator != m_sessionToPageIDsMap.end());
    ...

https://github.com/WebKit/WebKit/blob/f736325e66bfa8e85f85387299448476f3e1fb3c/Source/WebKit/UIProcess/WebProcessPool.cpp#L1312-L1318

Sooo... it's not running on the main thread... but it only gets upset sometimes...

@rmartin16
Member

FWIW, I finally got a stack trace on Ubuntu 22.04 as well, to confirm they match:

#10 0x000070097faf421b in WTFCrashWithInfo(int, char const*, char const*, int) () at WTF/Headers/wtf/Assertions.h:780
#11 WebKit::WebProcessPool::pageEndUsingWebsiteDataStore(WebKit::WebPageProxy&, WebKit::WebsiteDataStore&) ()
    at ./Source/WebKit/UIProcess/WebProcessPool.cpp:1314
#12 0x000070098003d0ae in WebKit::WebProcessProxy::removeWebPage(WebKit::WebPageProxy&, WebKit::WebProcessProxy::EndsUsingDataStore) () at ./Source/WebKit/UIProcess/WebProcessProxy.cpp:843
#13 0x000070098003f68c in WebKit::WebPageProxy::close() () at ./Source/WebKit/UIProcess/WebPageProxy.cpp:1558
#14 0x0000700980120c95 in webkitWebViewBaseDispose() () at ./Source/WebKit/UIProcess/API/gtk/WebKitWebViewBase.cpp:858
#15 0x00007009cbcd9ed1 in g_object_unref (_object=<optimized out>) at ../../../gobject/gobject.c:3648

@freakboy3742
Member Author

FWIW: There's an existing hack in the testbed's WebKit tests to work around something very similar to this. The issue seems to be that WebKit starts its own threads, and the lifecycle of those threads interferes with pytest's process of creating and destroying widgets. It's likely only an issue because the testbed is rapidly creating and destroying widgets; I don't know if the existing workaround can be patched, or if we need to revisit this entirely.

@freakboy3742
Member Author

Also - of interest: you're seeing the crash in WebKit (which we've at least seen before); but the CI failure was occurring in test_selection.

@rmartin16
Member

Also - of interest: you're seeing the crash in Webkit (which we've at least seen before); but the CI failure was occurring in test_selection.

AFAICT, the precise point at which the crash occurs is indeterminate because it happens in the garbage collector.
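A minimal sketch of why the failing test name can be arbitrary, assuming CPython's standard cyclic garbage collector. `Widget`, `make_garbage`, `phase` and `events` are hypothetical names for illustration only; none of them come from Toga, PyGObject or the testbed. An object left behind in a reference cycle by one "test" is only finalized when allocations in a later, unrelated "test" happen to trigger a collection.

```python
import gc

events = []      # records which phase was active when the finalizer ran
phase = "setup"  # hypothetical marker for "the test currently running"


class Widget:
    """Hypothetical stand-in for a native widget wrapper (not a Toga class)."""

    def __init__(self):
        self.me = self  # reference cycle: only the cyclic GC can free this

    def __del__(self):
        events.append(phase)


def make_garbage():
    Widget()  # unreachable as soon as we return, but kept alive by its cycle


gc.collect()  # start from a clean slate

phase = "test_webview"
make_garbage()  # the "WebView test" leaves cyclic garbage behind

phase = "test_selection"
# Unrelated allocations push the allocation counter past the GC threshold,
# so the collector runs during this phase and finalizes the old widget.
unrelated = [[] for _ in range(100_000)]

print(events)
```

The finalizer is attributed to whichever phase happened to cross the collector's threshold, which is consistent with the crash being reported against tests that never touch a WebView.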

@rmartin16
Member

rmartin16 commented Jun 15, 2024

I've been playing around with this, though, and can more or less reliably invoke the crash.

The WebView is created in the thread for the testbed app... but if I call gc.collect() often from the main thread, it'll trigger the assertion failure, because the thread trying to dispose of the WebView is not the app thread. That's my theory, though... I've never thought much about how GC works in threads.
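The theory above can be checked in isolation with a short sketch, assuming CPython: a finalizer runs on whichever thread performs the collection, not on the thread that created the object. `NativeView` and `finalized_in` are hypothetical names; a real binding would release the native WebKit object where this sketch only records a thread ID.

```python
import gc
import threading

finalized_in = []  # thread IDs on which finalizers ran


class NativeView:
    """Hypothetical stand-in for a GTK WebView wrapper (not a real binding)."""

    def __init__(self):
        self.created_in = threading.get_ident()
        self.me = self  # reference cycle, so only the cyclic GC frees it

    def __del__(self):
        # A real binding would dispose of the native object at this point.
        finalized_in.append(threading.get_ident())


gc.disable()  # keep automatic collection out of the way for the demo
try:
    creator_id = threading.get_ident()
    NativeView()  # created on this thread, immediately unreachable

    # Run the collection on a different thread.
    worker = threading.Thread(target=gc.collect)
    worker.start()
    worker.join()
finally:
    gc.enable()

ran_on_creator = finalized_in[0] == creator_id
print("finalizer ran on the creating thread:", ran_on_creator)
```

If the native library asserts that disposal happens on a specific thread, as the `RELEASE_ASSERT(RunLoop::isMain())` check does, a collection triggered from any other thread would be enough to trip it.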

@rmartin16
Member

Added a few print statements and a pytest autouse fixture to call gc.collect():

> briefcase dev -r --test -- -s -qqq -rP tests/widgets/test_webview.py 

[testbed] Running test suite in dev environment...
===========================================================================
App thread id: 0x7d7617b3fb80
Waiting for app to be ready for testing... ready.
Running gc in thread ID: 0x7d76092006c0
Created WebView in thread ID: 0x7d7617b3fb80
Running gc in thread ID: 0x7d76092006c0
Fatal Python error: Aborted

Current thread 0x00007d76092006c0 (most recent call first):
  Garbage-collecting
  File "/home/user/github/beeware/toga/testbed/tests/conftest.py", line 24 in gc_collect
  ...

Thread 0x00007d7617b3fb80 (most recent call first):
  File "/home/user/.pyenv/versions/briefcase-3.12/lib/python3.12/site-packages/gbulb/glib_events.py", line 175 in __callback__
  File "/home/user/.pyenv/versions/briefcase-3.12/lib/python3.12/site-packages/gi/overrides/Gio.py", line 42 in run
  ...

@freakboy3742
Member Author

Another example of a crash; this time in splitcontainer. That indicates it's at least partially a Heisenbug; but also of interest is that the failure is before the WebView tests.

@rmartin16
Member

Yeah; that's because garbage collecting MapView can cause the error as well.

@rmartin16
Member

rmartin16 commented Jun 15, 2024

I had apparently overlooked this, but the testbed tests already contain one mitigation for this crash, introduced in 6c8877c:

if toga.platform.current_platform == "linux":
    # On Gtk, ensure that the WebView is garbage collection before the next test
    # case. This prevents a segfault at GC time likely coming from the test suite
    # running in a thread and Gtk WebViews sharing resources between instances.
    del widget
    gc.collect()

The same exists in the MapView tests.

This must be decreasing the likelihood of the issue occurring... but the WebKit2 WebView must sometimes still be escaping garbage collection until later, which allows the crash to still happen.
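One way an object could escape a `del` plus `gc.collect()` mitigation is sketched below, as an illustration only: the pattern frees the object only when the deleted name was the last reference. `FakeWidget` and `holder` are hypothetical; which object (if any) actually holds the extra reference in the testbed is not established in this thread.

```python
import gc
import weakref


class FakeWidget:
    """Hypothetical stand-in for a widget; not a Toga class."""


# Case 1: the local name is the only reference, so the mitigation works.
widget = FakeWidget()
alive = weakref.ref(widget)
del widget
gc.collect()
freed_when_unreferenced = alive() is None

# Case 2: something else (a fixture, a probe, a pending callback) still
# holds the object, so the same del + gc.collect() does not free it.
widget = FakeWidget()
alive = weakref.ref(widget)
holder = [widget]  # hypothetical lingering reference
del widget
gc.collect()
survived_mitigation = alive() is not None

# The object is only released later, whenever the last reference goes away.
holder.clear()
gc.collect()
freed_later = alive() is None

print(freed_when_unreferenced, survived_mitigation, freed_later)
```

In the second case the finalizer runs at the later point, on whatever thread drops the last reference or runs the collection, which would match a crash that still shows up occasionally.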

Another approach altogether here may be to instead keep the WebViews referenced somewhere, so garbage collection never tries to do anything with them...

[edit] I now see I was being directed to this earlier in the conversation :)
