Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Linux] Run glib event loop on dedicated glib Matter context #25960

Merged
merged 13 commits into from
May 9, 2023

Conversation

arkq
Copy link
Contributor

@arkq arkq commented Apr 4, 2023

Problem

Currently Linux uses GLib global main context when running main even loop. This causes issues with glib-based applications which use default main context in their global main event loop. Such an example is a Linux application based on Qt (Qt on Linux runs GLib global main context in order to link seamlessly with glib-based libraries). The problem is that on Linux platform chip::DeviceLayer::PlatformManagerImpl::_InitChipStack() waits until main event loop is ready to process events on given context (currently, global context). E.g.:

import chip.native
from PySide6.QtWidgets import QApplication

app = QApplication()   # this grabs glib global main context
chip.native.Init(0)    # DEADLOCK: (we need event loop to be running, but our `g_main_loop_run()` waits until context will be released)
app.exec()             # grabbed context will be processed here

See this for reference: #23320 (comment)

This PR is a follow-up for #25916

Changes

  • do not use glib global main context in our glib main event loop
  • rewrite all places where global context was used, to use glib Matter main context (functions like g_idle_add() or g_io_add_watch_full() can not be used any more)
  • add asserts to check whether g_main_context_get_thread_default() was set in all places where g_signal_connect() or GIO call are made

Testing

Tested Python bindings:

>>> import chip.native
>>> from PySide6.QtWidgets import QApplication
>>> app = QApplication()
>>> chip.native.Init(0)
[1680609380.171509][1551207:1551207] CHIP:CTL: Setting attestation nonce to random value
[1680609380.172885][1551207:1551207] CHIP:CTL: Setting CSR nonce to random value
[1680609380.173417][1551207:1551207] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_kvs
[1680609380.173493][1551207:1551207] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_factory.ini
[1680609380.173560][1551207:1551207] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_config.ini
[1680609380.173605][1551207:1551207] CHIP:DL: ChipLinuxStorage::Init: Using KVS config file: /tmp/chip_counters.ini
[1680609380.173770][1551207:1551207] CHIP:DL: writing settings to file (/tmp/chip_counters.ini-yTuqpO)
[1680609380.173865][1551207:1551207] CHIP:DL: renamed tmp file to file (/tmp/chip_counters.ini)
[1680609380.173879][1551207:1551207] CHIP:DL: NVS set: chip-counters/reboot-count = 3 (0x3)
[1680609380.174271][1551207:1551207] CHIP:DL: Got Ethernet interface: eno1
[1680609380.174567][1551207:1551207] CHIP:DL: Found the primary Ethernet interface:eno1
[1680609380.175111][1551207:1551207] CHIP:DL: Got WiFi interface: wlxf4f26d1a6d27
[1680609380.175629][1551207:1551207] CHIP:DL: Found the primary WiFi interface:wlxf4f26d1a6d27
>>> app.exec()

Tested BLE commissioning with

./chip-lighting-app --wifi
./out/linux-x64-chip-tool/chip-tool pairing ble-wifi 1 <SSID> <password> 20202021 3840

Postscriptum

For anyone reading this description because of some break caused by this PR:

In order to quickly verify that the problem is with glib global main context, one can add this snippet at the beginning of PlatformManagerImpl::_InitChipStack():

    g_thread_unref(g_thread_new(
        "gmain-global",
        [](void *) -> void * {
            g_main_loop_run(g_main_loop_new(nullptr, FALSE));
            return nullptr;
        },
        nullptr));

If this will help, it means that somewhere there is a source attached to the glib global main context. The next step is to find this place and use GLibMatterContextAttachSource() if applicable or run particular block of code with GLibMatterContextInvokeSync(). For finding GIO calls attached to glib global main context, one can use gdb -x <script> --args <cmd> with script:

start
# using global context in g_main_context_ref_thread_default
# https://github.com/GNOME/glib/blob/180713772f4e7bcdddf2c793f2f34a498184ed15/glib/gmain.c#L999
break ../../../glib/gmain.c:999
commands
backtrace
end
# run app
continue

@arkq
Copy link
Contributor Author

arkq commented Apr 4, 2023

@yzm157 can you verify on your side whether this PR fixes the issue with Qt?

@arkq arkq changed the title Glib matter context [Linux] Run glib event loop on dedicated glib Matter context Apr 4, 2023
@arkq arkq requested a review from msandstedt April 4, 2023 12:02
@github-actions
Copy link

github-actions bot commented Apr 4, 2023

PR #25960: Size comparison from d81d4e3 to 2f96585

Decreases (1 build for cc32xx)
platform target config section d81d4e3 2f96585 change % change
cc32xx lock CC3235SF_LAUNCHXL .debug_info 20302987 20302986 -1 -0.0
Full report (1 build for cc32xx)
platform target config section d81d4e3 2f96585 change % change
cc32xx lock CC3235SF_LAUNCHXL 0 0 0 0.0
(read only) 644409 644409 0 0.0
(read/write) 203848 203848 0 0.0
.ARM.attributes 44 44 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 197248 197248 0 0.0
.comment 194 194 0 0.0
.data 1480 1480 0 0.0
.debug_abbrev 933099 933099 0 0.0
.debug_aranges 87608 87608 0 0.0
.debug_frame 301328 301328 0 0.0
.debug_info 20302987 20302986 -1 -0.0
.debug_line 2679748 2679748 0 0.0
.debug_loc 2824679 2824679 0 0.0
.debug_ranges 286200 286200 0 0.0
.debug_str 3039406 3039406 0 0.0
.ramVecs 780 780 0 0.0
.resetVecs 64 64 0 0.0
.rodata 105953 105953 0 0.0
.shstrtab 232 232 0 0.0
.stab 204 204 0 0.0
.stabstr 441 441 0 0.0
.stack 2048 2048 0 0.0
.strtab 377422 377422 0 0.0
.symtab 256768 256768 0 0.0
.text 536336 536336 0 0.0

@github-actions
Copy link

github-actions bot commented Apr 4, 2023

PR #25960: Size comparison from d81d4e3 to 58ba991

Full report (1 build for cc32xx)
platform target config section d81d4e3 58ba991 change % change
cc32xx lock CC3235SF_LAUNCHXL 0 0 0 0.0
(read only) 644409 644409 0 0.0
(read/write) 203848 203848 0 0.0
.ARM.attributes 44 44 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 197248 197248 0 0.0
.comment 194 194 0 0.0
.data 1480 1480 0 0.0
.debug_abbrev 933099 933099 0 0.0
.debug_aranges 87608 87608 0 0.0
.debug_frame 301328 301328 0 0.0
.debug_info 20302987 20302987 0 0.0
.debug_line 2679748 2679748 0 0.0
.debug_loc 2824679 2824679 0 0.0
.debug_ranges 286200 286200 0 0.0
.debug_str 3039406 3039406 0 0.0
.ramVecs 780 780 0 0.0
.resetVecs 64 64 0 0.0
.rodata 105953 105953 0 0.0
.shstrtab 232 232 0 0.0
.stab 204 204 0 0.0
.stabstr 441 441 0 0.0
.stack 2048 2048 0 0.0
.strtab 377422 377422 0 0.0
.symtab 256768 256768 0 0.0
.text 536336 536336 0 0.0

@yzm157
Copy link
Contributor

yzm157 commented Apr 7, 2023

@yzm157 can you verify on your side whether this PR fixes the issue with Qt?

Looks like it works, works well.

arkq added 11 commits April 14, 2023 10:00
All D-Bus proxy objects created with GLib *_new_for_bus and
*_new_for_bus_sync functions have to be created with thread default main
context set. Otherwise, proxy object internal signals will be scheduled
on GLib global main context. We do not want that, because we do not know
whether Matter SDK will be used in GLib-based application or not, so we
can not run global main even loop on our own - all Matter-related GLib
signals shell be run on a dedicated Matter GLib context.
@arkq arkq force-pushed the glib-matter-context branch from 2f8e778 to ec685cd Compare April 14, 2023 08:01
@github-actions
Copy link

PR #25960: Size comparison from 62974a0 to ec685cd

Increases (1 build for cc32xx)
platform target config section 62974a0 ec685cd change % change
cc32xx lock CC3235SF_LAUNCHXL .debug_info 20326060 20326061 1 0.0
Full report (1 build for cc32xx)
platform target config section 62974a0 ec685cd change % change
cc32xx lock CC3235SF_LAUNCHXL 0 0 0 0.0
(read only) 643081 643081 0 0.0
(read/write) 203848 203848 0 0.0
.ARM.attributes 44 44 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 197248 197248 0 0.0
.comment 194 194 0 0.0
.data 1480 1480 0 0.0
.debug_abbrev 933224 933224 0 0.0
.debug_aranges 87760 87760 0 0.0
.debug_frame 302028 302028 0 0.0
.debug_info 20326060 20326061 1 0.0
.debug_line 2687403 2687403 0 0.0
.debug_loc 2838361 2838361 0 0.0
.debug_ranges 288040 288040 0 0.0
.debug_str 3042379 3042379 0 0.0
.ramVecs 780 780 0 0.0
.resetVecs 64 64 0 0.0
.rodata 104393 104393 0 0.0
.shstrtab 232 232 0 0.0
.stab 204 204 0 0.0
.stabstr 441 441 0 0.0
.stack 2048 2048 0 0.0
.strtab 377626 377626 0 0.0
.symtab 256832 256832 0 0.0
.text 536568 536568 0 0.0

@github-actions
Copy link

PR #25960: Size comparison from de0dfcf to a84d073

Decreases (1 build for cc32xx)
platform target config section de0dfcf a84d073 change % change
cc32xx lock CC3235SF_LAUNCHXL .debug_info 20330829 20330828 -1 -0.0
Full report (1 build for cc32xx)
platform target config section de0dfcf a84d073 change % change
cc32xx lock CC3235SF_LAUNCHXL 0 0 0 0.0
(read only) 643249 643249 0 0.0
(read/write) 203848 203848 0 0.0
.ARM.attributes 44 44 0 0.0
.ARM.exidx 8 8 0 0.0
.bss 197248 197248 0 0.0
.comment 194 194 0 0.0
.data 1480 1480 0 0.0
.debug_abbrev 933224 933224 0 0.0
.debug_aranges 87792 87792 0 0.0
.debug_frame 302140 302140 0 0.0
.debug_info 20330829 20330828 -1 -0.0
.debug_line 2687904 2687904 0 0.0
.debug_loc 2838960 2838960 0 0.0
.debug_ranges 288072 288072 0 0.0
.debug_str 3042335 3042335 0 0.0
.ramVecs 780 780 0 0.0
.resetVecs 64 64 0 0.0
.rodata 104401 104401 0 0.0
.shstrtab 232 232 0 0.0
.stab 204 204 0 0.0
.stabstr 441 441 0 0.0
.stack 2048 2048 0 0.0
.strtab 377963 377963 0 0.0
.symtab 256976 256976 0 0.0
.text 536728 536728 0 0.0

@andy31415 andy31415 merged commit 0ada46a into project-chip:master May 9, 2023
@arkq arkq deleted the glib-matter-context branch May 9, 2023 19:03
tianfeng-yang referenced this pull request May 10, 2023
…is started (#26338)

* chip-repl hits the Code is unsafe/racy assert when BLE commissioning is started

* Add lock/unlock to CancelTimer

* Fix a few other issue

* Restyle

* Small fix

* Last few fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants