-
Notifications
You must be signed in to change notification settings - Fork 0
Segfault in macOS message_filters tests #189
Comments
@nburek can you please investigate this problem and work on a fix. Thanks. |
Yes, I'll start investigating. |
Update: I poked quite a bit at the source for log4cxx and its underlying library, APR. The basic problem has to do with the C++ static object destruction order across translation units. Essentially the "APRInitializer" destructor here can get called way before the rest of the shutdown of the system. If that happens, then anything allocated out of the APR "pools" (which is basically everything in log4cxx) is now invalid, and future accesses to them may result in a segmentation fault. The problem is reported on the upstream bug tracker here and here. There does seem to be some changes in newer log4cxx that tries to avoid this. However, there are no current releases with that fix available in them (so it is not available in homebrew or in the Linux packages). With all of that said, the issues above have been open for a decade, so it seems like we should probably start considering options for a different logging backend. |
Great information, @clalancette . Given that this is breaking a number of things it seems like the safest action is to roll back making rcl_logging_log4cxx the default 3rd party logger for the dashing release. Reading all the information in the bug tracker, it looks like the recent changes to avoid the issue are based on the patch that Ubuntu made when they switched to log4cxx as their logger. They note that this creates a memory leak on shutdown, but that they were okay with that because running on a modern operating system allows the memory to be reclaimed when the process dies. I don't think that this is a fully acceptable solution for the default logging backend in ROS2. We could explore creating a log4cxx vendor package to pickup a lot of the bug fixes from the past couple of years since the last release and also implement the dynamic loading feature for logging backends. That would allow ROS 2 to ship with rcl_logging_noop and users who are okay with the pitfalls could use rcl_logging_log4cxx. However I think this work will take us past the Dashing release. |
FWIW, I played around with using another logging library today, spdlog. I have a backend implementation in the While that spdlog implementation may be an option going forward, I have to agree with @nburek that the lowest-risk option is to go back to noop and revisit this after Dashing. |
This is now "fixed" by ros2/rcl#436 (message filter tests stopped segfaulting in https://ci.ros2.org/view/nightly/job/nightly_osx_debug/1251), so I'll close this one out. |
Some tests are failing with a segfault on macOS, e.g. https://ci.ros2.org/view/nightly/job/nightly_osx_debug/1243/testReport/junit/(root)/projectroot/message_filters_test_simple/
I did some debugging on this on mini2. The crash always happens when the program is quitting. Further, the crash always happens while trying to cleanup log4cxx via
rcl_logging_log4cxx.cpp::rcl_logging_external_shutdown
.From poking around in the debugger and at log4cxx directly, the eventual result is that during cleanup log4cxx fails to acquire a mutex with an EINVAL. The only reason that this can really happen is if the mutex is uninitialized, or was destroyed. By adding some additional debugging, I could see that there are some object lifetime issues in log4cxx which cause the mutex in question to be destroyed before this final cleanup. Going deeper, there is a "pool" of memory that the mutex is allocated from at the apr level (Apache Portable Runtime, which log4cxx uses). There should be several pools and subpools, but for some reason when one of the subpools gets destroyed, it actually goes and frees the memory for all of the pools, which renders the mutex invalid. This feels like a bug in log4cxx or apr (1.6.5 on macOS), but it may be possible to work around it by some judicious pointer wrangling. More debugging needed.
The text was updated successfully, but these errors were encountered: