Segmentation fault on Exit (Qt) #1095

Closed
hannes101 opened this issue Jun 10, 2020 · 39 comments · Fixed by #1768
Labels
Bug, HELP-WANTED (used by 24pullrequests.com to suggest issues), Medium, Python-Specific (only for certain versions of Python), Qt (Qt bugs, code or features), Reproduced

Comments

@hannes101

We got a downstream bug report that backintime fails with a segmentation fault whenever it is closed on the current Fedora Rawhide.
Please see the downstream bug report at
https://bugzilla.redhat.com/show_bug.cgi?id=1844781

@bentolor

bentolor commented Sep 4, 2022

Same error here after Upgrade from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS.

@buhtz
Member

buhtz commented Sep 4, 2022

Thanks for reporting. But please add some more context information:

  • Who is "we"?
  • Fedora is known, but what is "rawhide"? Please use concrete version numbers instead of release names.
  • You point to a Red Hat report, but isn't this about Fedora? We cannot keep all distro relationships in mind. Tell us if Fedora is something from Red Hat. ;)

@buhtz
Member

buhtz commented Sep 4, 2022

> Same error here after Upgrade from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS.

Can you please report the backintime versions used in those two Ubuntu releases? Thanks.

@bentolor

bentolor commented Sep 4, 2022

> > Same error here after Upgrade from Ubuntu 20.04 LTS to Ubuntu 22.04 LTS.
>
> Can you please report the backintime versions used in those two Ubuntu releases? Thanks.

Hi @Codeberg-AsGithubAlternative-buhtz, in both installations I used backintime 1.3.2 from the PPA. Besides the distribution upgrade, I also switched from X11 to Wayland, in case that is related.

@buhtz
Member

buhtz commented Sep 4, 2022

Dear Benjamin,
thanks a lot for that very valuable info.

@emtiu
Member

emtiu commented Sep 6, 2022

I've also been seeing segfaults upon exiting backintime-qt on Kubuntu as well as Manjaro (both using Python 3.10). They seem to have no impact on functionality, and I'm not yet sure if they're 100% reproducible.

Any good ideas on debugging?

@emtiu
Member

emtiu commented Sep 6, 2022

There is further debugging information in #1227, and in the downstream bug (mentioned above): https://bugzilla.redhat.com/show_bug.cgi?id=1844781

@emtiu emtiu changed the title from "Segmentation fault with python 3.9" to "Segmentation fault on Exit" Sep 6, 2022
@emtiu
Member

emtiu commented Sep 7, 2022

Another stacktrace is reported in #1271, which I'm closing as a duplicate.

@aryoda
Contributor

aryoda commented Sep 8, 2022

@emtiu Good work (sharp eyes ;-)). I didn't realize that there were duplicate issues. I have briefly checked the stack traces and think these issues really are duplicates, since the segfault occurs in the same location:

#1095: #0 0x00007f20a6756be0 in PyCFunction_Type () from /lib64/libpython3.9.so.1.0
#1227: #0 0x00007f54228284e0 PyCFunction_Type (libpython3.10.so.1.0 + 0x3ad4e0)
#1271: #0 0x0000557b65db38a0 in PyCFunction_Type ()
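
The comparison of the three top frames can also be done mechanically. The following is only a small sketch: the regular expression and the helper name `top_frame_symbol` are made up for this illustration, and a matching symbol name is a hint that the reports are duplicates, not proof (the addresses differ between builds).

```python
import re
from typing import Optional

# Frame #0 appears in two styles in the reports: with "in" before the
# symbol name and without it. The optional group covers both.
FRAME0 = re.compile(r"#0\s+0x[0-9a-fA-F]+\s+(?:in\s+)?(\w+)")


def top_frame_symbol(trace_line: str) -> Optional[str]:
    """Return the symbol name of frame #0, or None if the line does not match."""
    match = FRAME0.search(trace_line)
    return match.group(1) if match else None


# The three top frames quoted above, keyed by issue number.
traces = {
    "#1095": "#0 0x00007f20a6756be0 in PyCFunction_Type () from /lib64/libpython3.9.so.1.0",
    "#1227": "#0 0x00007f54228284e0 PyCFunction_Type (libpython3.10.so.1.0 + 0x3ad4e0)",
    "#1271": "#0 0x0000557b65db38a0 in PyCFunction_Type ()",
}

symbols = {issue: top_frame_symbol(line) for issue, line in traces.items()}
same_location = len(set(symbols.values())) == 1
print(symbols, same_location)
```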

I have searched all issues with the queries "in:body in:comments PyCFunction_Type" and "in:body in:comments in:title is:open segfault" and could not find more issues, so there are no more duplicates at the moment.

> Any good ideas on debugging?

Debugging segfaults is difficult: it requires e.g. gdb plus installed debug symbols (e.g. via the *-dbg packages in Ubuntu) for the library that causes the segfault according to the stack trace.

Also, the developer doing the debugging needs to understand the segfault-causing library a little bit,
so I think it would be best to

a) try to make the crash reproducible with a minimal reproducible (code) example (MRE)
b) open an issue at the library developer site that includes the MRE
c) let the developers debug the code and/or git bisect their changes to find the commit that introduced the problem
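
As a lighter-weight complement to gdb (not something that was tried in this thread), Python's standard `faulthandler` module can print the Python-level traceback when the process receives SIGSEGV. The sketch below uses a stand-in child process that sends SIGSEGV to itself; the helper name `run_with_faulthandler` is made up, and whether faulthandler captures the exit-time crash discussed here is an open assumption.

```python
import subprocess
import sys


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter started with -X faulthandler."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


# Stand-in for a crashing program: the child sends SIGSEGV to itself.
CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

result = run_with_faulthandler(CRASH)
# On POSIX a negative return code means the child was killed by a signal.
print("return code:", result.returncode)
# faulthandler writes its report, including the Python traceback, to stderr.
print(result.stderr)
```

For a real application the same effect can be had by starting it with `python3 -X faulthandler` or by setting the environment variable `PYTHONFAULTHANDLER=1`.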

@emtiu
Member

emtiu commented Sep 8, 2022

> a) try to make the crash reproducible with a minimal reproducible (code) example (MRE)

Hmm, that's tricky. From what I've seen, it's not 100% reproducible, but it happens almost every time that backintime exits.

My working hypothesis is: backintime segfaults almost always on GUI quit, but most users don't notice, because: a) there's no loss of functionality, and b) the graphical desktop environment doesn't show the error when it happens.

@aryoda
Contributor

aryoda commented Sep 8, 2022

If I find the time I could write some basic GUI-related unit tests (I am just checking for the best tools to do this), and hopefully one of the unit tests provokes this segfault...

@emtiu
Member

emtiu commented Sep 8, 2022

Digging around the discussions in https://bugzilla.redhat.com/show_bug.cgi?id=1844781 and https://forum.manjaro.org/t/python-crash-when-exiting-back-in-time/102856/11, I see three hints on a possible root of the problem:

  1. https://www.riverbankcomputing.com/static/Docs/PyQt5/gotchas.html#crashes-on-exit – quote:

"When the Python interpreter leaves a scope (for example when it returns from a function) it will potentially garbage collect all objects local to that scope. The order in which it is done is, in effect, random. Theoretically this can cause problems because it may mean that the C++ destructors of any wrapped Qt instances are called in an order that Qt isn’t expecting and may result in a crash. However, in practice, this is only likely to be a problem when the application is terminating.

As a way of mitigating this possiblity PyQt5 ensures that the C++ destructors of any QObject instances owned by Python are invoked before the destructor of any QCoreApplication instance is invoked. Note however that the order in which the QObject destructors are invoked is still random."

  2. https://bugzilla.redhat.com/show_bug.cgi?id=1844781#c14 (referring to the above documentation) – quote:

Based on that documentation, the order in which the Qt objects QObjects are destroyed on exit is random to some extent. If the memory for a Qt object was freed before it was used by functions in python-pyqt5-sip like sip_api_get_address in my trace from Fedora 36 from comment 12 or those in the valgrind invalid reads and writes in my original report, then this type of crash might be the result. The backintime maintainers could look at whether the order in which its QObjects are destroyed when closing could be changed somehow to avoid this problem.

  3. https://www.riverbankcomputing.com/pipermail/pyqt/2022-January/044458.html – a mailing list thread describing a similar error, also documented in this Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=998897

I don't know enough about Qt or C++ to make sense of any of it, but maybe it helps someone else looking at this.

@aryoda
Contributor

aryoda commented Sep 8, 2022

Good summary, and yes, Matt Fagnani did a damn good job of narrowing down the problem. If it really is caused by a use-after-free, the challenge is that at the moment the error happens (segfault), the stack trace does not directly point to the code line that freed the object earlier (because freeing memory happens non-deterministically and most probably async - edit: not true for C).

This is why I suggest git bisect while keeping all other dependencies unchanged to find the commit that introduced the problem.

I am out, I know Qt, C, C++, Valgrind and gdb, but this bug is too time consuming for me ATM, I hope it will be diagnosed and fixed upstream.

@emtiu
Member

emtiu commented Sep 9, 2022

Adding the label "Python-Specific" (only for certain versions of Python), because only Python 3.9 and 3.10 seem affected right now.

@emtiu emtiu added the Python-Specific and Qt labels Sep 9, 2022
@emtiu
Member

emtiu commented Sep 10, 2022

Here's another piece to the puzzle – but be careful, there's a good chance it will melt your brain: 1d63ced#commitcomment-21596448

@aryoda
Contributor

aryoda commented Oct 7, 2022

In my Manjaro VM I also get a crash with memory dump now-and-then when I exit BiT-qt.

Again in PyCFunction_Type with Python 3.10.7:

#0 0x00007f6ea978f4e0 PyCFunction_Type (libpython3.10.so.1.0 + 0x3a94e0)

Module libstdc++.so.6 with build-id 735a3d0cc7699fd69337361cba4aedb644b2a7ed
Module libQt5Core.so.5 with build-id 10f403c84cef570e5302c7b1f0d8db34f8293f71
Module libQt5Gui.so.5 with build-id c3306c1f059a4d83b3ece5ca28ce2a147fc154e3
Module QtGui.abi3.so with build-id 535cb23d62380e2c5ba799aa6597a0514c478b9d
Module _sha512.cpython-310-x86_64-linux-gnu.so with build-id fc29dd2925467c7320756880fd6097a434a80383
Module _random.cpython-310-x86_64-linux-gnu.so with build-id 3ba0c8aa724ca018e1e92d2450db4140960f5fad
Module _bisect.cpython-310-x86_64-linux-gnu.so with build-id 22aab3ad2d659a48dc9cf0644ea649dd6af5d02b
Module liblzma.so.5 with build-id d08f5868cd5adcc6b7c53bf1725aac65bd4539cd
Module _lzma.cpython-310-x86_64-linux-gnu.so with build-id e09d86931d10be53314416ae42b4acc83bad99db
Module libbz2.so.1.0 with build-id 919597c477c9b2cb9cdbb7745ed6494ac0e6da60
Module _bz2.cpython-310-x86_64-linux-gnu.so with build-id e0c85fab13c0ac6e0116ade839fbc014a49de646
Module libz.so.1 with build-id fefe3219a96d682ec98fcfb78866b8594298b5a2
Module zlib.cpython-310-x86_64-linux-gnu.so with build-id f0004944c412854a4f2789a472b725c604bb4a41
Module select.cpython-310-x86_64-linux-gnu.so with build-id 0a22fb2dce45bf7b54b47cb578c6514b5f5e99ad
Module _posixsubprocess.cpython-310-x86_64-linux-gnu.so with build-id a0d950372300ef708a942ca0ede06c4dba112919
Module fcntl.cpython-310-x86_64-linux-gnu.so with build-id 0abd0b41ae49f4c1f42f8eb46f2177f3793907cb
Module _datetime.cpython-310-x86_64-linux-gnu.so with build-id a3c402a3da9060f2ab4a3b8687075bf8369ace05
Module math.cpython-310-x86_64-linux-gnu.so with build-id 15ac91d5b595d5e8e572133fa1f393fb0cad45cd
Module ld-linux-x86-64.so.2 with build-id 075a6ad9f1c3f9cbb5f3301186bbe68c6a477808
Module libm.so.6 with build-id 2c8ff1d29b255da5b7371efd5caf57444d622838
Module libc.so.6 with build-id 90b9e4f641f8752292698389f241cbf0ff49d687
Module libpython3.10.so.1.0 with build-id ae3d4703207944dec3b048362873b7794eedf306
Module python3.10 with build-id c0d0b166e6a6b76fc6156beacc8c71e7ad9ba2bc
Stack trace of thread 1277:
#0  0x00007f6ea978f4e0 PyCFunction_Type (libpython3.10.so.1.0 + 0x3a94e0)
ELF object binary architecture: AMD x86-64

@emtiu emtiu pinned this issue Oct 8, 2022
@aryoda
Contributor

aryoda commented Oct 10, 2022

If sip is really the (only) reason for the segfaults (which I am not sure about in the case of the PyCFunction_Type calls), there might be a fix on its way (implemented 8 months ago):

https://www.riverbankcomputing.com/hg/sip/rev/072b8949de41

It was fixed in all sip versions (or at least in 6.5maint), but I am not sure if the fix is already contained in PyQt5-sip,
so it would be great if someone who has this crash could manually pip-install the newest sip version to test it...

https://pypi.org/project/PyQt5-sip/
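
Before and after such a test it helps to record which versions are actually installed. A minimal sketch using only the standard library follows; the helper name `installed_version` is made up, the distribution names other than PyQt5-sip are guesses at what might be relevant, and distro-packaged copies may not expose this metadata. Which sip release first contains the fix is not stated in this thread.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the packages that might matter for this crash.
for name in ("PyQt5-sip", "PyQt5", "sip"):
    print(name, installed_version(name))
```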

@aryoda
Contributor

aryoda commented Jan 9, 2024

> I vote to add a simple FAQ entry about that problem

Yes, and I would like to keep this issue open until I have enough spare time for a multi-day trace & debug session, since I can reproduce the problem on several distros (almost deterministically)...

@DerekVeit
Contributor

I wrote a script to run and close backintime-qt 10 times, logging to a file. I consistently get the segfault message about 5 out of 10 times. This is with the distro installation of 1.2.1 on Linux Mint 21.2.

If I add this line anywhere in MainWindow.closeEvent, the segfault message never happens at all:

self.qapp.removeEventFilter(self.mouseButtonEventFilter)

@buhtz
Member

buhtz commented Jun 22, 2024

Hello Derek,
that is a fascinating "solution". How did you come across this lead? What is the origin of this idea?

I remember the mouseButtonEventFilter. I did not understand what it is doing.
It was never clear to me why the BIT main window uses the class ExtraMouseButtonEventFilter instead of just overriding mousePressEvent() in the main window. It seems that ExtraMouseButtonEventFilter is used because mousePressEvent() won't work there: the main window's child widgets steal the mouse events.

Can you provide that script? I would like to increase the repetitions.

Based on earlier experiments with the segfault problem, I assume that the "connection" between that problem and the mouse event filter is only a coincidence. But we should give it a try, of course.

Best,
Christian

EDIT: The event filter instance is a member of the MainWindow, but it is installed on the QApplication object. When the window closes, Python might garbage collect its members, including the filter. The QApplication is destroyed later. At that point the filter is still installed, but its object has already been garbage collected by the Python interpreter. The QApplication object might touch the event filter again during its own teardown and segfault because the filter no longer exists.

EDIT2: I support Derek's proposal. Would you like to provide a PR for this? In the long run we will encapsulate the widgets of the main window into their own classes. Then we might get rid of the need for a global event filter and can simply use mousePressEvent() on our custom fileview widget.
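
The ownership problem described in the first EDIT can be modelled without Qt. The sketch below is a toy model, not backintime code: `App`, `Window` and `EventFilter` are hypothetical stand-ins, the snake_case methods only mimic Qt's installEventFilter/removeEventFilter, and a dead weak reference stands in for the dangling C++ pointer that the real QApplication would hold.

```python
import gc
import weakref


class EventFilter:
    """Stand-in for ExtraMouseButtonEventFilter."""


class App:
    """Stand-in for QApplication: refers to filters without owning them."""

    def __init__(self):
        self._filters = []

    def install_event_filter(self, event_filter):
        self._filters.append(weakref.ref(event_filter))

    def remove_event_filter(self, event_filter):
        self._filters = [ref for ref in self._filters if ref() is not event_filter]

    def dangling_filters(self):
        """Count installed filters whose Python object is already gone."""
        return sum(1 for ref in self._filters if ref() is None)


class Window:
    """Stand-in for MainWindow: owns the filter, installs it on the app."""

    def __init__(self, app):
        self.app = app
        self.event_filter = EventFilter()
        app.install_event_filter(self.event_filter)

    def close(self, remove_filter):
        if remove_filter:
            self.app.remove_event_filter(self.event_filter)


def dangling_after_close(remove_filter):
    app = App()
    window = Window(app)
    window.close(remove_filter)
    del window    # the window and its members are released first
    gc.collect()
    return app.dangling_filters()  # the app is still alive at this point


print(dangling_after_close(remove_filter=False))  # prints 1
print(dangling_after_close(remove_filter=True))   # prints 0
```

Without the removal the app is left with one reference to an object that no longer exists; with the removal nothing is left behind, which is the effect the proposed one-line fix aims for.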

@DerekVeit
Contributor

I was looking for something that might be destroyed out of order, and seeing that filter added without a corresponding removal looked like a possibility.

Here is the test script I'm using: run_and_quit_backintime.sh

You can call it with a numeric argument for how many times to run and close. It uses xdotool to close.

It's using ctrl+w for the version 1.2.1 that I'm running, so just change that to ctrl+q for a newer version.

I would be happy to make a PR for it.
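
For readers who prefer Python over shell, the counting part of such a test can be sketched as follows. This is not the attached script: the function name `count_segfaults` is made up, the commands are stand-ins so the sketch runs without a GUI, and closing the window (done with xdotool in the script) is not covered here.

```python
import signal
import subprocess
import sys


def count_segfaults(command, runs):
    """Run `command` `runs` times; count the runs that were killed by SIGSEGV."""
    segfaults = 0
    for _ in range(runs):
        completed = subprocess.run(command, capture_output=True)
        # On POSIX, a negative return code -N means "terminated by signal N".
        if completed.returncode == -signal.SIGSEGV:
            segfaults += 1
    return segfaults


# Stand-in commands: one always dies from SIGSEGV, one exits cleanly.
crashing = [sys.executable, "-c",
            "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
clean = [sys.executable, "-c", "pass"]

print(count_segfaults(crashing, runs=3))  # prints 3
print(count_segfaults(clean, runs=3))     # prints 0
```

If the program is started through a shell wrapper that does not exec the interpreter, the crash may instead show up as exit status 139 (128 + 11), so the check would need to be adapted.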

@emtiu
Member

emtiu commented Jun 22, 2024

Wonderful! This problem has been bugging us for so long. Thanks for the good work :)

@buhtz
Member

buhtz commented Jun 22, 2024

> I would be happy to make a PR for it.

Great. Let me know if you need assistance. Don't hesitate to ask.

@DerekVeit
Contributor

Thanks. I tried to conform my commit message and such, but I would be glad to get any pointers on my first PR here.

DerekVeit pushed a commit to DerekVeit/backintime that referenced this issue Jun 23, 2024
DerekVeit pushed a commit to DerekVeit/backintime that referenced this issue Jun 26, 2024
@buhtz buhtz closed this as completed in 071fb4b Jun 28, 2024
@aryoda
Contributor

aryoda commented Jun 28, 2024

@DerekVeit Excellent work (and approach), thanks a lot for helping us!

What I have learned as a "take-away" from this issue is that (hard-core) debugging is not always the most efficient way to tackle non-deterministic segfaults; white-box code analysis, and perhaps even disabling some code (ideally via bisecting), is another excellent way to find the culprit 😄

@DerekVeit
Contributor

@aryoda Thanks! That's what I was thinking after reading your posts and Benjamin's. I had just started along that strategy of methodically removing things. But since the widget structure should probably all get torn down in the normal way, I was starting by looking for anything special outside of that, and self.setMouseButtonNavigation() was one of the first such things I saw.

I've been using Back In Time for some years, so I'm grateful and glad to help too.

@buhtz
Member

buhtz commented Aug 5, 2024

Hello Derek,
would you like to work further on that problem? 😆

It seems there are still segmentation faults. This was reported as a side problem in #1828. I can confirm and reproduce it with the latest dev version.

Steps to reproduce:

  1. Have no config file or delete it.
  2. Start BIT without any configuration.
  3. Click on "No" when being asked if you want to restore an old config.
  4. The Manage profiles (aka Settings) window comes up.
  5. Close it via "Cancel".
  6. Seg fault should appear (in ~50% of all cases).

@buhtz buhtz reopened this Aug 5, 2024
@DerekVeit
Contributor

I was able to reproduce this too. I've made a modified version of the testing script for it.

run_and_quit_backintime_no-config.sh

And I see the reason. I added the fix in MainWindow.closeEvent, but that is only called if the application has started and then has a close event. In the unconfigured case, MainWindow.__init__ returns early on not config.isConfigured() but after the event filter has been added, and then the same condition is used at the bottom of the module to just skip executing the application. In that case it never closes, because it never really starts.

Moving the removeEventFilter call from closeEvent down to the bottom after the qapp.exec_() (replacing self with mainWindow) ensures that it happens in the unconfigured case as well as with a normal close. Running it 100+ times with both the normal close and the early exit, it doesn't segfault now.

If this sounds good, I can make another PR for this improved version of the fix.
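
The difference between the two placements of the cleanup can be shown with a small control-flow model. Everything in it is a hypothetical stand-in (`App`, `MainWindow`, `filters_left_at_exit`); it only mirrors the structure described above: the filter is installed before the early return, and the event loop is skipped when nothing is configured.

```python
class App:
    """Stand-in for QApplication."""

    def __init__(self):
        self.filters = []

    def exec_(self, window):
        # Model of the event loop: it ends when the window gets a close event.
        window.close_event()


class MainWindow:
    """Stand-in for the real MainWindow."""

    def __init__(self, app, configured, remove_in_close_event):
        self.app = app
        self.remove_in_close_event = remove_in_close_event
        self.event_filter = object()
        app.filters.append(self.event_filter)  # installed before the early return
        if not configured:
            return  # early return: the window is never shown
        # ... the remaining setup would go here ...

    def close_event(self):
        if self.remove_in_close_event:
            self.app.filters.remove(self.event_filter)


def filters_left_at_exit(configured, fix):
    """`fix` is "close_event" (first fix) or "after_exec" (improved fix)."""
    app = App()
    window = MainWindow(app, configured,
                        remove_in_close_event=(fix == "close_event"))
    if configured:
        app.exec_(window)  # skipped entirely in the unconfigured case
    if fix == "after_exec":
        app.filters.remove(window.event_filter)
    return len(app.filters)


for fix in ("close_event", "after_exec"):
    for configured in (True, False):
        print(fix, configured, filters_left_at_exit(configured, fix))
```

Only the combination of the first fix with the unconfigured start leaves a filter installed at exit; removing it after the event loop covers both paths.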

@buhtz
Member

buhtz commented Aug 6, 2024

Sounds like a good solution to me. "Make it so." 🚀 Thank you very much for your efforts to help with this.


6 participants