ftl crash #949
I've had the same crash twice today; the FTL log gist is here. Edit: Also, the Pi's networking was fine throughout this. I'd be surprised if it's related, but I have also been having ISP issues today, so my modem's been up and down and the switch the Pi is on was restarted a few times earlier today (but neither crash lined up with a switch restart). |
This part:
looks suspicious. However, the following part worked:
So I suspect your router simply does not reply to internal pings and this does not really suggest a network issue. Unfortunately, your bug trace does not contain the details I need to determine the cause of the issue. Can you somehow reproduce/force this crash to happen? I just tried restarting my Pi-hole with no network cable attached but I don't see a crash myself. @BlueSpaceCanary This may still have to do with intermittent Internet failures. However, I haven't seen something like this even when coding for Pi-hole while on the train (which is, at least in Germany, the definition of unreliable Internet).
doesn't tell us much. Did both crashes only show these lines as backtrace? |
Yep, unfortunately that's all there was for both crashes |
@DL6ER just had a third. Given that they seem like they're going to be pretty regular, it shouldn't take too long to catch one, so I'm wondering whether there's an easy way for me to have FTL run in gdb or lldb so next time I can maybe grab a bit more useful information for you? |
There is a way |
@DL6ER caught a crash for you. Just the crash itself + backtrace:
|
I'll leave the gdb session open, so let me know if there's any vars whose values I should grab for you :) I'm not sure, but I think this one happened when I toggled a group state (i.e. re-enabling a few regex domain blocks via a group). I definitely hadn't noticed any DNS issues before that, and immediately after doing it things were broken. |
Guess I might need to grab a debug build for that 😅
EDIT: Oh, just saw a8266a7 which looks like it might be relevant given the |
Real-time Signals are expected and not a crash. They are used to tell FTL to reload domains from the database and similar things. Just type See no. 4 of https://docs.pi-hole.net/ftldns/debugging/ To avoid the |
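For readers unfamiliar with the mechanism: a daemon can treat a real-time signal as a "please reload" request rather than as an error. The following is only a minimal sketch of that general pattern, not FTL's actual code. All names (`reload_requested`, `install_reload_handler`, `consume_reload_request`, `reload_self_test`) are hypothetical, and the use of `SIGRTMIN` is an assumption (on glibc-based Linux it is typically signal 34, which would match the SIG34 seen in gdb).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flag set by the handler and polled by the main loop (hypothetical name). */
static volatile sig_atomic_t reload_requested = 0;

/* The handler only sets a flag; the real work happens outside signal context. */
static void on_reload_signal(int signum)
{
    (void)signum;
    reload_requested = 1;
}

/* Install the handler for the first real-time signal. Returns 0 on success. */
int install_reload_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_reload_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGRTMIN, &sa, NULL);
}

/* Returns 1 if a reload was requested since the last call, then clears the flag. */
int consume_reload_request(void)
{
    if (reload_requested) {
        reload_requested = 0;
        return 1;
    }
    return 0;
}

/* Small self-check showing the intended flow: install, signal, poll. */
int reload_self_test(void)
{
    if (install_reload_handler() != 0)
        return -1;
    raise(SIGRTMIN);                      /* handler runs before raise() returns */
    int first = consume_reload_request(); /* request is seen once */
    int second = consume_reload_request();/* and not a second time */
    assert(first == 1 && second == 0);
    return 0;
}
```

Because the signal is handled by the process itself, a debugger stopping on it only means the signal was delivered, not that anything crashed.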
@DL6ER right after the SIG34 in the pihole-FTL thread, the database thread received a SIGABRT :) I did a debug build and am running that now, but I did run into a couple errors. The first was just needing to grab one more library that wasn't in https://docs.pi-hole.net/ftldns/compile/#debian-ubuntu-raspbian (
so I went ahead and stuck If I catch another crash I'll drop info in here |
Caught another, also immediately after toggling group state:
Since it seems to be somewhat reproducible on my machine and I'm seeing segfaults and sigaborts in different spots, I'll try running FTL in valgrind this evening and poking at it |
Which version of the compiler are you using? There was a bug in the optimization strategies of some older Thanks for the |
I won't have time til this evening, but I'll see if I can tickle another crash out of it. After just starting it up, the only thing I see outside sqlite or libarmmem is a bunch of EDIT: Hm, I made a couple hacks to get the admin console to believe that FTL is running, but even so, when running under valgrind I don't seem to be able to change the configured state of FTL. Toggling groups or adlists, using Disable, etc has no effect on the server when running in valgrind, but the underlying config does change and killing & restarting FTL causes it to pick up the new state. I'll see if I can figure out why, since I need to toggle those in FTL to try to produce a crash |
This shouldn't be something for us to worry about.
Oh yes, that's expected.
|
Ok, just repeatedly toggling a group off, reloading, and then toggling it back on and reloading again does seem to reliably produce a crash eventually, though it sometimes takes a bit. Valgrind log: https://gist.github.com/BlueSpaceCanary/a3bf6180d1caed088d1afa5e1ac6833c Valgrind suppressions just in case: https://gist.github.com/BlueSpaceCanary/0e0d33784335d0f9416c56f5d99b17e7 pihole-FTL.log from startup to crash: https://gist.github.com/BlueSpaceCanary/d192252938b6ba2d92f2e0c805c8e8a9 The crash happened after re-enabling a group with a few regexes, which is missing at the very end of the log before the crash |
I am getting a segmentation fault every 255 seconds. I had initially thought it was a configuration error, so I cleared all config and let it start up as a fresh installation, but I am still receiving this crash:
I am running this in Docker. EDIT: Actually, I think this crash was related to a different issue I was encountering. My Docker container was continuously terminating and restarting the instance due to a DNS configuration issue (pi-hole/docker-pi-hole#342). Solving this issue appears to have solved the frequent crashing for me. |
@BlueSpaceCanary Thanks, I just had time to look at your |
I’m also experiencing a FTL segfault with my pi and GCP instances. |
@DL6ER sure thing, here's one with |
@bjaudon Your two crash reports (one on x86_64, one on ARMv6) show two crashes at two different locations, however, both happening when trying to access memory on disk. I'm putting this on my "look for this as well" list. @BlueSpaceCanary Unfortunately, this still doesn't show anything revealing of significance. I'm a bit confused why there are so many incorrect Line 522 in 0790cf7
(both conflinebuffer and keystr are checked against NULL before, guaranteed null-terminated and locally heap-allocated)
Still trying to reproduce locally, it seems like nothing else really works, unfortunately. |
Worst come to worst, if you're not able to find anything this week I can brush off my C and try to track it down when I'm off work over the holidays since I at least have an easy reproduction to test against. (I'm also memtesting my pi to definitively rule out a hardware issue on the off chance there's a sticking bit which somehow predictably ends up in the physical addresses backing FTL's address space) |
I'm still really hoping to find this quickly. Chances are good as soon as I find anything that triggers a crash on my side. I tried various regular expressions, sending signals in quick succession, and a few more things. I found something, though I'm still checking it. Could you run
and see if the crashes are resolved? I don't have high hopes that fixing this real edge case in the regex processing will also resolve what you're seeing, but we will only know once we try. |
No dice, unfortunately https://gist.github.com/BlueSpaceCanary/5c9ef240c21b282560d497a154c00dd4 |
Okay, next try: The crash you're observing happens in the SQLite3 engine, so let's try to rule out a hidden database corruption on disk. You may lose history data. However, we'll first simply put the database somewhere else so the data is still there. If this turns out to be the solution and you want the data back, we can reimport it from the old into the new database later. Could you try the following?
|
Still gives the same sqlite crash. I also tried disabling writing to the db with I've noticed when I run in valgrind I sometimes get this double free and sometimes don't. I haven't caught it in gdb, but I wonder if something is sometimes overwriting one of the list addresses so they end up pointing to the same memory? Also noticed some
Setting size to zero & building locally didn't solve the crashes, but it has made it really hard to reproduce the crash specifically while running under valgrind 🤷 (it also did seem to eliminate the warnings about writing to null) |
Ah, thanks, very good catch! I see I added zeroing the number in the Concerning the double-free: This is strange as well, however, FTL has its own mechanism to detect (and avoid!) this in-vivo so it shouldn't be the cause for any crashes. This region of the code also doesn't have to do with the database and it's not really looking like a dangling pointer to me: There is no offset into the block so it is likely "just" some code freeing something twice:
Line 334 in 0790cf7
Edit Wait, there is an Lines 332 to 336 in 0790cf7
Does this mean we're freeing this somewhere else where it is not explicitly set to |
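To illustrate the concern above: a double free of a global typically appears when one code path frees the pointer without resetting it and a second path frees it again. Below is a minimal sketch of the usual "free and set to NULL" guard. It is not FTL's code; `regex_list`, `regex_count`, and the helper functions are hypothetical stand-ins, assuming a single global buffer with an associated element count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a global regex buffer and its element count. */
static char *regex_list = NULL;
static unsigned int regex_count = 0;

/* Release the buffer and reset both globals. Safe to call repeatedly,
 * because free(NULL) is defined to do nothing. */
void regex_list_free(void)
{
    free(regex_list);
    regex_list = NULL;  /* a second call now frees NULL, not stale memory */
    regex_count = 0;    /* keep the count consistent with the pointer */
}

/* (Re)allocate the buffer for n entries. Returns 0 on success, -1 on failure. */
int regex_list_alloc(unsigned int n)
{
    regex_list_free();  /* drop any previous buffer first */
    assert(regex_list == NULL && regex_count == 0);
    if (n == 0)
        return 0;
    regex_list = calloc(n, sizeof(*regex_list));
    if (regex_list == NULL)
        return -1;
    regex_count = n;
    return 0;
}

/* Accessors used to observe the state from outside. */
unsigned int regex_list_count(void) { return regex_count; }
int regex_list_is_allocated(void)   { return regex_list != NULL; }
```

With this shape, every path that releases the buffer goes through one function, so a count left non-zero next to a freed pointer (or the reverse) cannot occur.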
After some trial and error checking different things with vgdb, I've at least got a partial answer. In I haven't spent any time in gravity-db.c at all yet, but I'll try to catch a few problem calls to |
Thanks, yeah, we definitely want to set the global regex pointer to
Have you checked that To check if this is nevertheless the case, you could check if FTL also reports an incorrect number of regexes for the respective types. Watch out for a line like
in your |
Yep, things lined up with what I'd expect. Here's an example segment of a gdb session (with some stepping through gravityDB's calls into sqlite snipped out). The next call into piholeFTL.log does show 0 whitelist & 0 blacklist regex filters:
which seems arguably right because while the regexes are in gravityDB, this was toggling the group to disable them. But if that log line is supposed to report present-but-disabled expressions as well it's definitely wrong since my test setup has 1 of each, and toggling the group back on shows them:
The binary I'm running is built with EDIT: Well, I tested a quick hack to just preempt the double free by nulling the appropriate global in |
Actually, rereading what you said, |
@DL6ER still no crashes for me on the dev branch. Thanks so much for your time and for getting a fix checked in quickly once we'd found the cause! |
The next version of FTL has been released. Please update and run
to get back on-track. Thanks for helping us to make Pi-hole better for us all! If you have any issues, please either reopen this ticket or (preferably) create a new ticket describing the issues in further detail and only reference this ticket. This will help us to help you best. |
Looks like the Raspberry Pi's IP address failed on the Ethernet port and then
FTL failed. We woke in the morning and had access to the local network but not the internet. The router had a good connection to the WAN, and the Pi-hole web interface was also operating, but there was no DNS.
On initial install I had to select "other ports" in the monitoring settings and set the system to listen on all interfaces (1 hop); that was the only way to make it actually listen, after which it behaved as expected.
This is the beginning of the debug file,
and then the FTL crash section, again from the debug file.
*** [ INITIALIZING ] Sourcing setup variables
[i] Sourcing /etc/pihole/setupVars.conf...
*** [ DIAGNOSING ]: Core version
[i] Core: v5.2 (https://discourse.pi-hole.net/t/how-do-i-update-pi-hole/249)
[i] Remotes: origin https://github.com/pi-hole/pi-hole.git (fetch)
origin https://github.com/pi-hole/pi-hole.git (push)
[i] Branch: master
[i] Commit: v5.2-0-gfee1b8b
*** [ DIAGNOSING ]: Web version
[i] Web: v5.2 (https://discourse.pi-hole.net/t/how-do-i-update-pi-hole/249)
[i] Remotes: origin https://github.com/pi-hole/AdminLTE.git (fetch)
origin https://github.com/pi-hole/AdminLTE.git (push)
[i] Branch: master
[i] Commit: v5.2-0-g2c2d9f5
*** [ DIAGNOSING ]: FTL version
[✓] FTL: v5.3.1
*** [ DIAGNOSING ]: lighttpd version
[i] 1.4.53
*** [ DIAGNOSING ]: php version
[i] 7.3.19
*** [ DIAGNOSING ]: Operating system
[i] dig return code: 0
[i] dig response: "Raspbian=9,10 Ubuntu=16,18,20 Debian=9,10 Fedora=31,32 CentOS=7,8"
[✓] Distro: Raspbian
[✓] Version: 10
*** [ DIAGNOSING ]: SELinux
[i] SELinux not detected
*** [ DIAGNOSING ]: FirewallD
[i] Firewalld service inactive
*** [ DIAGNOSING ]: Processor
[✓] armv6l
*** [ DIAGNOSING ]: Networking
[✗] No IPv4 address(es) found on the enxb827ebbcc05f interface.
[✗] No IPv6 address(es) found on the enxb827ebbcc05f interface.
[i] Default IPv4 gateway: 192.168.1.1
[✗] Gateway did not respond. (https://discourse.pi-hole.net/t/why-is-a-default-gateway-important-for-pi-hole/3546)
*** [ DIAGNOSING ]: Ports in use
*:514 rsyslogd (IPv4)
*:514 rsyslogd (IPv6)
*:22 sshd (IPv4)
*:22 sshd (IPv6)
[80] is in use by lighttpd
[80] is in use by lighttpd
[53] is in use by pihole-FTL
[53] is in use by pihole-FTL
[4711] is in use by pihole-FTL
[4711] is in use by pihole-FTL
*** [ DIAGNOSING ]: Name resolution (IPv4) using a random blocked domain and a known ad-serving domain
[✓] hardyskills.com is 0.0.0.0 via localhost (127.0.0.1)
[✓] hardyskills.com is 0.0.0.0 via Pi-hole (192.168.1.2)
[✓] doubleclick.com is 142.250.66.206 via a remote, public DNS server (8.8.8.8)
*** [ DIAGNOSING ]: Discovering active DHCP servers (takes 10 seconds)
Scanning all your interfaces for DHCP servers
Timeout: 10 seconds
Offered IP address: 192.168.1.109
Server IP address: 192.168.1.1
Relay-agent IP address: N/A
DHCP options:
Message type: DHCPOFFER (2)
server-identifier: 192.168.1.1
lease-time: 120 ( 2m )
renewal-time: 60 ( 1m )
rebinding-time: 105 ( 1m 45s )
netmask: 255.255.255.0
broadcast: 192.168.1.255
dns-server: 192.168.1.2
wpad-server: (length 1)
router: 192.168.1.1
--- end of options ---
DHCP packets received on interface eth0: 1
DHCP packets received on interface lo: 0
*** [ DIAGNOSING ]: Pi-hole processes
[✓] lighttpd daemon is active
[✓] pihole-FTL daemon is active
*** [ DIAGNOSING ]: Pi-hole-FTL full status
● pihole-FTL.service - LSB: pihole-FTL daemon
Loaded: loaded (/etc/init.d/pihole-FTL; generated)
Active: active (exited) since Tue 2020-12-01 08:04:05 ACDT; 32min ago
Docs: man:systemd-sysv-generator(8)
Process: 402 ExecStart=/etc/init.d/pihole-FTL start (code=exited, status=0/SUCCESS)
Dec 01 08:03:18 raspberrypi systemd[1]: Starting LSB: pihole-FTL daemon...
Dec 01 08:03:19 raspberrypi pihole-FTL[402]: Not running
Dec 01 08:03:21 raspberrypi su[431]: (to pihole) root on none
Dec 01 08:03:21 raspberrypi su[431]: pam_unix(su:session): session opened for user pihole by (uid=0)
Dec 01 08:04:05 raspberrypi pihole-FTL[402]: FTL started!
Dec 01 08:04:05 raspberrypi systemd[1]: Started LSB: pihole-FTL daemon.
-rw-r--r-- 1 pihole pihole 89021 Dec 1 08:34 /var/log/pihole-FTL.log
-----head of pihole-FTL.log------
[2020-12-01 03:45:40.275 3712M] Resizing "FTL-dns-cache" from 86016 to (5632 * 16) == 90112 (/dev/shm: 2.5MB used, 93.4MB total)
[2020-12-01 04:01:22.997 3712M] Reloading DNS cache
[2020-12-01 04:01:23.169 3712M] Blocking status is enabled
[2020-12-01 04:01:23.185 3712M] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-12-01 04:01:23.185 3712M] ----------------------------> FTL crashed! <----------------------------
[2020-12-01 04:01:23.186 3712M] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[2020-12-01 04:01:23.186 3712M] Please report a bug at https://github.com/pi-hole/FTL/issues
[2020-12-01 04:01:23.186 3712M] and include in your report already the following details:
[2020-12-01 04:01:23.187 3712M] FTL has been running for 137192 seconds
[2020-12-01 04:01:23.187 3712M] FTL branch: master
[2020-12-01 04:01:23.188 3712M] FTL version: v5.3.1
[2020-12-01 04:01:23.188 3712M] FTL commit: e1db31d
[2020-12-01 04:01:23.188 3712M] FTL date: 2020-11-28 21:59:12 +0100
[2020-12-01 04:01:23.189 3712M] FTL user: started as pihole, ended as pihole
[2020-12-01 04:01:23.190 3712M] Compiled for armv6hf (compiled on CI) using arm-linux-gnueabihf-gcc (crosstool-NG crosstool-ng-1.22.0-88-g8460611) 4.9.3
[2020-12-01 04:01:23.190 3712M] Process details: MID: 3712
[2020-12-01 04:01:23.191 3712M] PID: 3712
[2020-12-01 04:01:23.191 3712M] TID: 3712
[2020-12-01 04:01:23.191 3712M] Name: pihole-FTL
[2020-12-01 04:01:23.192 3712M] Received signal: Segmentation fault
[2020-12-01 04:01:23.192 3712M] at address: (nil)
[2020-12-01 04:01:23.192 3712M] with code: SEGV_MAPERR (Address not mapped to object)
[2020-12-01 04:01:23.226 3712M] Backtrace:
[2020-12-01 04:01:23.228 3712M] B[0000]: /usr/bin/pihole-FTL(+0x4a2f8) [0x49d2f8]
[2020-12-01 04:01:23.818 3712M] L[0000]: /root/project/src/signals.c:194
[2020-12-01 04:01:23.850 3712M] B[0001]: /lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xb6d95130]
[2020-12-01 04:01:23.850 3712M] ------ Listing content of directory /dev/shm ------
[2020-12-01 04:01:23.851 3712M] File Mode User:Group Filesize Filename
[2020-12-01 04:01:23.854 3712M] rwxrwxrwx root:root 260 .
[2020-12-01 04:01:23.855 3712M] rwxr-xr-x root:root 4K ..
[2020-12-01 04:01:23.856 3712M] rw------- pihole:pihole 4K FTL-per-client-regex
[2020-12-01 04:01:23.868 3712M] rw------- pihole:pihole 90K FTL-dns-cache
[2020-12-01 04:01:23.869 3712M] rw------- pihole:pihole 20K FTL-overTime
[2020-12-01 04:01:23.870 3712M] rw------- pihole:pihole 2M FTL-queries
[2020-12-01 04:01:23.872 3712M] rw------- pihole:pihole 29K FTL-upstreams
-----tail of pihole-FTL.log------
[2020-12-01 08:04:05.169 475M] Error: Found unknown status 12 in long term database!
[2020-12-01 08:04:05.170 475M] Timestamp: 1606757184
[2020-12-01 08:04:05.170 475M] Continuing anyway...
[2020-12-01 08:04:05.180 475M] Error: Found unknown status 12 in long term database!
[2020-12-01 08:04:05.181 475M] Timestamp: 1606757212
[2020-12-01 08:04:05.181 475M] Continuing anyway...
[2020-12-01 08:04:05.204 475M] Error: Found unknown status 12 in long term database!
[2020-12-01 08:04:05.204 475M] Timestamp: 1606757396
[2020-12-01 08:04:05.205 475M] Continuing anyway...
[2020-12-01 08:04:05.206 475M] Error: Found unknown status 12 in long term database!
[2020-12-01 08:04:05.206 475M] Timestamp: 1606757405
[2020-12-01 08:04:05.207 475M] Continuing anyway...
[2020-12-01 08:04:05.238 475M] Imported 26301 queries from the long-term database
[2020-12-01 08:04:05.250 475M] -> Total DNS queries: 26301
[2020-12-01 08:04:05.251 475M] -> Cached DNS queries: 4875
[2020-12-01 08:04:05.251 475M] -> Forwarded DNS queries: 10544
[2020-12-01 08:04:05.251 475M] -> Blocked DNS queries: 10496
[2020-12-01 08:04:05.252 475M] -> Unknown DNS queries: 0
[2020-12-01 08:04:05.262 475M] -> Unique domains: 1586
[2020-12-01 08:04:05.262 475M] -> Unique clients: 9
[2020-12-01 08:04:05.273 475M] -> Known forward destinations: 2
[2020-12-01 08:04:05.274 475M] Successfully accessed setupVars.conf
[2020-12-01 08:04:05.394 519M] PID of FTL process: 519
[2020-12-01 08:04:05.398 519M] INFO: FTL is running as user pihole (UID 999)
[2020-12-01 08:04:05.433 519/T520] Listening on port 4711 for incoming IPv4 telnet connections
[2020-12-01 08:04:05.447 519/T522] Listening on Unix socket
[2020-12-01 08:04:05.457 519/T521] Listening on port 4711 for incoming IPv6 telnet connections
[2020-12-01 08:04:05.466 519M] Reloading DNS cache
[2020-12-01 08:04:05.484 519M] Blocking status is enabled
[2020-12-01 08:04:05.889 519/T523] Compiled 0 whitelist and 0 blacklist regex filters for 9 clients in 24.7 msec
[2020-12-01 08:05:29.189 519M] Resizing "FTL-strings" from 40960 to (45056 * 1) == 45056 (/dev/shm: 2.1MB used, 93.4MB total)
[2020-12-01 08:06:30.888 519M] Resizing "FTL-dns-cache" from 4096 to (512 * 16) == 8192 (/dev/shm: 2.1MB used, 93.4MB total)
[2020-12-01 08:15:34.185 519M] Resizing "FTL-dns-cache" from 8192 to (768 * 16) == 12288 (/dev/shm: 2.1MB used, 93.4MB total)
[2020-12-01 08:34:11.560 519M] Resizing "FTL-strings" from 45056 to (49152 * 1) == 49152 (/dev/shm: 2.1MB used, 93.4MB total)
[2020-12-01 08:34:51.008 519M] Resizing "FTL-dns-cache" from 12288 to (1024 * 16) == 16384 (/dev/shm: 2.1MB used, 93.4MB total)