packetbeat nightly151201180656 crashing psql #565
Comments
@opb1978 it would be really great if you could!
No problem. Should I send it again to @andrewkroh? I would like to gpg-encrypt the file before sending...
Yeah, if you already have his gpg key, then that would be easiest. Thanks!
I actually tried a nightly build once before, but the version string in the nightly builds seems to be wrong and I did not want to repack the Debian package. Maybe you could have a look into this some time; I will do the repack by hand for now: dpkg: error processing archive packetbeat_nightly.latest_amd64.deb (--install):
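For reference, a hand-repack along these lines should work (a sketch only; the version value and file names are placeholders, not taken from this thread):

```bash
# Unpack the nightly .deb including its control information,
# patch the Version field, and rebuild the package.
dpkg-deb -R packetbeat_nightly.latest_amd64.deb packetbeat-repack
sed -i 's/^Version:.*/Version: 1.0.0/' packetbeat-repack/DEBIAN/control  # placeholder version
dpkg-deb -b packetbeat-repack packetbeat_repacked_amd64.deb
sudo dpkg -i packetbeat_repacked_amd64.deb
```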
Yeah, sorry, that is still an open issue. elastic/beats-packer#40
@andrewkroh I did a retest with nightly151201180656 and am still seeing the psql problem. I will send you a download link for the pcap file.
We tried your PCAP and were not able to reproduce using the latest build from master. The nightly build that you used is from 2015-12-01 (based on the filename) and the fix was not introduced until 2015-12-10. We only store the past 2 weeks of nightly builds, so did you possibly use a version that you had downloaded in the past?
@opb1978 please get the most recent nightly. I checked builds up to 2015-12-10 and was able to reproduce the original panic with them. More recent builds should be fine.
Sorry for the confusion; I will retry with the latest nightly build. You were right, I had downloaded and repacked the wrong version before. Will update here soon!
I did some retesting with the nightly build and again got some errors. @andrewkroh I sent you a download link yesterday.
Right after I received the latest PCAP, I gave it a try (but I forgot to update this issue). I was not able to reproduce any panics with it. urso also tried the PCAP and could not reproduce.
We were just chatting about this and @urso came up with a theory that we should investigate further. It might explain why we cannot reproduce it from the PCAP while you are seeing the issue in production. If a pgsql transaction grows larger than 10MB, then the stream is dropped. But if there is some faulty state management (i.e. the state is not reset properly), then this could lead to problems.
Do you need any more tests for tracking down this problem? We could also do some remote testing on our systems.
Thanks for your help. Unfortunately my theory was wrong. To track down the issue I need a trace that reliably reproduces it, so I can minify the trace until I can identify the problem. One can test a pcap in bash/zsh with:
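A minimal sketch of such a check, assuming a packetbeat.yml with the pgsql protocol enabled is in the working directory and that a panic shows up in packetbeat's output:

```bash
# -I reads packets from the pcap, -N disables publishing, -t replays at top speed.
# If a Go panic/stacktrace appears in the output, the trace reproduces the bug.
if ./packetbeat -t -N -I trace.pcap 2>&1 | grep -qiE 'panic|stacktrace'; then
    echo "trace.pcap reproduces the crash"
fi
```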
We can build a small script that creates a dump and tests it for a stacktrace:
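A sketch of what such a script could look like, assuming tcpdump is available, PostgreSQL traffic is on port 5432, and a suitable packetbeat.yml sits next to the packetbeat binary (interface, port, and file names are placeholders):

```bash
#!/bin/bash
IFACE=any          # placeholder: capture interface
PCAP=trace.pcap    # placeholder: trace file

check() {
    # Replay the trace offline (-I) without publishing (-N) at top speed (-t)
    # and look for a panic/stacktrace in the output.
    ./packetbeat -t -N -I "$PCAP" 2>&1 | grep -qiE 'panic|stacktrace'
}

while true; do
    # Capture 60 seconds of PostgreSQL traffic into the trace file.
    timeout 60 tcpdump -i "$IFACE" -w "$PCAP" port 5432
    if check; then
        echo "error reproduced, keeping $PCAP"
        break
    fi
done
```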
This script will create a trace for 60 seconds and check whether the trace generates an error by running packetbeat with -t -N -I trace.pcap. The -N and -I flags guarantee that this packetbeat instance reads packets from the trace file only and will not forward any events to elasticsearch/logstash. You can run the script next to your running packetbeat instance (memory/disk/cpu will still be used to create the trace). Update the check function and time intervals if required.
I have been running this script for hours now and got no errors. If I start packetbeat normally again, the problem occurs after a few minutes. I have been capturing on interface "any" because this is how packetbeat would be running. I have put the script into a screen session; maybe it will produce the problem after some time. Just a guess, but maybe the problem occurs while transferring to elasticsearch? As I disabled the normal process (it was causing too many errors and SMS alerts), we could start replaying the pcap file with transfer to elasticsearch enabled. I can easily remove these again afterwards.
Hmm... the bug seems to be hiding. The problem is, if we run with '-t', we alter timestamps and timing behavior. So another option would be to modify the script to:
With changes 1 and 2, the script will capture traffic for DURATION minutes and afterwards check the pcap for DURATION minutes.
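A rough sketch of the modified script, assuming the two changes are a configurable DURATION for the capture and dropping -t from the check so the pcap is replayed with its original timing (DURATION, interface, and port are placeholders):

```bash
#!/bin/bash
DURATION=10        # placeholder: capture/check window in minutes
IFACE=any          # placeholder: capture interface
PCAP=trace.pcap    # placeholder: trace file

while true; do
    # Change 1 (assumed): capture traffic for DURATION minutes instead of 60 seconds.
    timeout $((DURATION * 60)) tcpdump -i "$IFACE" -w "$PCAP" port 5432

    # Change 2 (assumed): check without -t, so packets keep their original
    # timing and the replay itself takes roughly DURATION minutes.
    if ./packetbeat -N -I "$PCAP" 2>&1 | grep -qiE 'panic|stacktrace'; then
        echo "error reproduced, keeping $PCAP"
        break
    fi
done
```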
Good news, I've finally got a trace sent to me that reproduces the error. I put quite some effort into hardening the pgsql parser today. See #825
I just updated to packetbeat 1.0.1 and checked whether issue #342 is now fixed in this version.
After running for about 10 minutes I got this error in the log file:
2015-12-18T23:07:11.921545+01:00 somehost /usr/bin/packetbeat[13560]: log.go:114: Stacktrace:
/go/src/github.com/elastic/beats/libbeat/logp/log.go:114 (0x48c5c6)
/usr/local/go/src/runtime/asm_amd64.s:437 (0x47d8fe)
/usr/local/go/src/runtime/panic.go:423 (0x44d4f9)
/usr/local/go/src/runtime/panic.go:18 (0x44ba39)
/go/src/github.com/elastic/beats/packetbeat/protos/pgsql/pgsql.go:279 (0x512203)
/go/src/github.com/elastic/beats/packetbeat/protos/pgsql/pgsql.go:610 (0x5146de)
/go/src/github.com/elastic/beats/packetbeat/protos/pgsql/pgsql.go:707 (0x51515d)
/go/src/github.com/elastic/beats/packetbeat/protos/tcp/tcp.go:87 (0x521093)
/go/src/github.com/elastic/beats/packetbeat/protos/tcp/tcp.go:173 (0x5221cd)
/go/src/github.com/elastic/beats/packetbeat/decoder/decoder.go:136 (0x6c8ad1)
/go/src/github.com/elastic/beats/packetbeat/sniffer/sniffer.go:352 (0x5337a9)
/go/src/github.com/elastic/beats/packetbeat/packetbeat.go:212 (0x422f2b)
/usr/local/go/src/runtime/asm_amd64.s:1696 (0x47fc41)
Seems to still be a problem here.
I can do a capture again if needed!