IcingaDB fails with: Prepared statement contains too many placeholders - Overdue indicators not chunked #147
Hey @dmitriy-terzeman, thank you for reporting! The best thing you can do to help is to enable debug logging in
Cheers
Hello @N-o-X, sorry for the delayed reply. I've played around with it and was finally able to reproduce the bug. Reproducing it needs a really highly loaded environment: 100k services with a 1-minute check frequency or more. I think it's related to the Redis stream queue and MySQL's limit on inserts in a single transaction: because of the long icinga2 master restart/reload, a lot of services miss their actual state and IcingaDB tries to update them in bulk, so on high load the queue gets too big. I also did some research on the error message, and it looks like it's related to the MySQL driver; I found some good topics on https://stackoverflow.com/questions/18100782/import-of-50k-records-in-mysql-gives-general-error-1390-prepared-statement-con We need to check how we slice the array of messages so we don't reach the limit: Line 693 in 49891df
The debug log is not readable in this case:
The first line is really big; I inserted only part of it to show how it looks. I can send the full debug log if you are interested in inspecting it. With best regards,
Same problem here:
Versions:
It only happens with a huge amount of objects (I played around with 25,000 hosts / 250,000 services). But neither a restart of icingadb.service, a reboot of the entire system, nor reducing the objects (e.g. to 1 host / 10 services) solves the problem. icingadb.service still refuses to start with the prior error message.
We'll fix this asap.
@dmitriy-terzeman @mwaldmueller could you, if possible, please test this PR (#165)? Packages:
Thanks in advance!
@dmitriy-terzeman sorry, I meant #165, not #65 :D
Ohh, ok :D With best regards,
Thanks, but I still got the error:
Okay, please try the following:
If it still crashes after that, we'll need to dig deeper. Thanks for helping!
@N-o-X what should be done to check whether it's overdue-related? I saw a lot of updates on that front.
@dmitriy-terzeman that could really be it! Try the following:
Our overdue code doesn't have any chunking mechanism, so it's something that can break for sure.
And it's the most "updated" part, as far as I can see, since I don't have many state changes or much history (most of my checks are dummy checks that I use for simple load tests). P.S. I use 1-minute frequency checks, and a server reload on the big environment can take more than 1 minute.
Works in my case 👍
I have no chance to test it right now (I destroyed the cluster where I tested it), but I have some ideas on how to provide additional testing for this one. #1 Crash emulation with restore:
#2 Emulate normal operation of Icinga2:
Any thoughts on this one? @N-o-X
Hi @dmitriy-terzeman, sorry for the late reply. Thanks for your ideas! I've used your testing methods and changed them up a bit. Two points on this:
In the end I came up with the following test procedure: Config:
Steps
@dmitriy-terzeman @mwaldmueller I've prepared packages (based on #168) that should fix those issues. Please test them, if you got time and a working setup.
Thanks for your help!
@N-o-X by the way, for what reason is Overdue pushed to the database? I don't see it as a state change or as providing any historical or reporting information; instead it can cause a lot of unnecessary updates in the database. Example: if I have a lot of checks with a 10-second interval, they will be pushed to the database every 20 seconds (double the interval, as I expect), which can cause a big load for "nothing". If it doesn't provide historical reporting but only the "current status", does it make sense to disable pushing overdues to the database and let Icinga Web grab that info directly from Redis? With best regards,
@dmitriy-terzeman if your monitoring system is healthy, there should be very few database updates caused by our overdue handling. Please correct me on that if I'm wrong. In my opinion, everything that's an important change should be updated in the database, and Overdue is one of those things, at least for me. I'm also not sure why you would have overdue indicators synced every 20 seconds? Greetings
Description
In a highly loaded environment I'm facing an IcingaDB crash with the following errors:
How to reproduce:
Currently I lack the info for deeper diagnostics and am not able to reproduce it on demand; the error occurs randomly in a highly loaded environment with 100k services per minute.
Environment:
Standalone icinga2 master: 8 cores / 16 GB RAM
2 satellite zones in HA (4 icinga2 instances as satellites): 8 cores / 16 GB RAM
1 IcingaDB + Redis on a dedicated host: 8 cores / 16 GB RAM
MySQL cluster on dedicated hosts: 8 cores / 16 GB RAM
CentOS Linux release 7.7.1908 (Core) on each server
Amount of active services: 102010 services per minute
Please let me know if I can provide any additional debug information for you.
With best regards,
Dmitriy.