Acknowledgement-Sync between masters are not working #9652
Comments
ref/NC/773490
ref/IP/44076
Please provide step-by-step instructions like
Notes
Not fully reproduced yet, but a step in the right direction:
I think this is our problem: icinga2/lib/icinga/checkable-check.cpp Lines 273 to 276 in 8281527
Working theory: Master2...
@julianbrost If you don’t mind, I'd restrict the if condition shown above to the case where we are not currently syncing the replay log.
Sounds fine. The worst that could happen then would be a left-over ack comment that's cleaned up on the next check result (if the checkable isn't acknowledged then). If there's an ack - recovery - ack again sequence in the replay log, you could possibly end up with two ack comments. This could additionally be addressed by deleting all non-persistent ack comments except the newest if the checkable is acked.
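To make the last suggestion concrete, here is a minimal sketch of "keep only the newest non-persistent ack comment while the checkable is acked". It is not taken from the Icinga 2 code base: the `Comment` struct, its fields, and `PruneStaleAckComments` are hypothetical stand-ins for illustration, and the real cleanup would have to work on Icinga's own comment objects.

```cpp
#include <vector>

// Hypothetical, simplified comment record (NOT Icinga 2's Comment class).
struct Comment {
    int id;
    bool isAck;       // comment was created by an acknowledgement
    bool persistent;  // persistent comments are never pruned here
    double entryTime; // creation time as a UNIX timestamp
};

// Returns the comments to keep. If the checkable is acknowledged, every
// non-persistent ack comment except the newest one is dropped. If it is
// not acknowledged, nothing is changed here (per the discussion above,
// left-overs are cleaned up on the next check result instead).
std::vector<Comment> PruneStaleAckComments(const std::vector<Comment>& comments, bool acked)
{
    if (!acked)
        return comments;

    auto isCandidate = [](const Comment& c) { return c.isAck && !c.persistent; };

    // Find the newest non-persistent ack comment (first one wins on ties).
    const Comment* newest = nullptr;
    for (const auto& c : comments) {
        if (isCandidate(c) && (!newest || c.entryTime > newest->entryTime))
            newest = &c;
    }

    // Keep everything that is not a candidate, plus the newest candidate.
    std::vector<Comment> kept;
    for (const auto& c : comments) {
        if (!isCandidate(c) || &c == newest)
            kept.push_back(c);
    }
    return kept;
}
```

With an ack - recovery - ack sequence replayed from the log, two non-persistent ack comments could exist on an acked checkable; the sketch would keep only the later one and leave persistent and non-ack comments untouched.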
How to reproduce
I love it. Not.
@Wintermute2k6 Disappointing. Once @julianbrost approves #9718 feel free to test https://git.icinga.com/packages/icinga2/-/jobs/291890/artifacts/download
@Wintermute2k6 Only code style things are left, so go for it.
@Al2Klimov Feedback from the affected users on the latest artifacts: "Unfortunately no improvement. In principle the same behaviour as at the beginning. Master1: acknowledged, but the comment is gone and the history contains an entry saying that the acknowledgement was removed."
@Al2Klimov Any progress on a newer version to test, after the last one also didn't work out?
Notes
Call me blind or stupid, but if I didn’t completely screw up #9718, it has to fix the problem. Unconditionally. I see two options (not mutually exclusive):
@Wintermute2k6 Please wait with that, I'll put log messages next to all the bullet points above. Also: Did the customer install the packages on both masters and restart at least master1? (master2 doesn’t matter, it's stopped and started after X hours anyway.)
The notes look plausible (I haven't cross-checked the code though). If the patched versions were installed correctly and the issue still appears, then new/extended debug logs are probably a good idea as we may have overlooked something and that could give us hints.
CC @tbauriedel
Hi guys, I would like to refer to my bug from last month again. We are still not able to reproduce the described behaviour.
#9718 is on 2.14. Once 2.14 is released you can test that.
Describe the bug
We had to reinstall our Icinga cluster, which previously ran on CentOS 7, and now runs on RHEL9 with Icinga 2.13.6 and Icingaweb2 2.11.2. We did this step by step, that is, never two servers in parallel. While reinstalling our master servers we noticed that acknowledgements were not synced across the downtime of one of the masters, in particular the acknowledgements we had to set for some monitored host and service objects while that master was down.
When the second master comes back up, the acknowledgements that were set get removed: the comment is deleted and the service is merely marked as acknowledged. BUT on the other server, the one that was reinstalled, the same host/service object is not marked as acknowledged.
e.g.
log_duration is set to 3d on every cluster node. During the observation the log_duration on the masters was set to the standard value: 1d. So there is no node where this is deactivated.
We tried to reproduce this on the test system. We stopped master 2 for a specific time and set some acknowledgements in the meantime. This is the result:
I know this should have been fixed according to the comment at https://community.icinga.com/t/acknowledgements-not-syncing-between-masters/6220/32, but it looks like this bug is back.