Share "Last reload attempt failed" time across Icinga process tree on *nix #8429

Merged
merged 1 commit into from
May 30, 2023

Conversation

Al2Klimov
Member

@Al2Klimov Al2Klimov commented Oct 30, 2020

... as only the umbrella process knows that time,
but the icinga check running in the main process also needs to know it.

fixes #8428

TODO

@icinga-probot icinga-probot bot added this to the 2.13.0 milestone Oct 30, 2020
@icinga-probot icinga-probot bot added area/checks Check execution and results area/configuration DSL, parser, compiler, error handling bug Something isn't working labels Oct 30, 2020
@Al2Klimov Al2Klimov force-pushed the bugfix/last-reload-attempt-failed-8428 branch 2 times, most recently from 75f76b0 to 733d032 Compare October 30, 2020 16:41
@Al2Klimov Al2Klimov requested a review from N-o-X November 13, 2020 16:14
@Al2Klimov Al2Klimov modified the milestones: 2.13.0, 2.14.0 Jun 2, 2021
@Al2Klimov
Member Author

@cla-bot check

@Al2Klimov
Member Author

@julianbrost Would you prefer Boost shared memory?

@julianbrost
Contributor

The deal with shared memory across processes is that if you use it, you start relying on implementation-defined behavior. This answer on Stack Overflow provides a pretty good analysis for std::atomic<> in shared memory. Summary being that if it's lock-free (which in itself isn't guaranteed), then it should probably be fine (but also, not guaranteed).

So in general, I prefer to avoid shared memory if feasible but this boils down to whether we have a suitable alternative. In this sense, it's closely related to #9445: the timestamp here should be the same as the last time an existing worker was allowed to process config updates in #9445. As we consider both PRs for 2.14, I'll take a closer look at both soon and come back to this once I know more.

If the answer is shared memory, it should probably be an abstraction like in 58c15bc. Whatever is communicated over it should be as safe as possible, which means at least testing or asserting that std::atomic<double> is lock-free, or checking whether Boost provides something suitable (i.e. they did the testing for us).
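
For illustration only, here is a minimal sketch of the approach discussed above: a std::atomic<double> placed in an anonymous shared mapping, checked for lock-freedom, and read by a forked child. The function name WorkerSeesSharedUpdate and the polling loop are assumptions made for this example and are not taken from the PR or from Icinga's code. As noted above, the C++ standard does not guarantee that this works across processes; it merely tends to work when the atomic is lock-free.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not Icinga's API): the parent ("umbrella") stores a
// timestamp after forking, and the child ("worker") polls for it.
// Returns true if the child observed the parent's update.
bool WorkerSeesSharedUpdate(double timestamp)
{
	using SharedTime = std::atomic<double>;

	// MAP_SHARED | MAP_ANONYMOUS memory stays shared across fork().
	void* mem = mmap(nullptr, sizeof(SharedTime), PROT_READ | PROT_WRITE,
		MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (mem == MAP_FAILED) {
		return false;
	}

	auto* shared = new (mem) SharedTime(0.0);

	// A non-lock-free atomic may rely on a process-local lock, which would
	// not protect anything across processes, so refuse to continue.
	if (!shared->is_lock_free()) {
		munmap(mem, sizeof(SharedTime));
		return false;
	}

	pid_t pid = fork();

	if (pid < 0) {
		munmap(mem, sizeof(SharedTime));
		return false;
	}

	if (pid == 0) {
		// Worker: poll for up to about two seconds.
		for (int i = 0; i < 200; ++i) {
			if (shared->load() == timestamp) {
				_exit(0);
			}
			usleep(10000);
		}
		_exit(1);
	}

	// Umbrella: publish the timestamp only after the worker exists.
	shared->store(timestamp);

	int status = 0;
	bool seen = waitpid(pid, &status, 0) == pid
		&& WIFEXITED(status) && WEXITSTATUS(status) == 0;

	munmap(mem, sizeof(SharedTime));
	return seen;
}
```

Calling WorkerSeesSharedUpdate(1683550000.0) is expected to return true on platforms where std::atomic<double> is lock-free. The runtime is_lock_free() check is used instead of a compile-time one so the sketch also builds with pre-C++17 compilers.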

@Al2Klimov Al2Klimov force-pushed the bugfix/last-reload-attempt-failed-8428 branch from 733d032 to ebceb02 Compare April 13, 2023 15:41
@Al2Klimov
Member Author

Don't worry. Nobody's gonna support Icinga on your toaster. Just because it runs NetBSD doesn’t mean it runs Icinga :)

@Al2Klimov Al2Klimov force-pushed the bugfix/last-reload-attempt-failed-8428 branch 2 times, most recently from b078222 to ebceb02 Compare April 13, 2023 15:48
@Al2Klimov Al2Klimov self-assigned this Apr 13, 2023
@Al2Klimov
Member Author

GHA is wrong. Works for me.

@Al2Klimov Al2Klimov removed their assignment Apr 14, 2023
@Al2Klimov Al2Klimov force-pushed the bugfix/last-reload-attempt-failed-8428 branch 3 times, most recently from a6c82de to b994c03 Compare May 5, 2023 13:33
@Al2Klimov
Member Author

Compiles even on my favourite marginal OS. 🎉

… *nix

... as only the umbrella process knows that time,
but the icinga check running in the main process also needs to know it.

refs #8428
@Al2Klimov Al2Klimov force-pushed the bugfix/last-reload-attempt-failed-8428 branch from b994c03 to 5c330e9 Compare May 8, 2023 12:42
@Al2Klimov Al2Klimov removed their assignment May 8, 2023
@Al2Klimov Al2Klimov marked this pull request as ready for review May 8, 2023 12:44
@Al2Klimov Al2Klimov requested a review from julianbrost May 9, 2023 07:57
@Al2Klimov
Member Author

Apropos.

@sthen Consider disabling unity builds by default (net/icinga/core2). I did this locally in the Makefile and was able to compile Icinga + deps (actually deps + Icinga ;) ) with only 2 GB of RAM.

@sthen
Contributor

sthen commented May 12, 2023 via email

@Al2Klimov Al2Klimov mentioned this pull request May 15, 2023
3 tasks
@Al2Klimov Al2Klimov self-assigned this May 16, 2023
@Al2Klimov
Member Author

Even this isn’t working.

@yhabteab Any idea why?

@yhabteab
Member

Even this isn’t working.

@yhabteab Any idea why?

Boost warns about the following problem:

When constructing a class with static members, each process has its own copy of the static member, so updating a static member in one process does not change the value of the static member in another process. So be careful with these classes. Static members are not dangerous if they are just constant variables initialized when the process starts, but they don't change at all (for example, when used like enums) and their value is the same for all processes.

It seems that you have encountered exactly this problem here. When the worker process is launched, it gets its own copy of this member variable, so it won't notice when another process (the umbrella process) updates that variable later.
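
The effect Boost warns about can be reproduced with a short sketch. Everything here is hypothetical and only for illustration: the variable l_LastFailedReload and the function WorkerSeesStaticUpdate are made-up names, not code from this PR. The static variable is even a std::atomic, which shows that atomicity alone does not help when each process has its own copy.

```cpp
#include <atomic>
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical static variable: after fork(), parent and child each own a
// private copy of it (copy-on-write process memory).
static std::atomic<double> l_LastFailedReload(0.0);

// Hypothetical helper: the parent ("umbrella") updates the static after
// forking, and the child ("worker") polls its own copy.
// Returns true only if the child observed the update; this is expected to be
// false. A failed fork() also yields false.
bool WorkerSeesStaticUpdate(double timestamp)
{
	// Reset so that a previous call cannot leak a value into the new child.
	l_LastFailedReload.store(0.0);

	pid_t pid = fork();

	if (pid < 0) {
		return false;
	}

	if (pid == 0) {
		// Worker: poll its private copy for about 200 ms.
		for (int i = 0; i < 20; ++i) {
			if (l_LastFailedReload.load() == timestamp) {
				_exit(0);
			}
			usleep(10000);
		}
		_exit(1);
	}

	// Umbrella: this write only changes the parent's copy.
	l_LastFailedReload.store(timestamp);

	int status = 0;
	return waitpid(pid, &status, 0) == pid
		&& WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With a non-zero timestamp, WorkerSeesStaticUpdate should return false, because the child keeps seeing the value its copy had at fork time. For the value to cross the process boundary, the storage itself has to live in shared memory.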

@Al2Klimov Al2Klimov removed the request for review from julianbrost May 17, 2023 08:34
@Al2Klimov Al2Klimov marked this pull request as draft May 17, 2023 08:35
@Al2Klimov Al2Klimov force-pushed the bugfix/last-reload-attempt-failed-8428 branch from 51f01a4 to 5c330e9 Compare May 17, 2023 10:00
@Al2Klimov Al2Klimov marked this pull request as ready for review May 17, 2023 10:01
@Al2Klimov Al2Klimov removed their assignment May 17, 2023
@Al2Klimov Al2Klimov requested a review from julianbrost May 17, 2023 10:01
@Al2Klimov
Member Author

Even this isn’t working.

Only because I reloaded a different node than the one the check was scheduled on. 🐘

A simple icinga service, a config syntax error, a reload – and it complains.

@julianbrost
Contributor

Only because I reloaded a different node than the one the check was scheduled on.

Not only for the intermediate versions you pushed but also with the previous version? So every "this doesn't work" you said this week was a false alarm?

@Al2Klimov
Member Author

Especially with the previous version! I.e. 5c330e9

Member

@yhabteab yhabteab left a comment

This actually works well on my end, and the code is also (to me) fine compared to the previous version.

@julianbrost julianbrost merged commit b0899d9 into master May 30, 2023
@icinga-probot icinga-probot bot deleted the bugfix/last-reload-attempt-failed-8428 branch May 30, 2023 09:42
Labels
area/checks Check execution and results area/configuration DSL, parser, compiler, error handling bug Something isn't working cla/signed
Development

Successfully merging this pull request may close these issues.

icinga check: "Last reload attempt failed at ..." never appears on *nix
4 participants