How is the "Monitoring Status" calculated? #24

Open
kristofver opened this issue Jan 15, 2019 · 6 comments

@kristofver

Although in the host's "Monitoring" tab all services are shown as OK, the "Monitoring Status" shows Critical.
In Icinga2, everything is green.
I'm using FQDNs in both Icinga and The Foreman.
I can see two things that might influence this:

  • I'm not using hostalive as the check_command but cluster-zone, which results in a status message like "Zone '***' is connected. Log lag: less than 1 millisecond"
  • I have DNS records for both servername.domain and servername.oob.domain. The host is known to Foreman and Icinga as servername.domain, but in Icinga I'm using the IP address of servername.oob.domain so monitoring goes over an out-of-band interface. Could this confuse the monitoring smart proxy? Why would it only influence the monitoring status and not the service statuses? Is the monitoring proxy using interface information from Foreman in some way?

Any ideas on this?
Kind regards,
Kristof

@timogoebel
Member

@kristofver
Author

Given the to_status function, can you see a reason why a host with the monitoring results below could still have a Monitoring Status of Critical?

foreman=# select * from monitoring_results where host_id = 15;
 id  | host_id |                service                 | result | downtime | acknowledged |         timestamp
-----+---------+----------------------------------------+--------+----------+--------------+----------------------------
 293 |      15 | mem                                    |      0 | f        | f            | 2019-01-15 08:59:38.675023
 176 |      15 | Host Check                             |      0 | f        | f            | 2019-01-15 09:00:28.794356
 190 |      15 | **redacted**                           |      0 | f        | f            | 2019-01-15 09:00:11.487482
 195 |      15 | **redacted**                           |      0 | f        | f            | 2019-01-15 09:00:26.921551
 221 |      15 | ping4                                  |      0 | f        | f            | 2019-01-15 08:59:56.651465
 232 |      15 | ntp                                    |      0 | f        | f            | 2019-01-15 09:00:28.403995
 234 |      15 | disk                                   |      0 | f        | f            | 2019-01-15 09:00:06.08392
 263 |      15 | ssh                                    |      0 | f        | f            | 2019-01-15 08:59:57.859941
 272 |      15 | load                                   |      0 | f        | f            | 2019-01-15 09:00:04.407032
(9 rows)
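
For reference, a minimal sketch of the kind of aggregation to_status is presumably doing (an assumption, not the plugin's actual code; it also assumes a MonitoringResult model backs the monitoring_results table above). With every result at 0, a worst-case aggregation should still come out as OK:

# foreman-rake console; hypothetical sketch of the expected aggregation
results = MonitoringResult.where(host_id: 15).pluck(:result)
results.max  # => 0 here, i.e. the worst result is still OK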

@timogoebel
Member

Can you use foreman-rake console and try the following?

Host::Managed.find_by(name: 'my-super.host.com').get_status(HostStatus::MonitoringStatus).to_status

@kristofver
Author

That returns 0.

I've put some debug statements in the to_status function, but it doesn't get called when refreshing the host properties page in Foreman. The to_label function does get called and returns Critical.
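
A quick way to see that discrepancy from foreman-rake console (a sketch using the same host lookup as above; status is the cached column from the host_status table that to_label presumably reads):

s = Host::Managed.find_by(name: 'my-super.host.com').get_status(HostStatus::MonitoringStatus)
s.status     # cached column from host_status, still 2 (Critical)
s.to_status  # recomputed from monitoring_results, returns 0 (OK)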

I did some more troubleshooting. The HostStatus::MonitoringStatus record is a lot older than the monitoring_results records.

foreman=# select * from monitoring_results where host_id = 26;
 id  | host_id |                       service                        | result | downtime | acknowledged |         timestamp
-----+---------+------------------------------------------------------+--------+----------+--------------+----------------------------
 286 |      26 | load                                                 |      0 | f        | f            | 2019-01-15 08:59:49.232184
 298 |      26 | disk                                                 |      0 | f        | f            | 2019-01-15 08:59:33.737705
 175 |      26 | Host Check                                           |      0 | f        | f            | 2019-01-15 09:00:24.100271
 192 |      26 | ***                                                  |      0 | f        | f            | 2019-01-15 09:00:17.077651
 196 |      26 | ***                                                  |      0 | f        | f            | 2019-01-15 08:59:59.033319
 215 |      26 | ntp                                                  |      0 | f        | f            | 2019-01-15 09:00:28.406188
 264 |      26 | ping4                                                |      0 | f        | f            | 2019-01-15 08:59:54.60494
 265 |      26 | mem                                                  |      0 | f        | f            | 2019-01-15 09:00:15.267184
 274 |      26 | ssh                                                  |      0 | f        | f            | 2019-01-15 08:59:45.435173
(9 rows)

foreman=# select * from host_status where host_id = 26;
 id  |               type                | status | host_id |        reported_at
-----+-----------------------------------+--------+---------+----------------------------
 365 | ForemanOpenscap::ComplianceStatus |      0 |      26 | 2019-01-15 09:01:12.734863
 359 | Katello::PurposeUsageStatus       |      0 |      26 | 2019-01-15 09:01:12.851821
 345 | HostStatus::BuildStatus           |      0 |      26 | 2019-01-15 09:01:12.729576
 373 | HostStatus::ExecutionStatus       |      0 |      26 | 2019-01-15 09:01:12.755233
 358 | Katello::PurposeRoleStatus        |      0 |      26 | 2019-01-15 09:01:12.82413
 363 | HostStatus::ConfigurationStatus   |      0 |      26 | 2019-01-16 18:33:42
 455 | HostStatus::MonitoringStatus      |      2 |      26 | 2019-01-13 11:12:31.693512
 356 | Katello::ErrataStatus             |      0 |      26 | 2019-01-15 09:01:12.759535
 355 | Katello::SubscriptionStatus       |      0 |      26 | 2019-01-15 09:01:12.767729
 357 | Katello::PurposeSlaStatus         |      0 |      26 | 2019-01-15 09:01:12.795652
 360 | Katello::PurposeAddonsStatus      |      0 |      26 | 2019-01-15 09:01:12.879249
 361 | Katello::PurposeStatus            |      0 |      26 | 2019-01-15 09:01:12.906951
 456 | Katello::TraceStatus              |      0 |      26 | 2019-01-15 09:01:12.911751

I then rebooted the monitored server. The monitoring_results records were updated, but the HostStatus::MonitoringStatus record is still the old one from 2019-01-13.

foreman=# select * from monitoring_results where host_id = 26;
 id  | host_id |                       service                        | result | downtime | acknowledged |         timestamp
-----+---------+------------------------------------------------------+--------+----------+--------------+----------------------------
 286 |      26 | load                                                 |      0 | f        | f            | 2019-01-16 19:02:44.485407
 298 |      26 | disk                                                 |      0 | f        | f            | 2019-01-16 19:02:44.50284
 175 |      26 | Host Check                                           |      0 | f        | f            | 2019-01-16 19:02:32.77024
 196 |      26 | ***                                                  |      0 | f        | f            | 2019-01-16 19:02:42.91751
 264 |      26 | ***                                                  |      0 | f        | f            | 2019-01-16 19:02:33.475878
 265 |      26 | mem                                                  |      0 | f        | f            | 2019-01-16 19:02:43.258151
 274 |      26 | ssh                                                  |      0 | f        | f            | 2019-01-16 19:02:28.217909
 192 |      26 | ***                                                  |      0 | f        | f            | 2019-01-15 09:00:17.077651
 215 |      26 | ntp                                                  |      0 | f        | f            | 2019-01-15 09:00:28.406188
(9 rows)

foreman=# select * from host_status where host_id = 26;
 id  |               type                | status | host_id |        reported_at
-----+-----------------------------------+--------+---------+----------------------------
 365 | ForemanOpenscap::ComplianceStatus |      0 |      26 | 2019-01-15 09:01:12.734863
 359 | Katello::PurposeUsageStatus       |      0 |      26 | 2019-01-15 09:01:12.851821
 345 | HostStatus::BuildStatus           |      0 |      26 | 2019-01-16 18:23:30.864883
 373 | HostStatus::ExecutionStatus       |      0 |      26 | 2019-01-15 09:01:12.755233
 358 | Katello::PurposeRoleStatus        |      0 |      26 | 2019-01-15 09:01:12.82413
 363 | HostStatus::ConfigurationStatus   |      2 |      26 | 2019-01-16 19:02:18
 455 | HostStatus::MonitoringStatus      |      2 |      26 | 2019-01-13 11:12:31.693512
 356 | Katello::ErrataStatus             |      0 |      26 | 2019-01-15 09:01:12.759535
 355 | Katello::SubscriptionStatus       |      0 |      26 | 2019-01-15 09:01:12.767729
 357 | Katello::PurposeSlaStatus         |      0 |      26 | 2019-01-15 09:01:12.795652
 360 | Katello::PurposeAddonsStatus      |      0 |      26 | 2019-01-15 09:01:12.879249
 361 | Katello::PurposeStatus            |      0 |      26 | 2019-01-15 09:01:12.906951
 456 | Katello::TraceStatus              |      0 |      26 | 2019-01-15 09:01:12.911751
(13 rows)

Any clues? When would the host_status record normally be updated? Is this something the monitoring smart proxy does?
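
For what it's worth, the staleness can also be confirmed from foreman-rake console without going into psql (a sketch, same host lookup as before; reported_at is the column shown in the host_status dump):

s = Host::Managed.find_by(name: 'my-super.host.com').get_status(HostStatus::MonitoringStatus)
s.reported_at  # => 2019-01-13 11:12:31, the stale row from the dump above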

BTW, thanks for your support!

@timogoebel
Member

timogoebel commented Jan 17, 2019

I suspect that the status is not being refreshed properly.
Can you try this and see if it helps?

Host::Managed.find_by(name: 'my-super.host.com').get_status(HostStatus::MonitoringStatus).refresh!
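
If that fixes the single host, the same could presumably be applied to every monitoring status row in one go (a sketch; HostStatus::MonitoringStatus is the ActiveRecord class from the host_status dump, so it can be iterated directly):

HostStatus::MonitoringStatus.find_each(&:refresh!)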
