Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConfigItem: Fix infinite recursion caused by ignore_on_error when … #9406

Merged
merged 2 commits into from
Oct 13, 2022

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Jun 20, 2022

…committing an item

When committing an item with ignore_on_error flag set fails, the Commit() method only returns nullptr
and the current item is not being dropped from m_Items. CommittNewItems() also doesn't check the return
value of Commit() but just continues and tries to commit all items from m_Items in recursive call. Since
this corrupt item is never removed from m_Items, it ends up in an endless recursion till it finally crashes.

Test config

object Host "dummy-check" ignore_on_error {
    address = "localhost"
}

Before

see #8824 (comment)

And....:

~/Workspace/icinga2 (master ✗) prefix/sbin/icinga2 daemon -C
[2022-06-20 14:38:46 +0200] information/cli: Icinga application loader (version: v2.13.0-312-g452252244; debug)
[2022-06-20 14:38:46 +0200] information/cli: Loading configuration file(s).
[2022-06-20 14:38:46 +0200] information/ConfigItem: Committing config item(s).
[2022-06-20 14:38:46 +0200] information/ApiListener: My API identity: satellite
[1]    74144 segmentation fault  prefix/sbin/icinga2 daemon -C

Performance

Check this out:

[2022-06-20 12:43:44 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 4, rate: 3.13333/s (188/min 324/5min 324/15min); empty in infinite time, your task handler isn't able to keep up
[2022-06-20 10:58:44 +0200] information/cli: Icinga application loader (version: v2.13.0-312-g452252244; debug)
[2022-06-20 10:58:44 +0200] information/cli: Loading configuration file(s).
[2022-06-20 10:58:50 +0200] information/ConfigItem: Committing config item(s).
[2022-06-20 10:58:50 +0200] information/ApiListener: My API identity: satellite
[2022-06-20 10:59:00 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.2/s (132/min 132/5min 132/15min);
[2022-06-20 10:59:00 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 10:59:00 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 11:00:10 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 4, rate: 3.2/s (192/min 324/5min 324/15min); empty in 10 seconds
[2022-06-20 11:04:00 +0200] information/WorkQueue: #3 (DaemonUtility::LoadConfigFiles) items: 0, rate:  0/s (0/min 1576/5min 1708/15min);
[2022-06-20 11:04:00 +0200] information/WorkQueue: #5 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 11:04:00 +0200] information/WorkQueue: #4 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 50012 Notifications.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 100011 Services.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 User.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 50001 Hosts.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 4 Zones.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 3 TimePeriods.
[2022-06-20 11:04:25 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2022-06-20 11:04:26 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/CLionProjects/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-06-20 11:04:26 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  504.89s user 18.22s system 150% cpu 5:48.26 total

After

[2022-06-24 10:21:57 +0200] notice/ConfigObject: Ignoring config object 'dummy-check' of type 'Host' due to errors: Error: Validation failed for object 'dummy-check' of type 'Host'; Attribute 'check_command': Attribute must not be empty.
Location: in /Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf: 27:1-27:41
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(25): }
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(26): 
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(27): object Host "dummy-check" ignore_on_error {
                                                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(28):   address = "localhost"
/Users/yhabteab/Workspace/icinga2/prefix/etc/icinga2/zones.d/master/test-config.conf(29): }
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 11 Notifications.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 10 Services.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 2 ApiUsers.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 Host.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 User.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2022-06-24 10:21:57 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-06-24 10:21:57 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-06-24 10:21:57 +0200] notice/WorkQueue: Stopped WorkQueue threads for 'DaemonUtility::LoadConfigFiles'
[2022-06-24 10:21:57 +0200] information/cli: Finished validating the configuration file(s)

Performance

 time prefix/sbin/icinga2 daemon -C    
[2022-06-20 10:55:23 +0200] information/cli: Icinga application loader (version: v2.13.0-313-ge8cd42d12; debug)
[2022-06-20 10:55:23 +0200] information/cli: Loading configuration file(s).
[2022-06-20 10:55:29 +0200] information/ConfigItem: Committing config item(s).
[2022-06-20 10:55:29 +0200] information/ApiListener: My API identity: yonass-mbp.fritz.box
[2022-06-20 10:55:39 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.33333/s (140/min 140/5min 140/15min);
[2022-06-20 10:55:39 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 10:55:39 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 244 CheckCommands.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 2 ApiUsers.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 50001 Hosts.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 2 HostGroups.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 User.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 2 NotificationCommands.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 4 TimePeriods.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 Downtime.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 50012 Notifications.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2022-06-20 10:58:03 +0200] information/ConfigItem: Instantiated 100011 Services.
[2022-06-20 10:58:03 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-06-20 10:58:03 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  444.81s user 9.16s system 271% cpu 2:47.24 total

fixes #8824

@yhabteab yhabteab requested a review from julianbrost June 20, 2022 10:25
@cla-bot cla-bot bot added the cla/signed label Jun 20, 2022
@icinga-probot icinga-probot bot added area/configuration DSL, parser, compiler, error handling bug Something isn't working labels Jun 20, 2022
@julianbrost
Copy link
Contributor

Please update the PR description to specify the config that was used in each case and clearly state when a process did not terminate when it should or when it crashed.

Also you have a typo in the PR title and commit message, you might want to fix that as well.

@yhabteab yhabteab changed the title ConifigItem: Fix infinite recursion caused by ignore_on_error when … ConfigItem: Fix infinite recursion caused by ignore_on_error when … Jun 20, 2022
@yhabteab yhabteab force-pushed the bugfix/handle-ignore-on-error-uses-cases-properly branch from e8cd42d to 4f7edd6 Compare June 20, 2022 12:43
@yhabteab
Copy link
Member Author

Please update the PR description to specify the config that was used in each case and clearly state when a process did not terminate when it should or when it crashed.

Updated! However, right now it just crashes, though you can check the referenced issue comment to see this repetition over and over again. You can also test the config snippet in notice mode (icinga2 daemon -C -x notice) and you will see the validation error for a few seconds before it crashes.

@yhabteab yhabteab force-pushed the bugfix/handle-ignore-on-error-uses-cases-properly branch 2 times, most recently from a678466 to 0249b36 Compare June 20, 2022 12:59
@julianbrost
Copy link
Contributor

One thing I don't yet understand: Minimal example to reproduce this seems to be something like object Host "x" ignore_on_error {}, however when doing object Service "x" ignore_on_error {}, things work perfectly fine here. What's the difference between Host and Service here? Both fail the validation due to missing check_command.

@julianbrost
Copy link
Contributor

The difference between Host and Service objects exists because services are considered "unnamed items" (not because they don't have a name, but at that stage of config loading, the name is not yet known because it depends on the host name which only is known after evaluating the service body). And items are handled differently based on whether they are considered named. If they aren't, committing is tried only once (due to the swap at the end):

ItemList newUnnamedItems;
for (const ConfigItem::Ptr& item : m_UnnamedItems) {
if (item->m_ActivationContext != context) {
newUnnamedItems.push_back(item);
continue;
}
if (item->m_Abstract || item->m_Object)
continue;
items.emplace_back(item, true);
}
m_UnnamedItems.swap(newUnnamedItems);

So unregistering named items seems to indeed be the correct thing to do here.

@julianbrost
Copy link
Contributor

julianbrost commented Jul 20, 2022

Please also share the config you used for your performance test.

prefix/sbin/icinga2 daemon -C 504.89s user 18.22s system 150% cpu 5:48.26 total

prefix/sbin/icinga2 daemon -C 444.81s user 9.16s system 271% cpu 2:47.24 total

I don't think the difference in these numbers is plausible for your change on the newItems vector. After all, the the number of append operations should stay identical (unless you're discarding lots of objects with ignore_on_error) and also unlikely to explain 60s in CPU time difference, so I rather expect there was some strong background noise while you tested this.

The change to newItems also makes a slight difference in behavior: if something is skipped, it is now no longer added there (which sounds desirable).

@yhabteab yhabteab force-pushed the bugfix/handle-ignore-on-error-uses-cases-properly branch from 0249b36 to 2282553 Compare September 7, 2022 09:24
@yhabteab
Copy link
Member Author

yhabteab commented Sep 7, 2022

Please also share the config you used for your performance test.

Config
object CheckCommand "dummy" {
  import "dummy-check-command"

  vars.dummy_state = 0
  vars.dummy_text = "Check was successful."
}

template Host "generic-host" {
  max_check_attempts = 3
  check_interval = 1m
  retry_interval = 30s

  check_command = "dummy"
}

template Service "generic-service" {
  max_check_attempts = 5
  check_interval = 1m
  retry_interval = 30s
}

object Host NodeName {
  import "generic-host"

  address = "127.0.0.1"
  address6 = "::1"

  vars.os = "Linux"
}

for (idx in range(20000)) {
  if ((idx % 2) == 0) {
    var hostname = "icinga-" + string(idx)
  } else {
    var hostname = "dummy-" + string(idx)
  }

  object Host hostname {
    import "generic-host"

    address = "127.0.0.1"

    if (match("*icinga*", this.name)) {
      address = "10.211.55.31"
    }

    vars.is_dummy = true
  }
}

for (i in range(10)) {
  apply Service "random" + i {
    import "generic-service"
    check_command = "dummy"

    assign where true
  }
}

Tests:

Before
~/Workspace/icinga2 (bugfix/handle-ignore-on-error-uses-cases-properly ✗) time prefix/sbin/icinga2 daemon -C
[2022-09-07 11:07:43 +0200] information/cli: Icinga application loader (version: v2.13.0-429-g86b63a57a; debug)
[2022-09-07 11:07:43 +0200] information/cli: Loading configuration file(s).
[2022-09-07 11:07:44 +0200] information/ConfigItem: Committing config item(s).
[2022-09-07 11:07:44 +0200] information/ApiListener: My API identity: yonass-mbp.int.netways.de
[2022-09-07 11:07:54 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.2/s (132/min 132/5min 132/15min);
[2022-09-07 11:07:54 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:07:54 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 CheckCommand.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 20001 Hosts.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 1 IcingaDB.
[2022-09-07 11:10:52 +0200] information/ConfigItem: Instantiated 200010 Services.
[2022-09-07 11:10:52 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-09-07 11:10:52 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  494,60s user 9,12s system 261% cpu 3:12,90 total
After
~/Workspace/icinga2 (bugfix/handle-ignore-on-error-uses-cases-properly ✗) time prefix/sbin/icinga2 daemon -C
[2022-09-07 11:20:49 +0200] information/cli: Icinga application loader (version: v2.13.0-431-g22825535d; debug)
[2022-09-07 11:20:49 +0200] information/cli: Loading configuration file(s).
[2022-09-07 11:20:51 +0200] information/ConfigItem: Committing config item(s).
[2022-09-07 11:20:51 +0200] information/ApiListener: My API identity: yonass-mbp.int.netways.de
[2022-09-07 11:21:01 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 2.2/s (132/min 132/5min 132/15min);
[2022-09-07 11:21:01 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:21:01 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 IcingaDB.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 CheckCommand.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 ApiUser.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 Endpoint.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 20001 Hosts.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 3 Zones.
[2022-09-07 11:23:39 +0200] information/ConfigItem: Instantiated 200010 Services.
[2022-09-07 11:23:39 +0200] information/ScriptGlobal: Dumping variables to file '/Users/yhabteab/Workspace/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2022-09-07 11:23:39 +0200] information/cli: Finished validating the configuration file(s).
prefix/sbin/icinga2 daemon -C  482,10s user 7,73s system 282% cpu 2:53,49 total

lib/config/configitem.cpp Outdated Show resolved Hide resolved
lib/config/configitem.cpp Outdated Show resolved Hide resolved
…ommitting an item

When committing an item with `ignore_on_error` flag set fails, the `Commit()` method only returns `nullptr`
and the current item is not being dropped from `m_Items`. `CommittNewItems()` also doesn't check the return
value of `Commit()` but just continues and tries to commit all items from `m_Items` in recursive call. Since
this corrupt item is never removed from `m_Items`, it ends up in an endless recursion till it finally crashes.
This also improves the performance a bit, as we longer have to iterate over the items
and copy them into the new items vector.
@yhabteab yhabteab force-pushed the bugfix/handle-ignore-on-error-uses-cases-properly branch from 2282553 to 400117e Compare October 12, 2022 11:27
@yhabteab yhabteab requested a review from julianbrost October 12, 2022 11:28
@julianbrost
Copy link
Contributor

By the way, I didn't really consider the claimed performance improvement in the review as this PR isn't about performance and it doesn't look like it makes things slower. But I invite you to try if you can actually reproduce your numbers, i.e. run your test a few more times and see if one is indeed consistently faster.

@julianbrost julianbrost merged commit dd7009d into master Oct 13, 2022
@icinga-probot icinga-probot bot deleted the bugfix/handle-ignore-on-error-uses-cases-properly branch October 13, 2022 07:57
@yhabteab yhabteab added the backported Fix was included in a bugfix release label Oct 31, 2022
@julianbrost julianbrost added this to the 2.14.0 milestone Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/configuration DSL, parser, compiler, error handling backported Fix was included in a bugfix release bug Something isn't working cla/signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Creation an object with ignore_on_error flag set causes an infinite recursion
2 participants