Satellite reload fails with Systemd on config sync (git master only) #6127

Closed
dnsmichi opened this issue Feb 27, 2018 · 8 comments · Fixed by #6163

@dnsmichi
Contributor

dnsmichi commented Feb 27, 2018

Expected Behavior

Automated config reload takes place.

Current Behavior

The config reload takes place, but the child process cannot take over because the socket is not entirely closed.

[root@icinga2-satellite1 ~]# ps aux | grep icinga2
root     12267  0.0  0.0 112660   976 pts/0    S+   16:46   0:00 grep --color=auto icinga2
[root@icinga2-satellite1 ~]# systemctl status icinga2
● icinga2.service - Icinga host/service/network monitoring system
   Loaded: loaded (/usr/lib/systemd/system/icinga2.service; enabled; vendor preset: disabled)
   Active: reloading (reload) (Result: exit-code) since Tue 2018-02-27 16:26:00 CET; 26min ago
  Process: 11248 ExecStart=/usr/sbin/icinga2 daemon -e ${ICINGA2_ERROR_LOG} (code=exited, status=1/FAILURE)
  Process: 10019 ExecStartPre=/usr/lib/icinga2/prepare-dirs /etc/sysconfig/icinga2 (code=exited, status=0/SUCCESS)
 Main PID: 11248 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/icinga2.service

Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/ApiListener: Finished syncing endpoint 'icing...aster'.
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/ApiListener: Applying config update from endp...aster'.
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/ApiListener: Updating configuration file: /va...mestamp
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com systemd[1]: icinga2.service: Supervising process 11248 which is not our child. We'll most likely not ... exits.
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/ApiListener: Updating configuration file: /va...ts.conf
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/ApiListener: Applying configuration file upda...02212).
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/ApiListener: Restarting after configuration change.
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/Application: Got reload command: Starting new instance.
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com icinga2[10025]: [2018-02-27 16:28:15 +0100] information/Application: Reload requested, letting new pr...e over.
Feb 27 16:28:15 icinga2-satellite1.vagrant.demo.icinga.com systemd[1]: icinga2.service: main process exited, code=exited, status=1/FAILURE
Hint: Some lines were ellipsized, use -l to show in full.
[2018-02-27 16:28:15 +0100] warning/TlsStream: TLS stream was disconnected.
[2018-02-27 16:28:15 +0100] warning/JsonRpcConnection: API client disconnected for identity 'icinga2-master1.vagrant.demo.icinga.com'
[2018-02-27 16:28:15 +0100] warning/ApiListener: Removing API client for endpoint 'icinga2-master1.vagrant.demo.icinga.com'. 0 API clients left.
[2018-02-27 16:28:15 +0100] information/ApiListener: New client connection for identity 'icinga2-master1.vagrant.demo.icinga.com' from [192.168.33.101]:34026
[2018-02-27 16:28:15 +0100] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'icinga2-master1.vagrant.demo.icinga.com'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Sending config updates for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icinga2-master1.vagrant.demo.icinga.com'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icinga2-master1.vagrant.demo.icinga.com'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Sending replay log for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Replayed 4 messages.
[2018-02-27 16:28:15 +0100] information/ApiListener: Finished sending replay log for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Finished syncing endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Applying config update from endpoint 'icinga2-master1.vagrant.demo.icinga.com' of zone 'master'.
[2018-02-27 16:28:15 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/satellite//.timestamp
[2018-02-27 16:28:15 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/satellite//_etc/hosts.conf
[2018-02-27 16:28:15 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/satellite' (2730 Bytes). Received timestamp '2018-02-27 16:28:15 +0100' (1519745295.591368), Current timestamp '2018-02-27 16:14:02 +0100' (1519744442.402212).
[2018-02-27 16:28:15 +0100] information/ApiListener: Restarting after configuration change.
[2018-02-27 16:28:15 +0100] information/Application: Got reload command: Starting new instance.
[2018-02-27 16:28:15 +0100] information/Application: Reload requested, letting new process take over.
[2018-02-27 16:28:15 +0100] information/NotificationComponent: 'notification' started.
[2018-02-27 16:28:15 +0100] information/ApiListener: 'api' started.
[2018-02-27 16:28:15 +0100] information/ApiListener: Adding new listener on port '5665'
[2018-02-27 16:28:15 +0100] critical/TcpSocket: Invalid socket: Address already in use
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-02-27 16:28:15 +0100] critical/ApiListener: Cannot bind TCP socket for host '' on port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

[2018-02-27 16:28:15 +0100] critical/ApiListener: Cannot add listener on host '' for port '5665'.
Context:
	(0) Activating object 'api' of type 'ApiListener'

^C
[root@icinga2-satellite1 ~]# cat /var/run/icinga2/icinga2.pid
10025
[root@icinga2-satellite1 ~]# ps aux | grep icinga2
root     12267  0.0  0.0 112660   976 pts/0    S+   16:46   0:00 grep --color=auto icinga2
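
The transcript above ends with a PID file that still names 10025 while `ps` shows no icinga2 process. A minimal sketch of how such a stale PID file could be detected is given below. The helper names (`read_pid`, `pid_is_alive`, `pid_file_is_stale`) are hypothetical and not part of Icinga 2; the PID file path is passed in by the caller.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Probe a PID with signal 0, which checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pid_file_is_stale(pid_file):
    """True when the file names a PID that no longer exists."""
    pid = read_pid(pid_file)
    return pid is not None and not pid_is_alive(pid)
```

Run as root on the satellite, `pid_file_is_stale("/run/icinga2/icinga2.pid")` would be expected to return True in the state shown above, assuming PID 10025 has not been reused by another process in the meantime.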

Possible Solution

Investigate why Systemd thinks that the process failed with exit code 1. It seems that the FDs are not properly closed on reload, which prevents the child process from taking over entirely.

This seems to come from the reload change in #5996:

commit c418a9611e82dd6694f09b46a88ae478fab3c161
Author: Jean Flach <jean-marcel.flach@icinga.com>
Date:   Wed Jan 17 13:52:23 2018 +0100

    Add systemd watchdog and adjust reload behaviour
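
The "Address already in use" error in the log fits the suspicion that the old listener is still open when the new instance tries to bind port 5665. The sketch below reproduces that general effect with plain TCP sockets in one process. It is an illustration only, not Icinga 2 code; `bind_listener` and `simulate_takeover` are hypothetical names, and a kernel-assigned port stands in for 5665.

```python
import errno
import socket


def bind_listener(port):
    """Bind and listen on 127.0.0.1:port, as a stand-in for an API listener."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def simulate_takeover():
    """Return (blocked_while_old_open, bound_after_old_closed)."""
    old = bind_listener(0)  # port 0: let the kernel pick a free port
    port = old.getsockname()[1]

    # The "new instance" tries to bind while the old listener is still open.
    try:
        bind_listener(port).close()
        blocked = False
    except OSError as exc:
        blocked = exc.errno == errno.EADDRINUSE

    # Once the old listener is closed, the same bind goes through.
    old.close()
    new = bind_listener(port)
    took_over = new.getsockname()[1] == port
    new.close()
    return blocked, took_over


if __name__ == "__main__":
    print(simulate_takeover())
```

The point of the sketch is the ordering: the takeover only works if the old socket is fully closed before the new process activates its listener.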

Steps to Reproduce (for bugs)

git clone https://github.com/icinga/icinga-vagrant
cd icinga-vagrant/distributed
vagrant up

Terminal 2

vagrant ssh icinga2-satellite1
sudo -i
tail -f /var/log/icinga2/icinga2.log

systemctl status icinga2
ps aux | grep icinga2

Terminal 1

vagrant ssh icinga2-master1
sudo -i
vim /etc/icinga2/zones.d/hosts.conf

<add something in the object, save and exit>

systemctl restart icinga2

Context

Testing the latest development snapshot.

Your Environment

  • Version used (icinga2 --version):
[root@icinga2-satellite1 ~]# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: v2.8.1-488-g98bcca5)

Copyright (c) 2012-2018 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-693.17.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown
  • Operating System and version: CentOS 7.4
  • Enabled features (icinga2 feature list):
[root@icinga2-satellite1 ~]# icinga2 feature list
Disabled features: command compatlog debuglog elasticsearch gelf graphite influxdb livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql mainlog notification
  • Config validation (icinga2 daemon -C):
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
[root@icinga2-satellite1 ~]# cat /etc/icinga2/zones.conf
# This file is managed by Puppet. DO NOT EDIT.

object Endpoint "icinga2-master1.vagrant.demo.icinga.com"  {
}

object Endpoint "icinga2-satellite1.vagrant.demo.icinga.com"  {
  host = "192.168.33.102"
}

object Zone "global-templates"  {
  global = true
}

object Zone "master"  {
  endpoints = [ "icinga2-master1.vagrant.demo.icinga.com", ]
}

object Zone "satellite"  {
  endpoints = [ "icinga2-satellite1.vagrant.demo.icinga.com", ]
  parent = "master"
}
@dnsmichi dnsmichi added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) labels Feb 27, 2018
@dnsmichi dnsmichi added this to the 2.9.0 milestone Feb 27, 2018
@dnsmichi
Contributor Author

This is clearly a problem with how Systemd handles the reload. If I start icinga2 on the CLI, the reload works like a charm.

Might be related to #6082 but in this specific case, the PID file is not removed.

[2018-02-27 17:04:23 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from 'icinga2-master1.vagrant.demo.icinga.com'
[2018-02-27 17:04:23 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from 'icinga2-master1.vagrant.demo.icinga.com'
[2018-02-27 17:04:23 +0100] notice/JsonRpcConnection: Received 'event::SetNextCheck' message from 'icinga2-master1.vagrant.demo.icinga.com'
[2018-02-27 17:04:23 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-02-27 17:04:23 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-02-27 17:04:23 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-02-27 17:04:25 +0100] information/WorkQueue: #9 (JsonRpcConnection, #0) items: 0, rate: 0.0666667/s (4/min 4/5min 4/15min);
[2018-02-27 17:04:25 +0100] information/WorkQueue: #10 (JsonRpcConnection, #1) items: 0, rate: 2.26667/s (136/min 136/5min 136/15min);
[2018-02-27 17:04:26 +0100] information/Application: Got reload command: Starting new instance.
[2018-02-27 17:04:26 +0100] notice/Process: Running command '/usr/lib64/icinga2/sbin/icinga2' '--no-stack-rlimit' 'daemon' '-x' 'debug' '--reload-internal' '13353': PID 13425
[2018-02-27 17:04:26 +0100] information/Application: Reload requested, letting new process take over.

[2018-02-27 17:04:26 +0100] information/ApiListener: 'api' started.
[2018-02-27 17:04:26 +0100] information/ApiListener: Adding new listener on port '5665'
[2018-02-27 17:04:26 +0100] information/ConfigItem: Activated all objects.
[2018-02-27 17:04:36 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 3.63333/s (218/min 218/5min 218/15min);
[2018-02-27 17:04:36 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-02-27 17:05:23 +0100] information/ApiListener: New client connection for identity 'icinga2-master1.vagrant.demo.icinga.com' from [192.168.33.101]:34106
[2018-02-27 17:05:23 +0100] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'icinga2-master1.vagrant.demo.icinga.com'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Sending config updates for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Finished sending config file updates for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Syncing runtime objects to endpoint 'icinga2-master1.vagrant.demo.icinga.com'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'icinga2-master1.vagrant.demo.icinga.com'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Sending replay log for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Applying config update from endpoint 'icinga2-master1.vagrant.demo.icinga.com' of zone 'master'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Replayed 350 messages.
[2018-02-27 17:05:23 +0100] information/ApiListener: Finished sending replay log for endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 17:05:23 +0100] information/ApiListener: Finished syncing endpoint 'icinga2-master1.vagrant.demo.icinga.com' in zone 'master'.
[2018-02-27 17:05:33 +0100] information/WorkQueue: #10 (JsonRpcConnection, #1) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-02-27 17:05:33 +0100] information/WorkQueue: #9 (JsonRpcConnection, #0) items: 0, rate: 0.0833333/s (5/min 5/5min 5/15min);

@dnsmichi
Contributor Author

Investigating with journalctl -o verbose -u icinga2: this could be a problem with a hanging parent process which cannot be killed.

It started to work after a while of testing.


    MESSAGE=[2018-02-27 16:18:04 +0100] information/Application: Reload requested, letting new process take over.
Tue 2018-02-27 16:21:29.880592 CET [s=fafabc92b70f4f779cd33a5b754d73c2;i=78f;b=e284ec56b5e84917aef8ef94fdac68b8;m=17f89704;t=566332e4f829d;x=b47e2251fade33f9]
    _UID=0
    _GID=0
    _BOOT_ID=e284ec56b5e84917aef8ef94fdac68b8
    _MACHINE_ID=fa29fc24eae8bf48993c99513dff435c
    PRIORITY=4
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    CODE_FILE=src/core/service.c
    _HOSTNAME=icinga2-satellite1.vagrant.demo.icinga.com
    UNIT=icinga2.service
    CODE_LINE=2709
    CODE_FUNCTION=service_dispatch_timer
    MESSAGE=icinga2.service stop-sigterm timed out. Killing.
    _SOURCE_REALTIME_TIMESTAMP=1519744889880592

Tue 2018-02-27 16:22:59.897074 CET [s=fafabc92b70f4f779cd33a5b754d73c2;i=790;b=e284ec56b5e84917aef8ef94fdac68b8;m=1d562215;t=5663333ad0daf;x=3d627c67f04a9fc0]
    _UID=0
    _GID=0
    _BOOT_ID=e284ec56b5e84917aef8ef94fdac68b8
    _MACHINE_ID=fa29fc24eae8bf48993c99513dff435c
    PRIORITY=4
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    CODE_FILE=src/core/service.c
    _HOSTNAME=icinga2-satellite1.vagrant.demo.icinga.com
    UNIT=icinga2.service
    CODE_FUNCTION=service_dispatch_timer
    CODE_LINE=2723
    MESSAGE=icinga2.service still around after SIGKILL. Ignoring.
    _SOURCE_REALTIME_TIMESTAMP=1519744979897074

Tue 2018-02-27 16:24:30.147378 CET [s=fafabc92b70f4f779cd33a5b754d73c2;i=791;b=e284ec56b5e84917aef8ef94fdac68b8;m=22b73e8e;t=56633390e2a27;x=9daece086e85e97c]
    _UID=0
    _GID=0
    _BOOT_ID=e284ec56b5e84917aef8ef94fdac68b8
    _MACHINE_ID=fa29fc24eae8bf48993c99513dff435c
    PRIORITY=4
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    CODE_FILE=src/core/service.c
    _HOSTNAME=icinga2-satellite1.vagrant.demo.icinga.com
    UNIT=icinga2.service
    CODE_FUNCTION=service_dispatch_timer
    CODE_LINE=2734
    MESSAGE=icinga2.service stop-final-sigterm timed out. Killing.
    _SOURCE_REALTIME_TIMESTAMP=1519745070147378

Tue 2018-02-27 16:26:00.397826 CET [s=fafabc92b70f4f779cd33a5b754d73c2;i=792;b=e284ec56b5e84917aef8ef94fdac68b8;m=28185c31;t=566333e6f47ca;x=945b4437f7d653ff]
    _UID=0
    _GID=0
    _BOOT_ID=e284ec56b5e84917aef8ef94fdac68b8
    _MACHINE_ID=fa29fc24eae8bf48993c99513dff435c
    PRIORITY=4
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    CODE_FILE=src/core/service.c
    _HOSTNAME=icinga2-satellite1.vagrant.demo.icinga.com
    UNIT=icinga2.service
    CODE_FUNCTION=service_dispatch_timer
    CODE_LINE=2744
    MESSAGE=icinga2.service still around after final SIGKILL. Entering failed mode.
    _SOURCE_REALTIME_TIMESTAMP=1519745160397826

Tue 2018-02-27 16:26:00.398392 CET [s=fafabc92b70f4f779cd33a5b754d73c2;i=793;b=e284ec56b5e84917aef8ef94fdac68b8;m=28185dfa;t=566333e6f4993;x=5e0f657b2d70e440]
    _UID=0
    _GID=0
    _BOOT_ID=e284ec56b5e84917aef8ef94fdac68b8
    _MACHINE_ID=fa29fc24eae8bf48993c99513dff435c
    PRIORITY=5
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    CODE_FILE=src/core/unit.c
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    CODE_LINE=1926
    CODE_FUNCTION=unit_notify
    _HOSTNAME=icinga2-satellite1.vagrant.demo.icinga.com
    UNIT=icinga2.service
    MESSAGE=Unit icinga2.service entered failed state.
    _SOURCE_REALTIME_TIMESTAMP=1519745160398392

Tue 2018-02-27 16:26:00.398481 CET [s=fafabc92b70f4f779cd33a5b754d73c2;i=794;b=e284ec56b5e84917aef8ef94fdac68b8;m=281861c0;t=566333e6f4d59;x=c59c00ef7b9b8c4f]
    _UID=0
    _GID=0
    _BOOT_ID=e284ec56b5e84917aef8ef94fdac68b8
    _MACHINE_ID=fa29fc24eae8bf48993c99513dff435c
    PRIORITY=4
    SYSLOG_FACILITY=3
    SYSLOG_IDENTIFIER=systemd
    _TRANSPORT=journal
    _PID=1
    _COMM=systemd
    _EXE=/usr/lib/systemd/systemd
    _CAP_EFFECTIVE=1fffffffff
    _SYSTEMD_CGROUP=/
    _CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 21
    _SELINUX_CONTEXT=system_u:system_r:init_t:s0
    CODE_FILE=src/core/service.c
    CODE_LINE=1282
    CODE_FUNCTION=service_enter_dead
    _HOSTNAME=icinga2-satellite1.vagrant.demo.icinga.com
    UNIT=icinga2.service
    MESSAGE=icinga2.service failed.
    _SOURCE_REALTIME_TIMESTAMP=1519745160398481
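
The four service_dispatch_timer entries above can be lined up by their _SOURCE_REALTIME_TIMESTAMP values to see how long Systemd waited between each escalation step. The short calculation below uses only the timestamps and messages copied from the journal output; `gaps_in_seconds` is a hypothetical helper name.

```python
# _SOURCE_REALTIME_TIMESTAMP values (microseconds since the epoch) copied
# from the four service_dispatch_timer journal entries.
EVENTS = [
    (1519744889880592, "stop-sigterm timed out. Killing."),
    (1519744979897074, "still around after SIGKILL. Ignoring."),
    (1519745070147378, "stop-final-sigterm timed out. Killing."),
    (1519745160397826, "still around after final SIGKILL. Entering failed mode."),
]


def gaps_in_seconds(events):
    """Seconds elapsed between each consecutive pair of journal entries."""
    stamps = [stamp for stamp, _ in events]
    return [(later - earlier) / 1_000_000
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap, (_, message) in zip(gaps_in_seconds(EVENTS), EVENTS[1:]):
        print(f"{gap:8.3f} s before: {message}")
```

Each gap comes out at roughly 90 seconds. That would be consistent with a 90-second stop timeout, which is Systemd's stock default, assuming the icinga2 unit does not override it. It also supports the reading that the old process did not go away after either SIGTERM or SIGKILL.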


@dnsmichi
Contributor Author

I need to investigate this further; please add your feedback.

@dnsmichi dnsmichi removed this from the 2.9.0 milestone Feb 27, 2018
@dnsmichi dnsmichi self-assigned this Feb 28, 2018
@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Feb 28, 2018
@Crunsher
Contributor

Crunsher commented Mar 1, 2018

Does this happen when running the safe-reload script?

@dnsmichi
Contributor Author

dnsmichi commented Mar 1, 2018

I believe this is what Systemd invokes ... I don't understand how the current reload was changed. AFAIK you're now using Systemd signals to tell Systemd to fire a reload, which in turn calls safe-reload, right?

@Crunsher
Contributor

Crunsher commented Mar 1, 2018

There are two changes in this:

  1. We now use sd_notify to notify Systemd when we start a reload and when we finish a reload.
  2. We now send SIGUSR2 to the old process from the new one; the old one then stops itself, rather than simply being killed.

But none of these changes affect the init scripts (except offering to use WatchdogSec=).
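
For readers unfamiliar with item 1, here is a minimal sketch of the notification protocol behind sd_notify: a service sends short state strings such as RELOADING=1 and READY=1 as datagrams to the Unix socket named in the NOTIFY_SOCKET environment variable. The Python function below is an illustrative stand-in, not the Icinga 2 implementation (which is C++), and it does not cover the SIGUSR2 handoff from item 2.

```python
import os
import socket


def sd_notify(state):
    """Send one notification string to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process was
    not started by a manager that expects notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


if __name__ == "__main__":
    sd_notify("RELOADING=1")  # a reload has started
    # ... the reload itself would happen here ...
    sd_notify("READY=1")      # the reload has finished
```

Started from a shell without NOTIFY_SOCKET set, both calls return False and do nothing, which may be relevant to the observation that a reload works when icinga2 is started on the CLI.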

@lazyfrosch
Contributor

Same issue as #6082

@dnsmichi
Contributor Author

I would need custom RPM artifacts from that branch put into 2 VMs for proper tests, that's time I cannot afford in the coming weeks. I'll test #6163 once it is merged, sorry.

@dnsmichi dnsmichi removed the needs feedback We'll only proceed once we hear from you again label May 9, 2018
@Al2Klimov Al2Klimov added this to the 2.9.0 milestone Sep 4, 2020