systemd: Failed to start Alertmanager (when Prometheus is not running yet) #74
Sorry for my lack of reply. We try not to have any dependencies between roles, and we cannot ensure there is one. As for the second part:
Fair point. We're actually already using drop-ins for that purpose:

```
# systemctl status alertmanager.service
● alertmanager.service - Prometheus Alertmanager
   Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/alertmanager.service.d
           └─after-prometheus.conf
           └─slowdown-restarts.conf
[...]
```

with
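For context, the `after-prometheus.conf` drop-in referenced in the status output above could look like the following sketch (the filename comes from the output; the exact contents are an assumption on our part):

```ini
# /etc/systemd/system/alertmanager.service.d/after-prometheus.conf
[Unit]
# Order Alertmanager after Prometheus. After= alone is only ordering;
# Wants= additionally pulls prometheus.service in as a soft dependency,
# so Alertmanager still starts even if Prometheus fails.
After=prometheus.service
Wants=prometheus.service
```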
Maybe this could be added to the
Great!
I also looked at your error messages, and it seems quite strange that your Alertmanager requires the Prometheus server in order to start. This is not the usual case for Alertmanager, as it should be able to operate without any Prometheus server (communication is unidirectional and Alertmanager is on the receiving end). From what I can see, you are having problems because of a networking issue, as can be seen in the logs:
Are you using Alertmanager in HA mode with a gossip network? Because it looks like Alertmanager cannot start because the gossip network address specified with
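As a quick way to check whether a given listen address is actually bindable on a host at boot time (a hypothetical diagnostic of our own, not part of the role or of Alertmanager itself), one can simply try to bind a TCP socket to it:

```python
import socket

def is_bindable(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to host:port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((host, port))
            return True
    except OSError:
        # Covers EADDRNOTAVAIL (address not on this host), EACCES, etc.
        return False

# Loopback should always be bindable; port 0 lets the OS pick a free port.
print(is_bindable("127.0.0.1", 0))  # → True
```

If this returns `False` for the configured gossip address early in boot, the network interface simply isn't up yet when Alertmanager starts.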
No, there is no requirement to have Prometheus running in order to use Alertmanager. Of course it doesn't make much sense, but it is not a hard requirement.
This error:
is something you get when one of Alertmanager's requirements is not met. As the Alertmanager docs say:
Source: https://github.com/prometheus/alertmanager#high-availability
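For reference, Alertmanager's HA mode is driven by its clustering flags; a minimal two-node setup along the lines of the linked docs looks roughly like this (hostnames here are placeholders, not from this thread):

```shell
# Node 1: listen for gossip traffic and peer with node 2
alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=am-node2.example.com:9094

# Node 2: mirror configuration, peering back to node 1
alertmanager --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=am-node1.example.com:9094
```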
I'm not aware we're using Alertmanager in HA mode. From what I can gather from our GitLab instance (I don't have the repo checked out right now), these are the only Alertmanager variables we set:

```yaml
alertmanager_version: 0.18.0
alertmanager_receivers: []      # multiple configured
alertmanager_route: {}          # configured
alertmanager_child_routes: []   # routes omitted
alertmanager_listen_address: "127.0.0.1:9093"
alertmanager_external_url: "https://alertmanager.example.com"
```

Maybe of note: we've submoduled 50d90b5. I'll have a more detailed look tomorrow when I'm back in the office.
I just saw that we have a backwards-compatibility layer in place, so those variables are translated to the newer versions. Nevertheless, you should consider updating them.
I just discovered we didn't update that part of our docs for over a year 🤦♂️
I've checked and we don't run Alertmanager in HA mode. However, the Alertmanager docs state (emphasis mine):
combined with
Note the absence of a
This is just a bad default on Alertmanager's part. I'll check whether

```yaml
alertmanager_cluster:
  listen-address: ""
```

shuts that off.
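For what it's worth, that role variable presumably maps onto the underlying Alertmanager flag, where an empty value disables clustering entirely per the upstream docs; the resulting invocation would look something like this sketch (paths are illustrative):

```shell
# An empty --cluster.listen-address tells Alertmanager
# not to listen for or join any gossip cluster.
alertmanager --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.listen-address=
```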
😀
Yeah,
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The Alertmanager service fails to start when Prometheus has not started yet. We observe this mainly after a machine reboot:

```
# journalctl -u alertmanager.service --boot
```
We're running Prometheus and Alertmanager on the same host (deployed using your Ansible roles 👍), so waiting for Prometheus seems a good measure:
I realise this might need a new variable and a conditional for general usage (i.e. when both services run on different hosts). Alternatively (or additionally), it might also be useful to add a delay between the retries to give Prometheus a fair chance to start (as you can see in the log above, the restart attempts all happened within 2s):
In `templates/alertmanager.service.j2`:

```diff
 Restart=always
+RestartSec=5s
```
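The same effect can also be achieved without touching the role's template, via a local drop-in (a sketch only; the filename is our choice, and the `StartLimit*` values are illustrative examples, not recommendations):

```ini
# /etc/systemd/system/alertmanager.service.d/slowdown-restarts.conf
[Service]
# Wait 5s between restart attempts instead of restarting immediately.
RestartSec=5s

[Unit]
# Allow up to 10 start attempts within 5 minutes before giving up.
StartLimitIntervalSec=300
StartLimitBurst=10
```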