node_exporter is causing high load average when remote shares fail #1259

Closed
kworr opened this issue Feb 11, 2019 · 3 comments

kworr commented Feb 11, 2019

Hello.

Recently we experienced a local network glitch during which some remote NFS shares became unavailable. When that happened, node_exporter started creating extra threads and the host load rose to ~1000. The host was still reachable and no huge latency was observed, though the system logs claimed node_exporter had run out of file sockets (the limit defaults to 1000). Our OS is RHEL7.

During the event, `ls /failed/mount` or `df -h /failed/mount` just hung indefinitely, returning nothing.

It looks like each scrape started a new batch of threads, which queried the filesystems again and got stuck again. Could we have a per-mount mutex that prevents starting another check for that filesystem and just reports empty data instead (see the sketch after the list below)? This would help avoid nasty situations like:

  1. An indefinite wait on NFS mounts when the remote host is not available.
  2. A stuck syscall on fusefs mounts (like SSHFS) when the userland process has already died but the in-kernel part of the mount is stuck and cannot be used until it is cleaned up.
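Roughly something like this (a rough Go sketch of the idea only, not node_exporter's actual code; the `usage` helper, the in-flight map and the one-second timeout are all made up for illustration):

```go
// Sketch: run each statfs in its own goroutine with a deadline, and remember
// which mounts are still "in flight" so later scrapes skip them instead of
// piling up more stuck threads.
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sys/unix"
)

var (
	mu       sync.Mutex
	inFlight = map[string]bool{} // mounts whose statfs call has not returned yet
)

// usage returns (used, total, ok). ok is false when the mount is stuck or
// unreadable and the caller should export no data for it.
func usage(mountpoint string, timeout time.Duration) (used, total uint64, ok bool) {
	mu.Lock()
	if inFlight[mountpoint] {
		mu.Unlock()
		return 0, 0, false // previous call still hanging: don't stack another one
	}
	inFlight[mountpoint] = true
	mu.Unlock()

	type result struct {
		st  unix.Statfs_t
		err error
	}
	ch := make(chan result, 1)
	go func() {
		var r result
		r.err = unix.Statfs(mountpoint, &r.st)
		ch <- r
		mu.Lock()
		inFlight[mountpoint] = false // only clear once the syscall actually returns
		mu.Unlock()
	}()

	select {
	case r := <-ch:
		if r.err != nil {
			return 0, 0, false
		}
		bs := uint64(r.st.Bsize)
		return (r.st.Blocks - r.st.Bfree) * bs, r.st.Blocks * bs, true
	case <-time.After(timeout):
		return 0, 0, false // leave the goroutine parked; inFlight blocks retries
	}
}

func main() {
	if used, total, ok := usage("/", time.Second); ok {
		fmt.Printf("/: %d of %d bytes used\n", used, total)
	} else {
		fmt.Println("/: mount is stuck or unreadable, skipping")
	}
}
```

The point is just that a scrape never blocks on a mount whose previous statfs call has not returned yet; it reports nothing for that mount and moves on.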

Thanks in advance.


SuperQ commented Feb 12, 2019

This is why load average is a bad metric. 😄

We fixed this in 0.17.0. There is now a mutex that watches for stuck mounts.

EDIT: I also recommend filtering out NFS and other non-local filesystems in your configuration. They are better monitored from the server side than through the client's view of the filesystem.
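For example, extending the filesystem collector's ignore pattern with `--collector.filesystem.ignored-fs-types=` set to a regexp that also matches `nfs`, `nfs4` and `fuse.sshfs` (flag name from the 0.17.x era; later releases renamed these flags, so check `node_exporter --help` for your version).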

discordianfish commented

@SuperQ Should this have been fixed by #1166? That should effectively mitigate this.


SuperQ commented Feb 26, 2019

Yes, that should also help.

SuperQ closed this as completed Feb 26, 2019