Node exporter has high memory usage on some nodes #3080

Open
guymeron opened this issue Jul 30, 2024 · 4 comments

Comments

@guymeron

Host operating system: output of uname -a

Linux ip-XX-XX-XX-XX.ap-northeast-1.compute.internal 6.6.35-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.35-0gardenlinux1~bp1443 (2024- x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.1 (branch: HEAD, revision: 400c3979931613db930ea035f39ce7b377cdbb5b)
  build user:       root@7afbff271a3f
  build date:       20240521-18:36:22
  go version:       go1.22.3
  platform:         linux/amd64
  tags:                unknown

node_exporter command line flags

Args:
  --path.procfs=/host/proc
  --path.sysfs=/host/sys
  --path.rootfs=/host/root
  --path.udev.data=/host/root/run/udev/data
  --web.listen-address=[$(HOST_IP)]:9100
  --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
  --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$

node_exporter log output

ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=netclass
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=thermal_zone
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=timex
ts=2024-07-25T02:15:04.662Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.455484629 err="could not get net class info: failed to read file \"/host/sys/class/net/califf257bc2536/ifalias\": open /host/sys/class/net/califf257bc2536/ifalias: no such device"
ts=2024-07-25T10:15:04.563Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.353718454 err="could not get net class info: failed to read file \"/host/sys/class/net/calie3498b6b174/carrier_changes\": no such device"
ts=2024-07-25T00:30:04.465Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.258062288 err="could not get net class info: failed to read file \"/host/sys/class/net/cali7b9ce739791/threaded\": no such device"
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=bonding
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=conntrack
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=edac
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=nfsd
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=softnet
ts=2024-07-27T14:59:04.366Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.143897883 err="could not get net class info: failed to read file \"/host/sys/class/net/cali25d3d338fcb/dev_port\": open /host/sys/class/net/cali25d3d338fcb/dev_port: no such device"
ts=2024-07-24T10:09:35.767Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.1, branch=HEAD, revision=400c3979931613db930ea035f39ce7b377cdbb5b)"
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=cpu
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=fibrechannel
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=loadavg
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=udp_queues
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=watchdog
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=hwmon
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=nfs
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=rapl
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=uname
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=arp
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=btrfs
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=cpufreq
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=vmstat
ts=2024-07-24T10:09:35.864Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
ts=2024-07-24T10:09:35.864Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100
ts=2024-07-27T10:30:04.562Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.296004707 err="could not get net class info: failed to read file \"/host/sys/class/net/calic75c59c4569/carrier_up_count\": no such device"
ts=2024-07-28T22:12:04.379Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.113815084 err="could not get net class info: failed to read file \"/host/sys/class/net/cali54e5a64fb2f/testing\": open /host/sys/class/net/cali54e5a64fb2f/testing: no such device"
ts=2024-07-28T23:51:04.562Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.340009443 err="could not get net class info: failed to read file \"/host/sys/class/net/cali9bb535b8912/ifindex\": no such device"
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=diskstats
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=infiniband
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=ipvs
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=nvme
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=schedstat
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=selinux
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=zfs
ts=2024-07-24T10:09:35.863Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=netstat
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=os
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=sockstat
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=stat
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=textfile
ts=2024-07-24T10:09:35.767Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.3, platform=linux/amd64, user=root@7afbff271a3f, date=20240521-18:36:22, tags=unknown)"
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=bcache
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=dmi
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=filefd
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=mdadm
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=pressure
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=time
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=xfs
ts=2024-07-24T10:09:35.767Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=entropy
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=filesystem
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=meminfo
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=powersupplyclass
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=tapestats
ts=2024-07-24T10:09:35.863Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=netdev
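
The log lines above are not in chronological order, and the only error entries come from the netclass collector. A minimal sketch, in Python, of how one might sort such logfmt lines by `ts` and tally the "collector failed" entries per collector. The helper names (`parse_line`, `failed_collectors`) and the regular expression are illustrative assumptions, not part of node_exporter:

```python
import re
from collections import Counter

# Matches logfmt pairs such as key=value or key="quoted \"value\"".
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return a dict of the key/value pairs found in one logfmt line."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Strip the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def failed_collectors(lines):
    """Return (count per collector name, error records sorted oldest first)."""
    records = [parse_line(line) for line in lines if line.strip()]
    # RFC 3339 timestamps in the same format and zone sort correctly as strings.
    records.sort(key=lambda r: r.get("ts", ""))
    errors = [
        r for r in records
        if r.get("level") == "error" and r.get("msg") == "collector failed"
    ]
    counts = Counter(r.get("name", "unknown") for r in errors)
    return counts, errors
```

Feeding the pasted log to `failed_collectors` gives the number of failures per collector and the failing sysfs path in each record's `err` field, which makes it easier to see whether the failures line up with the memory growth.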

Are you running node_exporter in Docker?

k8s image

What did you do that produced an error?

Nothing. The problem always seems to appear on one of the k8s nodes.

What did you expect to see?

standard memory usage

What did you see instead?

The memory increases over time

Memory usage graph:

[image: memory usage graph]

pprof file

node_exporter

@SuperQ
Member

SuperQ commented Jul 30, 2024

Please provide a heap profile, not a CPU profile. Also, when including metrics, state which metric name is being used.

You can post pprof data to https://pprof.me.
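
For reference, a rough sketch of fetching a heap profile rather than a CPU profile. It assumes the exporter serves the standard Go `net/http/pprof` endpoints under `/debug/pprof/` and listens on `http://localhost:9100`; the helper names `pprof_url` and `save_profile` and the output file name are made up for illustration:

```python
import urllib.request
from urllib.parse import urlencode


def pprof_url(base, kind="heap", seconds=None):
    """Build a net/http/pprof URL.

    kind="heap" is the heap profile; kind="profile" is the CPU profile,
    which accepts an optional sampling duration in seconds.
    """
    if kind not in ("heap", "profile"):
        raise ValueError(f"unsupported profile kind: {kind!r}")
    url = f"{base.rstrip('/')}/debug/pprof/{kind}"
    if seconds is not None:
        url += "?" + urlencode({"seconds": seconds})
    return url


def save_profile(url, path, timeout=60):
    """Download a profile and write the raw bytes to path."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return path


if __name__ == "__main__":
    # Assumed address; adjust to wherever the exporter is reachable.
    print(save_profile(pprof_url("http://localhost:9100"), "heap.pb.gz"))
```

The saved file can then be opened with `go tool pprof -sample_index=inuse_space heap.pb.gz` to look at live memory rather than cumulative allocations.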

@guymeron
Author

Thanks @SuperQ,

I used the container_memory_working_set_bytes metric for the memory usage graph.

This is the output of go tool pprof http://localhost:9100/debug/pprof/heap
https://pprof.me/fa76c1bf5081901359f550387487e53b/?profileType=profile%3Aalloc_objects%3Acount%3Aspace%3Abytes

@SuperQ
Member

SuperQ commented Jul 30, 2024

container_memory_working_set_bytes is a misleading metric because it includes cache memory that is not part of the process. You want container_memory_rss, which will show you something closer to the real usage.

The pprof provided shows less than 1 MiB of memory use.
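
To illustrate the difference between the two metrics, here is a small sketch. It assumes cAdvisor's usual derivation on cgroup v2, where the working set is total usage minus `inactive_file` and RSS corresponds to the `anon` field of `memory.stat`. The function names and all numbers are hypothetical:

```python
MIB = 1024 * 1024


def parse_memory_stat(text):
    """Parse cgroup memory.stat content ("key value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage, stats):
    """Usage minus inactive file cache, floored at zero (assumed cAdvisor rule)."""
    return max(usage - stats.get("inactive_file", 0), 0)


def rss_bytes(stats):
    """Anonymous memory only; page cache is excluded entirely."""
    return stats.get("anon", 0)


# Hypothetical container: small heap, lots of recently touched page cache.
sample_stat = f"""anon {20 * MIB}
active_file {150 * MIB}
inactive_file {30 * MIB}
"""
stats = parse_memory_stat(sample_stat)
usage = 200 * MIB  # hypothetical memory.current

print(working_set_bytes(usage, stats) // MIB, "MiB working set")
print(rss_bytes(stats) // MIB, "MiB rss")
```

With these made-up values the working set still counts the active file cache, so it can keep growing while anonymous memory stays flat.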

@SuperQ
Member

SuperQ commented Jul 30, 2024

Duplicate of #2726

SuperQ marked this as a duplicate of #2726 on Jul 30, 2024