This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

CPU usage in 192.168.131.186:9100 is above 98% #5667

Open
guningbo opened this issue Nov 29, 2021 · 3 comments

Comments

@guningbo
Contributor

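Over the past two days I have received email alerts saying that the CPU on one node in the cluster is fully used, along with a k8sDockerDaemonNotOk alert for the same node.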
image
image

After logging in to the node, the top command shows many lsof processes consuming CPU.
image

Running ps -axjf suggests that containerd-shim keeps spawning lsof processes.

image
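The observation above (many lsof processes, apparently under one containerd-shim) can be checked with a small helper that counts lsof processes per parent PID. This is only a sketch: the function name count_lsof_by_parent is made up for illustration, and it assumes a Linux procps ps that accepts -eo pid=,ppid=,comm=.

```shell
# Hypothetical helper: reads "PID PPID COMMAND" lines on stdin and prints
# "<parent PID> <number of lsof children>", highest count first.
count_lsof_by_parent() {
  awk '$3 == "lsof" { n[$2]++ } END { for (p in n) print p, n[p] }' \
    | sort -k2,2nr
}

# Usage on the affected node (assumes procps ps):
#   ps -eo pid=,ppid=,comm= | count_lsof_by_parent
```

If a single parent PID dominates the output, ps -o pid,comm -p on that PID shows which process it is.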

How can this problem be solved?

@guningbo
Contributor Author

The job-exporter Docker logs show lsof timeout warnings being reported continuously.
image
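To get a rough sense of how often these warnings occur, the log lines can be filtered and counted. This is a sketch under assumptions: the exact wording of the warning is not shown in this thread, so the pattern below simply matches lines that mention lsof together with "timeout", "time out", or "timed out", and the function name and container placeholder are hypothetical.

```shell
# Hypothetical helper: reads log lines on stdin and prints how many of them
# mention lsof together with a timeout (message wording is an assumption).
count_lsof_warnings() {
  grep -i 'lsof' | grep -Eic 'time(d)? ?out'
}

# Usage (replace the placeholder with the actual job-exporter container ID):
#   docker logs <job-exporter-container> 2>&1 | count_lsof_warnings
```

Running it a few minutes apart shows whether the count is still growing.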

@hzy46
Contributor

hzy46 commented Dec 1, 2021

This seems related to the lsof calls in job-exporter. @Binyang2014 Do you have any ideas?

@Binyang2014
Contributor

Does restarting the node solve this problem?
