What did you do that produced an error?
Simply ran node_exporter for an extended period of time
What did you expect to see?
No error logs
What did you see instead?
node_exporter[15746]: level=error ts=2020-11-25T13:12:02.291Z caller=collector.go:161 msg="collector failed" name=processes duration_seconds=0.027611184 err="unable to retrieve number of allocated threads: "read /proc/2054/stat: no such process""
Analysis
This is very closely related to #1043: that change fixed the case of processes disappearing between listing the /proc directory and reading the actual process stats. But a second race is possible: a process can exit between opening the /proc/<process id>/stat file and actually reading it, and in that case the error code returned is different (ESRCH, "no such process"). Below is a small code snippet to reproduce that race condition.
The recommended fix is to modify getAllocatedThreads() in collector/processes_linux.go so that, after stat, err := pid.Stat(), the loop continues to the next process if the error meets this condition: strings.Contains(err.Error(), syscall.ESRCH.Error()).
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

func main() {
	const maxBufferSize = 1024 * 512

	// Start a short-lived child process whose stat file we will race against.
	fmt.Println("Starting process sleep")
	cmd := exec.Command("sleep", "1")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}

	// Open the stat file while the process is still alive.
	procPath := "/proc/" + strconv.Itoa(cmd.Process.Pid) + "/stat"
	fmt.Printf("Read stat for %s\n", procPath)
	f, err := os.Open(procPath)
	if err != nil {
		log.Fatal(err)
	}
	// Defer the close only once we know the open succeeded.
	defer f.Close()

	// Wait for the child to exit and be reaped before reading.
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Sleep process exited, reading opened stat file")

	// Reading the already-open file now fails because the process is gone.
	reader := io.LimitReader(f, maxBufferSize)
	_, err = ioutil.ReadAll(reader)
	if err != nil {
		if strings.Contains(err.Error(), syscall.ESRCH.Error()) {
			fmt.Println("Got error no such process:", err)
		} else {
			fmt.Println("Read stat failed:", err)
		}
	} else {
		fmt.Println("No error reading stat")
	}
}
acastong changed the title from "Processes exporter logs no such process errors" to "Processes exporter logs error message: no such process" on Nov 26, 2020
Host operating system: output of
uname -a
Linux 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
CentOS 7
node_exporter version: output of
node_exporter --version
node_exporter --version
node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6)
build user: root@1f76dbbcfa55
build date: 20200616-12:44:12
go version: go1.14.4
node_exporter command line flags
node_exporter --collector.processes --collector.qdisc --collector.systemd
Are you running node_exporter in Docker?
No