Skip to content

Commit

Permalink
Scrape CPU latency stats from /proc/schedstat (#1389)
Browse files Browse the repository at this point in the history
These are useful as a direct indication of CPU contention and task
scheduler latency.

Handy references:
 - https://github.com/torvalds/linux/blob/master/Documentation/scheduler/sched-stats.txt
 - https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.taskscheduler.html

procfs is updated to pull in the enabling change:
prometheus/procfs#186

Signed-off-by: Phil Frost <phil@postmates.com>
  • Loading branch information
bitglue authored and SuperQ committed Jul 10, 2019
1 parent 777b751 commit f693a71
Show file tree
Hide file tree
Showing 13 changed files with 297 additions and 5 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@

* [CHANGE] Add `--collector.netdev.device-whitelist`. #1279
* [CHANGE] Refactor mdadm collector #1403
* [FEATURE]
* [FEATURE] Add new schedstat collector #1389
* [ENHANCEMENT]
* [BUGFIX] Renamed label `state` to `name` on `node_systemd_service_restart_total`. #1393

Expand Down
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ netstat | Exposes network statistics from `/proc/net/netstat`. This is the same
nfs | Exposes NFS client statistics from `/proc/net/rpc/nfs`. This is the same information as `nfsstat -c`. | Linux
nfsd | Exposes NFS kernel server statistics from `/proc/net/rpc/nfsd`. This is the same information as `nfsstat -s`. | Linux
pressure | Exposes pressure stall statistics from `/proc/pressure/`. | Linux (kernel 4.20+ and/or [CONFIG\_PSI](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/accounting/psi.txt))
schedstat | Exposes task scheduler statistics from `/proc/schedstat`. | Linux
sockstat | Exposes various statistics from `/proc/net/sockstat`. | Linux
stat | Exposes various statistics from `/proc/stat`. This includes boot time, forks and interrupts. | Linux
textfile | Exposes statistics read from local disk. The `--collector.textfile.directory` flag must be set. | _any_
Expand Down
13 changes: 13 additions & 0 deletions collector/fixtures/e2e-64k-page-output.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2438,6 +2438,18 @@ node_qdisc_packets_total{device="wlan0",kind="fq"} 42
# TYPE node_qdisc_requeues_total counter
node_qdisc_requeues_total{device="eth0",kind="pfifo_fast"} 2
node_qdisc_requeues_total{device="wlan0",kind="fq"} 1
# HELP node_schedstat_running_seconds_total Number of seconds CPU spent running a process.
# TYPE node_schedstat_running_seconds_total counter
node_schedstat_running_seconds_total{cpu="0"} 2.045936778163039e+13
node_schedstat_running_seconds_total{cpu="1"} 1.904686152592476e+13
# HELP node_schedstat_timeslices_total Number of timeslices executed by CPU.
# TYPE node_schedstat_timeslices_total counter
node_schedstat_timeslices_total{cpu="0"} 4.767485306e+09
node_schedstat_timeslices_total{cpu="1"} 5.145567945e+09
# HELP node_schedstat_waiting_seconds_total Number of seconds spent by processing waiting for this CPU.
# TYPE node_schedstat_waiting_seconds_total counter
node_schedstat_waiting_seconds_total{cpu="0"} 3.43796328169361e+12
node_schedstat_waiting_seconds_total{cpu="1"} 3.64107263788241e+12
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
Expand Down Expand Up @@ -2472,6 +2484,7 @@ node_scrape_collector_success{collector="nfsd"} 1
node_scrape_collector_success{collector="pressure"} 1
node_scrape_collector_success{collector="processes"} 1
node_scrape_collector_success{collector="qdisc"} 1
node_scrape_collector_success{collector="schedstat"} 1
node_scrape_collector_success{collector="sockstat"} 1
node_scrape_collector_success{collector="stat"} 1
node_scrape_collector_success{collector="textfile"} 1
Expand Down
13 changes: 13 additions & 0 deletions collector/fixtures/e2e-output.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2438,6 +2438,18 @@ node_qdisc_packets_total{device="wlan0",kind="fq"} 42
# TYPE node_qdisc_requeues_total counter
node_qdisc_requeues_total{device="eth0",kind="pfifo_fast"} 2
node_qdisc_requeues_total{device="wlan0",kind="fq"} 1
# HELP node_schedstat_running_seconds_total Number of seconds CPU spent running a process.
# TYPE node_schedstat_running_seconds_total counter
node_schedstat_running_seconds_total{cpu="0"} 2.045936778163039e+13
node_schedstat_running_seconds_total{cpu="1"} 1.904686152592476e+13
# HELP node_schedstat_timeslices_total Number of timeslices executed by CPU.
# TYPE node_schedstat_timeslices_total counter
node_schedstat_timeslices_total{cpu="0"} 4.767485306e+09
node_schedstat_timeslices_total{cpu="1"} 5.145567945e+09
# HELP node_schedstat_waiting_seconds_total Number of seconds spent by processing waiting for this CPU.
# TYPE node_schedstat_waiting_seconds_total counter
node_schedstat_waiting_seconds_total{cpu="0"} 3.43796328169361e+12
node_schedstat_waiting_seconds_total{cpu="1"} 3.64107263788241e+12
# HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
# TYPE node_scrape_collector_duration_seconds gauge
# HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
Expand Down Expand Up @@ -2472,6 +2484,7 @@ node_scrape_collector_success{collector="nfsd"} 1
node_scrape_collector_success{collector="pressure"} 1
node_scrape_collector_success{collector="processes"} 1
node_scrape_collector_success{collector="qdisc"} 1
node_scrape_collector_success{collector="schedstat"} 1
node_scrape_collector_success{collector="sockstat"} 1
node_scrape_collector_success{collector="stat"} 1
node_scrape_collector_success{collector="textfile"} 1
Expand Down
6 changes: 6 additions & 0 deletions collector/fixtures/proc/schedstat
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
version 15
timestamp 15819019232
cpu0 498494191 0 3533438552 2553969831 3853684107 2465731542 2045936778163039 343796328169361 4767485306
domain0 00000000,00000003 212499247 210112015 1861015 1860405436 536440 369895 32599 210079416 25368550 24241256 384652 927363878 807233 6366 1647 24239609 2122447165 1886868564 121112060 2848625533 125678146 241025 1032026 1885836538 2545 12 2533 0 0 0 0 0 0 1387952561 21076581 0
cpu1 518377256 0 4155211005 2778589869 10466382 2867629021 1904686152592476 364107263788241 5145567945
domain0 00000000,00000003 217653037 215526982 1577949 1580427380 557469 393576 28538 215498444 28721913 27662819 371153 870843407 745912 5523 1639 27661180 2331056874 2107732788 111442342 652402556 123615235 196159 1045245 2106687543 2400 3 2397 0 0 0 0 0 0 1437804657 26220076 0
94 changes: 94 additions & 0 deletions collector/schedstat_linux.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
// Copyright 2019 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package collector

import (
"fmt"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/procfs"
)

var (
runningSecondsTotal = prometheus.NewDesc(
prometheus.BuildFQName(namespace, "schedstat", "running_seconds_total"),
"Number of seconds CPU spent running a process.",
[]string{"cpu"},
nil,
)

waitingSecondsTotal = prometheus.NewDesc(
prometheus.BuildFQName(namespace, "schedstat", "waiting_seconds_total"),
"Number of seconds spent by processing waiting for this CPU.",
[]string{"cpu"},
nil,
)

timeslicesTotal = prometheus.NewDesc(
prometheus.BuildFQName(namespace, "schedstat", "timeslices_total"),
"Number of timeslices executed by CPU.",
[]string{"cpu"},
nil,
)
)

// NewSchedstatCollector returns a new Collector exposing task scheduler statistics
func NewSchedstatCollector() (Collector, error) {
fs, err := procfs.NewFS(*procPath)
if err != nil {
return nil, fmt.Errorf("failed to open procfs: %v", err)
}

return &schedstatCollector{fs: fs}, nil
}

type schedstatCollector struct {
fs procfs.FS
}

func init() {
registerCollector("schedstat", defaultEnabled, NewSchedstatCollector)
}

func (c *schedstatCollector) Update(ch chan<- prometheus.Metric) error {
stats, err := c.fs.Schedstat()
if err != nil {
return err
}

for _, cpu := range stats.CPUs {
ch <- prometheus.MustNewConstMetric(
runningSecondsTotal,
prometheus.CounterValue,
cpu.RunningSeconds(),
cpu.CPUNum,
)

ch <- prometheus.MustNewConstMetric(
waitingSecondsTotal,
prometheus.CounterValue,
cpu.WaitingSeconds(),
cpu.CPUNum,
)

ch <- prometheus.MustNewConstMetric(
timeslicesTotal,
prometheus.CounterValue,
float64(cpu.RunTimeslices),
cpu.CPUNum,
)
}

return nil
}
1 change: 1 addition & 0 deletions end-to-end-test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ enabled_collectors=$(cat << COLLECTORS
nfsd
pressure
qdisc
schedstat
sockstat
stat
textfile
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ require (
github.com/prometheus/client_golang v1.0.0
github.com/prometheus/client_model v0.0.0-20190129233127-fd36f4220a90
github.com/prometheus/common v0.4.1
github.com/prometheus/procfs v0.0.4-0.20190627154503-39e1aff1547e
github.com/prometheus/procfs v0.0.4-0.20190702183519-8f55e607908e
github.com/siebenmann/go-kstat v0.0.0-20160321171754-d34789b79745
github.com/sirupsen/logrus v1.4.2 // indirect
github.com/soundcloud/go-runit v0.0.0-20150630195641-06ad41a06c4a
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -68,8 +68,8 @@ github.com/prometheus/common v0.4.1/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y8
github.com/prometheus/procfs v0.0.0-20181005140218-185b4288413d/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk=
github.com/prometheus/procfs v0.0.2 h1:6LJUbpNm42llc4HRCuvApCSWB/WfhuNo9K98Q9sNGfs=
github.com/prometheus/procfs v0.0.2/go.mod h1:TjEm7ze935MbeOT/UhFTIMYKhuLP4wbCsTZCD3I8kEA=
github.com/prometheus/procfs v0.0.4-0.20190627154503-39e1aff1547e h1:wHo3spLBVUEsk0etFRCYPI6CpKUZnxq0JwqXjaQhp/g=
github.com/prometheus/procfs v0.0.4-0.20190627154503-39e1aff1547e/go.mod h1:4A/X28fw3Fc593LaREMrKMqOKvUAntwMDaekg4FpcdQ=
github.com/prometheus/procfs v0.0.4-0.20190702183519-8f55e607908e h1:p57e/ejwNofSHhEh+d7KoCVpHSTN3efX5Aj3z0jGWIE=
github.com/prometheus/procfs v0.0.4-0.20190702183519-8f55e607908e/go.mod h1:4A/X28fw3Fc593LaREMrKMqOKvUAntwMDaekg4FpcdQ=
github.com/siebenmann/go-kstat v0.0.0-20160321171754-d34789b79745 h1:IuH7WumZNax0D+rEqmy2TyhKCzrtMGqbZO0b8rO00JA=
github.com/siebenmann/go-kstat v0.0.0-20160321171754-d34789b79745/go.mod h1:G81aIFAMS9ECrwBYR9YxhlPjWgrItd+Kje78O6+uqm8=
github.com/sirupsen/logrus v1.2.0/go.mod h1:LxeOpSwHxABJmUn/MG1IvRgCAasNZTLOkJPxbbu5VWo=
Expand Down
27 changes: 27 additions & 0 deletions vendor/github.com/prometheus/procfs/fixtures.ttar

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions vendor/github.com/prometheus/procfs/proc.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

128 changes: 128 additions & 0 deletions vendor/github.com/prometheus/procfs/schedstat.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion vendor/modules.txt
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ github.com/prometheus/common/version
github.com/prometheus/common/expfmt
github.com/prometheus/common/model
github.com/prometheus/common/internal/bitbucket.org/ww/goautoneg
# github.com/prometheus/procfs v0.0.4-0.20190627154503-39e1aff1547e
# github.com/prometheus/procfs v0.0.4-0.20190702183519-8f55e607908e
github.com/prometheus/procfs
github.com/prometheus/procfs/bcache
github.com/prometheus/procfs/nfs
Expand Down

0 comments on commit f693a71

Please sign in to comment.