feat(inputs.kernel): Add Pressure Stall Information #14507

Merged: 26 commits, Jan 5, 2024. The diff below shows changes from 23 commits.

Commits:
- 5e91c12 Download psi.go from https://github.com/gridscale/linux-psi-telegraf-… (iBug, Dec 26, 2023)
- 4b91d15 psi: Split up psi.go, add sample.conf and README (iBug, Dec 26, 2023)
- df37b00 Cleanup psi_linux.go, add inputs/all/psi.go (iBug, Dec 26, 2023)
- 800c09b psi: Polish README (iBug, Dec 26, 2023)
- a9e061a psi: Add unit test with go:build linux (iBug, Dec 26, 2023)
- 9f38df1 inputs/psi: Add credits to gridscale/linux-psi-telegraf-plugin (iBug, Dec 26, 2023)
- c4cb52b chore(README): Linting rules was a bit more stupid than I had previou… (iBug, Dec 26, 2023)
- 1becff8 psi_linux: Merge type=some and type=full with a for loop (iBug, Dec 26, 2023)
- 0e30425 psi: Add stub unit test for nonlinux (iBug, Dec 26, 2023)
- aab6c83 fix: (*procfs.PSILine).Full was not used at all (iBug, Dec 27, 2023)
- a1ab1aa psi_linux: Merge two loops with same structure (iBug, Dec 27, 2023)
- 9107636 psi_test: Add supporting tests for aab6c83e7a181e53a205946a64b8c81d22… (iBug, Dec 27, 2023)
- 90335b6 psi: Merge pressure and pressureTotal into one series (iBug, Jan 2, 2024)
- 4e6cae0 inputs/psi: Start migrating PSI code to inputs/kernel (iBug, Jan 4, 2024)
- 47626a0 Move code from inputs/psi to inputs/kernel (iBug, Jan 4, 2024)
- c970755 inputs/kernel: Fix documentation and formatting (iBug, Jan 4, 2024)
- 69fe9a4 inputs/kernel: Fix README long lines (iBug, Jan 4, 2024)
- bcad064 inputs/kernel: Add initializer value for k.psiDir (iBug, Jan 4, 2024)
- 5da3e6c inputs/kernel: Rename psi_linux.go to psi.go (iBug, Jan 4, 2024)
- 4537dff Remove trailing punctuations in error messages (iBug, Jan 4, 2024)
- 1ea12c2 Add TestPSIEnabledWrongDir (iBug, Jan 4, 2024)
- 588f12e inputs.kernel: Use k.psiDir for initializing procfs.FS, add test data (iBug, Jan 4, 2024)
- 2067d63 inputs.kernel: Merge getPressureValues and uploadPressure (iBug, Jan 4, 2024)
- 9261d75 Apply suggestions from code review (iBug, Jan 5, 2024)
- 56dd9b2 Format with goimports -local (iBug, Jan 5, 2024)
- 4c12350 inputs/kernel/kernel.go: Revert error wrapping on k.gatherPressure (iBug, Jan 5, 2024)
60 changes: 55 additions & 5 deletions plugins/inputs/kernel/README.md
@@ -5,13 +5,14 @@ This plugin is only available on Linux.
The kernel plugin gathers info about the kernel that doesn't fit into other
plugins. In general, it is the statistics available in `/proc/stat` that are not
covered by other plugins as well as the value of
`/proc/sys/kernel/random/entropy_avail` and optionally, Kernel Samepage Merging.
`/proc/sys/kernel/random/entropy_avail` and optionally, Kernel Samepage Merging
and Pressure Stall Information.

The metrics are documented in `man proc` under the `/proc/stat` section.
The metrics are documented in `man 4 random` under the `/proc/stat` section.
The metrics are documented in `man 5 proc` under the `/proc/stat` section, as
well as `man 4 random` under the `/proc interfaces` section
(for `entropy_avail`).

```text

/proc/sys/kernel/random/entropy_avail
Contains the value of available entropy

@@ -40,10 +41,28 @@ Number of forks since boot.
```

Kernel Samepage Merging is generally documented in [kernel documentation][1] and
the available metrics exposed via sysfs are documented in [admin guide][2]
the available metrics exposed via sysfs are documented in [admin guide][2].

Pressure Stall Information is exposed through `/proc/pressure` and is documented
in [kernel documentation][3]. Kernel version 4.20 or later is required.
Example contents of the PSI files:

```shell
# /proc/pressure/cpu
some avg10=1.53 avg60=1.87 avg300=1.73 total=1088168194

# /proc/pressure/memory
some avg10=0.00 avg60=0.00 avg300=0.00 total=3463792
full avg10=0.00 avg60=0.00 avg300=0.00 total=1429641

# /proc/pressure/io
some avg10=0.00 avg60=0.00 avg300=0.00 total=68568296
full avg10=0.00 avg60=0.00 avg300=0.00 total=54982338
```

[1]: https://www.kernel.org/doc/html/latest/mm/ksm.html
[2]: https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html#ksm-daemon-sysfs-interface
[3]: https://www.kernel.org/doc/html/latest/accounting/psi.html

## Global configuration options <!-- @/docs/includes/plugin_config.md -->

@@ -63,6 +82,7 @@ See the [CONFIGURATION.md][CONFIGURATION.md] for more details.
## Additional gather options
## Possible options include:
## * ksm - kernel same-page merging
## * psi - pressure stall information
# collect = []
```

@@ -91,11 +111,41 @@ See the [CONFIGURATION.md][CONFIGURATION.md] for more details.
- ksm_stable_node_dups (integer, number of duplicated KSM pages, `stable_node_dups`)
- ksm_use_zero_pages (integer, whether empty pages should be treated specially, `use_zero_pages`)

- pressure (if `psi` is included in `collect`)
- tags:
- resource: cpu, memory, or io
- type: some or full
- floating-point fields: avg10, avg60, avg300
- integer fields: total

## Example Output

Default config:

```text
kernel boot_time=1690487872i,context_switches=321398652i,entropy_avail=256i,interrupts=141868628i,processes_forked=946492i 1691339564000000000
```

If `ksm` is included in `collect`:

```text
kernel boot_time=1690487872i,context_switches=321252729i,entropy_avail=256i,interrupts=141783427i,ksm_full_scans=0i,ksm_max_page_sharing=256i,ksm_merge_across_nodes=1i,ksm_pages_shared=0i,ksm_pages_sharing=0i,ksm_pages_to_scan=100i,ksm_pages_unshared=0i,ksm_pages_volatile=0i,ksm_run=0i,ksm_sleep_millisecs=20i,ksm_stable_node_chains=0i,ksm_stable_node_chains_prune_millisecs=2000i,ksm_stable_node_dups=0i,ksm_use_zero_pages=0i,processes_forked=946467i 1691339522000000000
```

If `psi` is included in `collect`:

```text
pressure,resource=cpu,type=some avg10=1.53,avg60=1.87,avg300=1.73 1700000000000000000
pressure,resource=memory,type=some avg10=0.00,avg60=0.00,avg300=0.00 1700000000000000000
pressure,resource=memory,type=full avg10=0.00,avg60=0.00,avg300=0.00 1700000000000000000
pressure,resource=io,type=some avg10=0.0,avg60=0.0,avg300=0.0 1700000000000000000
pressure,resource=io,type=full avg10=0.0,avg60=0.0,avg300=0.0 1700000000000000000
pressure,resource=cpu,type=some total=1088168194i 1700000000000000000
pressure,resource=memory,type=some total=3463792i 1700000000000000000
pressure,resource=memory,type=full total=1429641i 1700000000000000000
pressure,resource=io,type=some total=68568296i 1700000000000000000
pressure,resource=io,type=full total=54982338i 1700000000000000000
```

Note that the combination for `resource=cpu,type=full` is omitted because it is
always zero.
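The README above shows the line format of the `/proc/pressure/<resource>` files. As an editorial illustration only (the plugin in this PR does not parse the files itself; it calls `PSIStatsForResource` from `github.com/prometheus/procfs`), the standard-library Go sketch below parses one such line. The `psiLine` type and `parsePSILine` function are hypothetical names, and the sketch assumes every line has exactly the five space-separated fields shown in the examples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// psiLine is a hypothetical stand-in for procfs.PSILine: three running
// averages (percentages) and the cumulative stall time in microseconds.
type psiLine struct {
	Avg10, Avg60, Avg300 float64
	Total                uint64
}

// parsePSILine parses a line such as
// "some avg10=1.53 avg60=1.87 avg300=1.73 total=1088168194"
// and returns its type ("some" or "full") together with the values.
func parsePSILine(line string) (string, psiLine, error) {
	fields := strings.Fields(line)
	if len(fields) != 5 {
		return "", psiLine{}, fmt.Errorf("unexpected field count in %q", line)
	}
	typ := fields[0]
	if typ != "some" && typ != "full" {
		return "", psiLine{}, fmt.Errorf("unknown PSI type %q", typ)
	}
	var p psiLine
	for _, kv := range fields[1:] {
		key, value, ok := strings.Cut(kv, "=")
		if !ok {
			return "", psiLine{}, fmt.Errorf("malformed field %q", kv)
		}
		var err error
		switch key {
		case "avg10":
			p.Avg10, err = strconv.ParseFloat(value, 64)
		case "avg60":
			p.Avg60, err = strconv.ParseFloat(value, 64)
		case "avg300":
			p.Avg300, err = strconv.ParseFloat(value, 64)
		case "total":
			p.Total, err = strconv.ParseUint(value, 10, 64)
		default:
			err = fmt.Errorf("unknown key %q", key)
		}
		if err != nil {
			return "", psiLine{}, err
		}
	}
	return typ, p, nil
}

func main() {
	typ, p, err := parsePSILine("some avg10=1.53 avg60=1.87 avg300=1.73 total=1088168194")
	if err != nil {
		panic(err)
	}
	fmt.Println(typ, p.Avg10, p.Avg60, p.Avg300, p.Total) // some 1.53 1.87 1.73 1088168194
}
```

The averages are parsed as floats and `total` as an unsigned integer, which matches the README's split into floating-point fields (`avg10`, `avg60`, `avg300`) and an integer field (`total`).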
22 changes: 20 additions & 2 deletions plugins/inputs/kernel/kernel.go
@@ -13,6 +13,7 @@ import (

"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/plugins/inputs"
"github.com/prometheus/procfs"
)

//go:embed sample.conf
@@ -34,6 +35,8 @@ type Kernel struct {
statFile string
entropyStatFile string
ksmStatsDir string
psiDir string
procfs procfs.FS
}

func (k *Kernel) Init() error {
@@ -45,7 +48,15 @@ func (k *Kernel) Init() error {
if k.optCollect["ksm"] {
if _, err := os.Stat(k.ksmStatsDir); os.IsNotExist(err) {
// ksm probably not enabled in the kernel, bail out early
return fmt.Errorf("directory %q does not exist. Is KSM enabled in this kernel?", k.ksmStatsDir)
return fmt.Errorf("directory %q does not exist. KSM is not enabled in this kernel", k.ksmStatsDir)
}
}
if k.optCollect["psi"] {
procdir := filepath.Dir(k.psiDir)
var err error
if k.procfs, err = procfs.NewFS(procdir); err != nil {
// psi probably not supported in the kernel, bail out early
return fmt.Errorf("failed to initialize procfs on %s: %w", procdir, err)
}
}
return nil
@@ -145,12 +156,18 @@ func (k *Kernel) Gather(acc telegraf.Accumulator) error {
}
acc.AddCounter("kernel", fields, map[string]string{})

if k.optCollect["psi"] {
if err := k.gatherPressure(acc); err != nil {
return err
}
}

return nil
}

func (k *Kernel) getProcValueBytes(path string) ([]byte, error) {
if _, err := os.Stat(path); os.IsNotExist(err) {
return nil, fmt.Errorf("Path %q does not exist", path)
return nil, fmt.Errorf("path %q does not exist", path)
} else if err != nil {
return nil, err
}
@@ -183,6 +200,7 @@ func init() {
statFile: "/proc/stat",
entropyStatFile: "/proc/sys/kernel/random/entropy_avail",
ksmStatsDir: "/sys/kernel/mm/ksm",
psiDir: "/proc/pressure",
}
})
}
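In `Init`, the `procfs.FS` is rooted at `filepath.Dir(k.psiDir)`, so the default `/proc/pressure` yields `/proc`. The sketch below illustrates that path arithmetic with the standard library only. It assumes that `procfs` then reads each resource from `<root>/pressure/<resource>`, which is consistent with the `testdata/pressure` fixtures added in this PR; under that assumption, the last element of `psiDir` is effectively replaced by the literal `pressure`. The helper names `procfsRoot` and `pressureFile` are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// procfsRoot mirrors filepath.Dir(k.psiDir) in Init: the procfs.FS is
// rooted at the parent of the configured PSI directory.
func procfsRoot(psiDir string) string {
	return filepath.Dir(psiDir)
}

// pressureFile returns the file assumed to be read for a resource,
// "<root>/pressure/<resource>". Hypothetical helper for illustration;
// note that the last element of psiDir is dropped.
func pressureFile(psiDir, resource string) string {
	return filepath.Join(procfsRoot(psiDir), "pressure", resource)
}

func main() {
	fmt.Println(procfsRoot("/proc/pressure"))             // /proc
	fmt.Println(pressureFile("/proc/pressure", "memory")) // /proc/pressure/memory
	fmt.Println(pressureFile("testdata/pressure", "io"))  // testdata/pressure/io

	// The directory that procfs.NewFS is asked to open in TestPSIEnabledWrongDir;
	// it does not exist, so Init returns an error.
	fmt.Println(procfsRoot("testdata/this_directory_does_not_exist/stub")) // testdata/this_directory_does_not_exist
}
```

This also explains why `TestPSIEnabledWrongDir` points `psiDir` at a `stub` entry inside a missing directory: the failure comes from the parent directory not existing, not from the `stub` name itself.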
2 changes: 1 addition & 1 deletion plugins/inputs/kernel/kernel_test.go
@@ -200,7 +200,7 @@ func TestKSMEnabledWrongDir(t *testing.T) {
ConfigCollect: []string{"ksm"},
}

require.ErrorContains(t, k.Init(), "Is KSM enabled in this kernel?")
require.ErrorContains(t, k.Init(), "KSM is not enabled in this kernel")
}

func TestKSMDisabledNoKSMTags(t *testing.T) {
50 changes: 50 additions & 0 deletions plugins/inputs/kernel/psi.go
@@ -0,0 +1,50 @@
//go:build linux

package kernel

import (
"fmt"
"time"

"github.com/influxdata/telegraf"
"github.com/prometheus/procfs"
)

// Gather PSI metrics
func (k *Kernel) gatherPressure(acc telegraf.Accumulator) error {
for _, resource := range []string{"cpu", "memory", "io"} {
now := time.Now()
psiStats, err := k.procfs.PSIStatsForResource(resource)
if err != nil {
return fmt.Errorf("failed to read %s pressure: %w", resource, err)
}

stats := map[string]*procfs.PSILine{
"some": psiStats.Some,
"full": psiStats.Full,
}

for _, typ := range []string{"some", "full"} {
if resource == "cpu" && typ == "full" {
// resource=cpu,type=full is omitted because it is always zero
continue
}

tags := map[string]string{
"resource": resource,
"type": typ,
}
stat := stats[typ]

acc.AddCounter("pressure", map[string]interface{}{
"total": stat.Total,
}, tags, now)
acc.AddGauge("pressure", map[string]interface{}{
"avg10": stat.Avg10,
"avg60": stat.Avg60,
"avg300": stat.Avg300,
}, tags, now)
}
}
return nil
}
76 changes: 76 additions & 0 deletions plugins/inputs/kernel/psi_test.go
@@ -0,0 +1,76 @@
//go:build linux

package kernel

import (
"testing"

"github.com/influxdata/telegraf/testutil"
"github.com/stretchr/testify/require"
)

func TestPSIEnabledWrongDir(t *testing.T) {
k := Kernel{
psiDir: "testdata/this_directory_does_not_exist/stub",
ConfigCollect: []string{"psi"},
}

require.ErrorContains(t, k.Init(), "failed to initialize procfs on ")
}

func TestPSIStats(t *testing.T) {
var acc testutil.Accumulator

k := Kernel{
psiDir: "testdata/pressure",
ConfigCollect: []string{"psi"},
}
require.NoError(t, k.Init())
require.NoError(t, k.gatherPressure(&acc))

// separate fields for gauges and counters
pressureFields := map[string]map[string]interface{}{
"some": {
"avg10": float64(10),
"avg60": float64(60),
"avg300": float64(300),
},
"full": {
"avg10": float64(1),
"avg60": float64(6),
"avg300": float64(30),
},
}
pressureTotalFields := map[string]map[string]interface{}{
"some": {
"total": uint64(114514),
},
"full": {
"total": uint64(11451),
},
}

for _, typ := range []string{"some", "full"} {
for _, resource := range []string{"cpu", "memory", "io"} {
if resource == "cpu" && typ == "full" {
continue
}

tags := map[string]string{
"resource": resource,
"type": typ,
}

acc.AssertContainsTaggedFields(t, "pressure", pressureFields[typ], tags)
acc.AssertContainsTaggedFields(t, "pressure", pressureTotalFields[typ], tags)
}
}

// The combination "resource=cpu,type=full" should NOT appear anywhere
forbiddenTags := map[string]string{
"resource": "cpu",
"type": "full",
}
acc.AssertDoesNotContainsTaggedFields(t, "pressure", pressureFields["full"], forbiddenTags)
acc.AssertDoesNotContainsTaggedFields(t, "pressure", pressureTotalFields["full"], forbiddenTags)
}
1 change: 1 addition & 0 deletions plugins/inputs/kernel/sample.conf
@@ -4,4 +4,5 @@
## Additional gather options
## Possible options include:
## * ksm - kernel same-page merging
## * psi - pressure stall information
# collect = []
2 changes: 2 additions & 0 deletions plugins/inputs/kernel/testdata/pressure/cpu
@@ -0,0 +1,2 @@
some avg10=10.00 avg60=60.00 avg300=300.00 total=114514
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
2 changes: 2 additions & 0 deletions plugins/inputs/kernel/testdata/pressure/io
@@ -0,0 +1,2 @@
some avg10=10.00 avg60=60.00 avg300=300.00 total=114514
full avg10=1.00 avg60=6.00 avg300=30.00 total=11451
2 changes: 2 additions & 0 deletions plugins/inputs/kernel/testdata/pressure/memory
@@ -0,0 +1,2 @@
some avg10=10.00 avg60=60.00 avg300=300.00 total=114514
full avg10=1.00 avg60=6.00 avg300=30.00 total=11451