Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: improve reading Go runtime metrics #25886

Merged
merged 30 commits into from
Nov 11, 2022

Conversation

fjl
Copy link
Contributor

@fjl fjl commented Sep 28, 2022

This PR changes how we read performance metrics from the Go runtime. Instead of
using runtime.ReadMemStats, we now rely on the API provided by package runtime/metrics.

Here's why:

ReadMemStats is a stop-the-world operation, It requires halting all goroutines,
resulting in very unpredictable latency. Under high load, it can sometimes take milliseconds
to run, and all other work will be interrupted while it waits.

runtime/metrics also provides more accurate information. For example, the new interface
has better reporting of memory use. In my testing, the reported value of held memory
more accurately reflects the usage reported by the OS.

The semantics of metrics system/memory/allocs and system/memory/frees have
changed to report amounts in bytes. ReadMemStats only reported the count of allocations
in number-of-objects. This is imprecise: 'tiny objects' are not counted because the runtime
allocates them in batches; and certain improvements in allocation behavior, such as struct
size optimizations, will be less visible when the number of allocs doesn't change.

Changing allocation reports to be in bytes will make it appear in graphs that lots more is
being allocated. I don't think that's a problem because this metric is primarily interesting
for geth developers.

I have also deleted a lot of old and unused code for reading runtime metrics. The functions
in metrics/runtime*.go were never used in go-ethereum.

@fjl
Copy link
Contributor Author

fjl commented Sep 28, 2022

GC pause metric doesn't work yet

@fjl fjl removed the status:triage label Oct 12, 2022
@fjl fjl force-pushed the metrics-runtime-metrics branch 2 times, most recently from 24dff68 to 825acd1 Compare November 9, 2022 21:09
Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think we should merge it and then try it out on our infra.

@fjl fjl merged commit c539bda into ethereum:master Nov 11, 2022
@fjl fjl added this to the 1.11.0 milestone Nov 11, 2022
shekhirin pushed a commit to shekhirin/go-ethereum that referenced this pull request Jun 6, 2023
This changes how we read performance metrics from the Go runtime. Instead
of using runtime.ReadMemStats, we now rely on the API provided by package
runtime/metrics.

runtime/metrics provides more accurate information. For example, the new
interface has better reporting of memory use. In my testing, the reported
value of held memory more accurately reflects the usage reported by the OS.

The semantics of metrics system/memory/allocs and system/memory/frees have
changed to report amounts in bytes. ReadMemStats only reported the count of
allocations in number-of-objects. This is imprecise: 'tiny objects' are not
counted because the runtime allocates them in batches; and certain
improvements in allocation behavior, such as struct size optimizations,
will be less visible when the number of allocs doesn't change.

Changing allocation reports to be in bytes makes it appear in graphs that
lots more is being allocated. I don't think that's a problem because this
metric is primarily interesting for geth developers.

The metric system/memory/pauses has been changed to report statistical
values from the histogram provided by the runtime. Its name in influxdb has
changed from geth.system/memory/pauses.meter to
geth.system/memory/pauses.histogram.

We also have a new histogram metric, system/cpu/schedlatency, reporting the
Go scheduler latency.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants