Memory numa stats #2621
Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
metrics/prometheus.go
Outdated
values := make(metricValues, 0)

values = append(values, getNumaStatsPerNode(s.Memory.ContainerData.NumaStats.Total,
	[]string{"total", "container"}, s.Timestamp)...)
The general rule of thumb for when to use a label vs. when to add a new metric is that the sum of a metric across all dimensions should be meaningful. "total" isn't a great dimension to have, as we would expect the sum of the dimensions to be the "total". So we can either calculate the "other" portion, or make ...pages_total a separate metric.
The documentation says that the "total" count is the sum of file + anon + unevictable, so I'll remove the metrics with "type"="total".
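To illustrate the rule of thumb above, here is a minimal Go sketch of why a "total" label value makes the sum across the "type" dimension misleading. The `sample` type and the page counts are hypothetical and exist only for this example; they are not part of cAdvisor.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for one exported series of a
// per-NUMA-node page metric; it is not a cAdvisor type.
type sample struct {
	typ   string // value of the "type" label
	pages uint64 // made-up page count
}

// sumPages adds up every series, which is what summing the metric
// across the "type" dimension would do.
func sumPages(samples []sample) uint64 {
	var sum uint64
	for _, s := range samples {
		sum += s.pages
	}
	return sum
}

func main() {
	// Only the component types: the sum is the real total.
	parts := []sample{{"file", 100}, {"anon", 50}, {"unevictable", 10}}

	// The same series plus a "total" series: every page is counted twice.
	withTotal := append([]sample{{"total", 160}}, parts...)

	fmt.Println("sum without total series:", sumPages(parts))
	fmt.Println("sum with total series:   ", sumPages(withTotal))
}
```

Dropping the "total" series, as proposed, keeps the sum over "type" equal to the total reported by the kernel, provided file + anon + unevictable really covers it.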
metrics/prometheus.go
Outdated
	[]string{"unevictable", "container"}, s.Timestamp)...)

values = append(values, getNumaStatsPerNode(s.Memory.HierarchicalData.NumaStats.Total,
	[]string{"total", "hierarchy"}, s.Timestamp)...)
I suspect hierarchy vs container may need to be separate metrics by the same logic above.
I followed the pattern used for the "container_memory_failures_total" metric, see this. It has a "scope" label with the values "container" or "hierarchy".
name:        "container_memory_numa_pages",
help:        "Memory usage per numa node",
valueType:   prometheus.GaugeValue,
extraLabels: []string{"type", "scope", "node"},
I think we only add two labels to metrics below?
I think this comment is still relevant
Ah, I see now.
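For readers following the label-count question: the call sites pass two label values ("type" and "scope"), while the metric declares three labels. One plausible way this works is that the per-node helper appends the node ID as the third label value. The sketch below shows that idea under stated assumptions: `labeledValue` and `numaValuesPerNode` are hypothetical names for illustration, and this is not the body of `getNumaStatsPerNode` from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// labeledValue is a hypothetical, simplified stand-in for a metric value
// carrying its label values in declaration order.
type labeledValue struct {
	value  float64
	labels []string
}

// numaValuesPerNode emits one value per NUMA node and appends the node ID
// to the caller's labels, turning {"type", "scope"} values into
// {"type", "scope", "node"} values.
func numaValuesPerNode(nodeStats map[uint8]uint64, labels []string) []labeledValue {
	// Sort node IDs so the output order is deterministic.
	nodes := make([]int, 0, len(nodeStats))
	for node := range nodeStats {
		nodes = append(nodes, int(node))
	}
	sort.Ints(nodes)

	values := make([]labeledValue, 0, len(nodes))
	for _, node := range nodes {
		// Copy the labels so each value owns its slice and appends
		// never share a backing array.
		nodeLabels := make([]string, 0, len(labels)+1)
		nodeLabels = append(nodeLabels, labels...)
		nodeLabels = append(nodeLabels, strconv.Itoa(node))
		values = append(values, labeledValue{
			value:  float64(nodeStats[uint8(node)]),
			labels: nodeLabels,
		})
	}
	return values
}

func main() {
	stats := map[uint8]uint64{0: 100, 1: 60} // made-up page counts
	for _, v := range numaValuesPerNode(stats, []string{"file", "container"}) {
		fmt.Println(v.labels, v.value)
	}
}
```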
if includedMetrics.Has(container.MemoryNumaMetrics) {
	c.containerMetrics = append(c.containerMetrics, []containerMetric{
		{
			name: "container_memory_numa_pages",
Is this measured in pages or bytes? The help text should specify the units, and the suffix of the metric should be _
Improve help text for prometheus metric
Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
@dashpole Could you check whether the new metric names are better?
lgtm
Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
This pull request introduces information from memory.numa_stat as Prometheus metrics.
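For context, the cgroup v1 `memory.numa_stat` file holds one line per counter (`total`, `file`, `anon`, `unevictable`, plus `hierarchical_`-prefixed variants), each in the form `<counter>=<pages> N0=<pages> N1=<pages> ...`. The following is a minimal Go sketch of reading that layout into per-node page counts. The function names and sample values are assumptions made for illustration; this is not the parsing code used by this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// splitKV splits "key=value" into its two halves.
func splitKV(field string) (string, string, bool) {
	parts := strings.SplitN(field, "=", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// parseNumaStat parses text in the memory.numa_stat layout and returns
// counter name -> NUMA node ID -> pages. The aggregate value in the first
// field of each line is skipped; only the per-node fields are kept.
func parseNumaStat(data string) (map[string]map[int]uint64, error) {
	stats := make(map[string]map[int]uint64)
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // ignore blank lines
		}
		name, _, ok := splitKV(fields[0])
		if !ok {
			return nil, fmt.Errorf("malformed counter field %q", fields[0])
		}
		perNode := make(map[int]uint64)
		for _, f := range fields[1:] {
			key, val, ok := splitKV(f)
			if !ok || !strings.HasPrefix(key, "N") {
				return nil, fmt.Errorf("malformed node field %q", f)
			}
			node, err := strconv.Atoi(key[1:])
			if err != nil {
				return nil, fmt.Errorf("bad node ID in %q: %v", f, err)
			}
			pages, err := strconv.ParseUint(val, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("bad page count in %q: %v", f, err)
			}
			perNode[node] = pages
		}
		stats[name] = perNode
	}
	return stats, scanner.Err()
}

func main() {
	// Made-up sample content for a two-node machine.
	sample := "total=160 N0=100 N1=60\nfile=100 N0=70 N1=30\n"
	stats, err := parseNumaStat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("file pages on node 1:", stats["file"][1])
}
```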