Memory numa stats #2621
@@ -422,7 +422,8 @@ func NewPrometheusCollector(i infoProvider, f ContainerLabelsFunc, includedMetri
 				getValues: func(s *info.ContainerStats) metricValues {
 					return metricValues{{value: float64(s.Memory.WorkingSet), timestamp: s.Timestamp}}
 				},
-			}, {
+			},
+			{
 				name: "container_memory_failures_total",
 				help: "Cumulative count of memory allocation failures.",
 				valueType: prometheus.CounterValue,

@@ -454,6 +455,38 @@ func NewPrometheusCollector(i infoProvider, f ContainerLabelsFunc, includedMetri
 			},
 		}...)
 	}
+	if includedMetrics.Has(container.MemoryNumaMetrics) {
+		c.containerMetrics = append(c.containerMetrics, []containerMetric{
+			{
+				name: "container_memory_numa_pages",
+				help: "Memory usage per numa node",
+				valueType: prometheus.GaugeValue,
+				extraLabels: []string{"type", "scope", "node"},

Review thread:
I think we only add two labels to metrics below?
I think this comment is still relevant.
Ah, I see now.

+				getValues: func(s *info.ContainerStats) metricValues {
+					values := make(metricValues, 0)
+
+					values = append(values, getNumaStatsPerNode(s.Memory.ContainerData.NumaStats.Total,
+						[]string{"total", "container"}, s.Timestamp)...)

Review thread:
The general rule of thumb for when to use a label vs. when to add a new metric is that the sum of a metric across all dimensions should be meaningful. "total" isn't a great dimension to have, as we would expect the sum of dimensions to be the "total". So we can either calculate the "other" portion, or make
The documentation says that the "total" count is the sum of file + anon + unevictable, so I'll remove the metrics with "type"="total".

+					values = append(values, getNumaStatsPerNode(s.Memory.ContainerData.NumaStats.File,
+						[]string{"file", "container"}, s.Timestamp)...)
+					values = append(values, getNumaStatsPerNode(s.Memory.ContainerData.NumaStats.Anon,
+						[]string{"anon", "container"}, s.Timestamp)...)
+					values = append(values, getNumaStatsPerNode(s.Memory.ContainerData.NumaStats.Unevictable,
+						[]string{"unevictable", "container"}, s.Timestamp)...)
+
+					values = append(values, getNumaStatsPerNode(s.Memory.HierarchicalData.NumaStats.Total,
+						[]string{"total", "hierarchy"}, s.Timestamp)...)

Review thread:
I suspect hierarchy vs. container may need to be separate metrics by the same logic as above.
I followed the pattern used for the container_memory_failures_total metric, which has a "scope" label with the values "container" or "hierarchy".

+					values = append(values, getNumaStatsPerNode(s.Memory.HierarchicalData.NumaStats.File,
+						[]string{"file", "hierarchy"}, s.Timestamp)...)
+					values = append(values, getNumaStatsPerNode(s.Memory.HierarchicalData.NumaStats.Anon,
+						[]string{"anon", "hierarchy"}, s.Timestamp)...)
+					values = append(values, getNumaStatsPerNode(s.Memory.HierarchicalData.NumaStats.Unevictable,
+						[]string{"unevictable", "hierarchy"}, s.Timestamp)...)
+					return values
+				},
+			},
+		}...)
+	}
 	if includedMetrics.Has(container.AcceleratorUsageMetrics) {
 		c.containerMetrics = append(c.containerMetrics, []containerMetric{
 			{

@@ -1903,3 +1936,12 @@ var invalidNameCharRE = regexp.MustCompile(`[^a-zA-Z0-9_]`)
 func sanitizeLabelName(name string) string {
 	return invalidNameCharRE.ReplaceAllString(name, "_")
 }
+
+func getNumaStatsPerNode(nodeStats map[uint8]uint64, labels []string, timestamp time.Time) metricValues {
+	mValues := make(metricValues, 0, len(nodeStats))
+	for node, stat := range nodeStats {
+		nodeLabels := append(labels, strconv.FormatUint(uint64(node), 10))
+		mValues = append(mValues, metricValue{value: float64(stat), labels: nodeLabels, timestamp: timestamp})
+	}
+	return mValues
+}

Review thread on the container_memory_numa_pages metric:
Is this measured in number of pages, or in bytes? The help text should specify the units, and the suffix of the metric should be _
According to the documentation, memory.numa_stat contains page counts, and runc only reads the values from that file. I'll improve the help text.