
error adding metric sample for container : sample discarded invalid or out of order. #6965

Open
cp319391 opened this issue Jun 24, 2024 · 3 comments
Labels: area/vertical-pod-autoscaler, kind/bug
@cp319391

Which component are you using?:

vertical-pod-autoscaler

What version of the component are you using?:

Component version:

Vertical-pod-autoscaler: v9.8.2

The VPA recommender pod is logging a large number of lines like:

```
error adding metric sample for container : sample discarded invalid or out of order.
```

How can I avoid this log spam?

@cp319391 cp319391 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 24, 2024
@adrianmoisey
Contributor

/area vertical-pod-autoscaler

@voelzmo
Contributor

voelzmo commented Jul 15, 2024

/assign @adrianmoisey

@adrianmoisey
Contributor

adrianmoisey commented Jul 15, 2024

For my own reference, this seems to be the code path in question:

```go
if err := feeder.clusterState.AddSample(
	&model.ContainerUsageSampleWithKey{
		ContainerUsageSample: sample,
		Container:            containerID,
	}); err != nil {
	klog.Warningf("Error adding metric sample for container %v: %v", containerID, err)
}
```

or

```go
if err := feeder.clusterState.AddSample(sample); err != nil {
	// Not all pod states are tracked in memory saver mode
	if _, isKeyError := err.(model.KeyError); isKeyError && feeder.memorySaveMode {
		continue
	}
	klog.Warningf("Error adding metric sample for container %v: %v", sample.Container, err)
	droppedSampleCount++
} else {
	sampleCount++
}
```

Those places seem to refer to this:

```go
// AddSample adds a new usage sample to the proper container in the ClusterState
// object. Requires the container as well as the parent pod to be added to the
// ClusterState first. Otherwise an error is returned.
func (cluster *ClusterState) AddSample(sample *ContainerUsageSampleWithKey) error {
	pod, podExists := cluster.Pods[sample.Container.PodID]
	if !podExists {
		return NewKeyError(sample.Container.PodID)
	}
	containerState, containerExists := pod.Containers[sample.Container.ContainerName]
	if !containerExists {
		return NewKeyError(sample.Container)
	}
	if !containerState.AddSample(&sample.ContainerUsageSample) {
		return fmt.Errorf("sample discarded (invalid or out of order)")
	}
	return nil
}
```
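As a minimal sketch of that lookup order (the `clusterState`/`addSample` names below are my own simplification, not the VPA's real types), a sample for a pod or container that hasn't been added to the cluster state yet fails with a key error before the ordering check is ever reached:

```go
package main

import "fmt"

// clusterState is a hypothetical, trimmed-down analogue of the VPA's
// ClusterState, keeping only enough to show the lookup order.
type clusterState struct {
	pods map[string]map[string]bool // podID -> set of container names
}

// addSample mirrors the checks above: pod first, then container.
func (c *clusterState) addSample(podID, containerName string) error {
	containers, ok := c.pods[podID]
	if !ok {
		return fmt.Errorf("KeyError: %s", podID)
	}
	if !containers[containerName] {
		return fmt.Errorf("KeyError: %s/%s", podID, containerName)
	}
	return nil
}

func main() {
	c := &clusterState{pods: map[string]map[string]bool{
		"default/app": {"main": true},
	}}

	fmt.Println(c.addSample("default/app", "main"))    // <nil>
	fmt.Println(c.addSample("default/app", "sidecar")) // KeyError: default/app/sidecar
	fmt.Println(c.addSample("default/db", "main"))     // KeyError: default/db
}
```

Note that the key-error path is distinct from the "sample discarded" path the reporter is seeing: the warning in the issue only fires once the pod and container are both known.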

Which takes us to this:

```go
// AddSample adds a usage sample to the given ContainerState. Requires samples
// for a single resource to be passed in chronological order (i.e. in order of
// growing MeasureStart). Invalid samples (out of order or measure out of legal
// range) are discarded. Returns true if the sample was aggregated, false if it
// was discarded.
// Note: usage samples don't hold their end timestamp / duration. They are
// implicitly assumed to be disjoint when aggregating.
func (container *ContainerState) AddSample(sample *ContainerUsageSample) bool {
	switch sample.Resource {
	case ResourceCPU:
		return container.addCPUSample(sample)
	case ResourceMemory:
		return container.addMemorySample(sample, false)
	default:
		return false
	}
}
```

And finally, it comes from:

```go
func (container *ContainerState) addCPUSample(sample *ContainerUsageSample) bool {
	// Order should not matter for the histogram, other than deduplication.
	if !sample.isValid(ResourceCPU) || !sample.MeasureStart.After(container.LastCPUSampleStart) {
		return false // Discard invalid, duplicate or out-of-order samples.
	}
	container.observeQualityMetrics(sample.Usage, false, corev1.ResourceCPU)
	container.aggregator.AddSample(sample)
	container.LastCPUSampleStart = sample.MeasureStart
	return true
}
```

or
```go
func (container *ContainerState) addMemorySample(sample *ContainerUsageSample, isOOM bool) bool {
	ts := sample.MeasureStart
	// We always process OOM samples.
	if !sample.isValid(ResourceMemory) ||
		(!isOOM && ts.Before(container.lastMemorySampleStart)) {
		return false // Discard invalid or outdated samples.
	}
	container.lastMemorySampleStart = ts
	if container.WindowEnd.IsZero() { // This is the first sample.
		container.WindowEnd = ts
	}
	// Each container aggregates one peak per aggregation interval. If the timestamp of the
	// current sample is earlier than the end of the current interval (WindowEnd) and is larger
	// than the current peak, the peak is updated in the aggregation by subtracting the old value
	// and adding the new value.
	addNewPeak := false
	if ts.Before(container.WindowEnd) {
		oldMaxMem := container.GetMaxMemoryPeak()
		if oldMaxMem != 0 && sample.Usage > oldMaxMem {
			// Remove the old peak.
			oldPeak := ContainerUsageSample{
				MeasureStart: container.WindowEnd,
				Usage:        oldMaxMem,
				Request:      sample.Request,
				Resource:     ResourceMemory,
			}
			container.aggregator.SubtractSample(&oldPeak)
			addNewPeak = true
		}
	} else {
		// Shift the memory aggregation window to the next interval.
		memoryAggregationInterval := GetAggregationsConfig().MemoryAggregationInterval
		shift := ts.Sub(container.WindowEnd).Truncate(memoryAggregationInterval) + memoryAggregationInterval
		container.WindowEnd = container.WindowEnd.Add(shift)
		container.memoryPeak = 0
		container.oomPeak = 0
		addNewPeak = true
	}
	container.observeQualityMetrics(sample.Usage, isOOM, corev1.ResourceMemory)
	if addNewPeak {
		newPeak := ContainerUsageSample{
			MeasureStart: container.WindowEnd,
			Usage:        sample.Usage,
			Request:      sample.Request,
			Resource:     ResourceMemory,
		}
		container.aggregator.AddSample(&newPeak)
		if isOOM {
			container.oomPeak = sample.Usage
		} else {
			container.memoryPeak = sample.Usage
		}
	}
	return true
}
```
