-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Metric tekton_pipelines_controller_pipelinerun_count #4468
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -195,12 +195,6 @@ func (c *Reconciler) ReconcileKind(ctx context.Context, pr *v1beta1.PipelineRun) | |
logger.Errorf("Failed to update Run status for PipelineRun %s: %v", pr.Name, err) | ||
return c.finishReconcileUpdateEmitEvents(ctx, pr, before, err) | ||
} | ||
go func(metrics *pipelinerunmetrics.Recorder) { | ||
err := metrics.DurationAndCount(pr) | ||
if err != nil { | ||
logger.Warnf("Failed to log the metrics : %v", err) | ||
} | ||
}(c.metrics) | ||
return c.finishReconcileUpdateEmitEvents(ctx, pr, before, nil) | ||
} | ||
|
||
|
@@ -242,6 +236,24 @@ func (c *Reconciler) ReconcileKind(ctx context.Context, pr *v1beta1.PipelineRun) | |
return nil | ||
} | ||
|
||
func (c *Reconciler) durationAndCountMetrics(ctx context.Context, pr *v1beta1.PipelineRun) { | ||
logger := logging.FromContext(ctx) | ||
if pr.IsDone() { | ||
// We get latest pipelinerun cr already to avoid recount | ||
newPr, err := c.pipelineRunLister.PipelineRuns(pr.Namespace).Get(pr.Name) | ||
if err != nil { | ||
logger.Errorf("Error getting PipelineRun %s when updating metrics: %w", pr.Name, err) | ||
} | ||
before := newPr.Status.GetCondition(apis.ConditionSucceeded) | ||
go func(metrics *pipelinerunmetrics.Recorder) { | ||
err := metrics.DurationAndCount(pr, before) | ||
if err != nil { | ||
logger.Warnf("Failed to log the metrics : %v", err) | ||
} | ||
}(c.metrics) | ||
} | ||
} | ||
|
||
func (c *Reconciler) finishReconcileUpdateEmitEvents(ctx context.Context, pr *v1beta1.PipelineRun, beforeCondition *apis.Condition, previousError error) error { | ||
logger := logging.FromContext(ctx) | ||
|
||
|
@@ -319,6 +331,7 @@ func (c *Reconciler) resolvePipelineState( | |
} | ||
|
||
func (c *Reconciler) reconcile(ctx context.Context, pr *v1beta1.PipelineRun, getPipelineFunc resources.GetPipeline) error { | ||
defer c.durationAndCountMetrics(ctx, pr) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about creating a copy of the PR condition in a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @afrittoli When I was running a test, somehow the same event came twice even when the informer cache was updated. So I decided to use informer cache. Any idea why this happens? @wlynch Is it due to the controller resync? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @afrittoli @wlynch Please check this comment: |
||
logger := logging.FromContext(ctx) | ||
cfg := config.FromContextOrDefaults(ctx) | ||
pr.SetDefaults(ctx) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with how/when the lister updates, but wouldn't pulling in the latest status mean that the current pr status and the "before" status always match, since the status is updated during
reconcile
?It's would be useful to add a test to make sure we're handling the reconciler state changes properly (i.e. making sure we're avoiding instances where before always equals after) - right now we're really only testing the DurationAndCount func directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reconcile
changes status but that isn't applied to thepipelienrun
object inInformer
cache.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding aligns with @khrm, the status changes but the change is not visible through the lister.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own knowledge - When would the lister value update? Is it because we're updating only the status that this is safe to do?
I'm a bit confused with the current impl/comments since we're getting the "latest" pipelinerun, but this is really the pipelinerun before the latest status update?
I played around with this change and this appears to be correct (I even tried adding some sleeps to try and trigger a race condition), but it feels like we're relying on non-obvious behavior of the lister. I'd have a lot more peace of mind if we could add a test to ensure that it's safe rely on this lister behavior and be able to catch if this assumption is no longer true (i.e. if we make an unrelated change that would cause the lister to start returning the true latest value).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, sounds good. @khrm please add a unit test including an update during the
reconcile
in addition to what we have otherwise looks good to me 👍 Thanks!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share @wlynch concern - this feels like introducing a potential race condition.
Wouldn't it be safer to extract the beforeCondition at the beginning of the reconcile so we can use it later?
That's what we do for events already - see https://github.com/tektoncd/pipeline/blob/7855cb720c3fc59e585c2e230f71e488fc9038be/pkg/reconciler/pipelinerun/pipelinerun.go#L148:L148.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check this comment.
#4468 (comment)