
tekton_pipelines_controller_pipelinerun_count metric counter increases without having any pipeline executed #4397

Closed
gmeghnag opened this issue Nov 26, 2021 · 5 comments · Fixed by #4468
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@gmeghnag

Expected Behavior

The tekton_pipelines_controller_pipelinerun_count metric counter should increase only when new pipelines are executed.

Actual Behavior

After a few hours, the counter increases even though no pipelines have been executed.

Steps to Reproduce the Problem on OpenShift

  1. Run at least one pipeline in the cluster.
  2. Expose the metrics endpoint:
$ oc expose -n openshift-pipelines svc/tekton-pipelines-controller --port=9090 --path="/metrics"
  3. Check the counter value:
$ METRICS_ENDPOINT=$(oc get route tekton-pipelines-controller -n openshift-pipelines -o jsonpath="{.spec.host}")/metrics
$ curl -s -k $METRICS_ENDPOINT | grep -v "#" | grep tekton_pipelines_controller_pipelinerun_count
  4. Wait a few hours (4-5) without executing any pipelines, then re-check the counter value: it will have increased.
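To compare the counter across checks without eyeballing curl output, the metric value can be pulled out of the Prometheus text exposition format programmatically. The following is a small illustrative sketch (the `parseCounter` helper and the sample input are hypothetical, not part of Tekton), shown here instead of the `grep` pipeline above:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseCounter extracts the value of the first sample line whose name starts
// with the given metric name, from Prometheus text exposition output.
// Hypothetical helper for scripting the counter check; real usage would feed
// it the body fetched from the exposed /metrics route.
func parseCounter(metrics, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and # HELP / # TYPE comment lines.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, name) {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		// The sample value is the last whitespace-separated field.
		if v, err := strconv.ParseFloat(fields[len(fields)-1], 64); err == nil {
			return v, true
		}
	}
	return 0, false
}

func main() {
	// Illustrative input only, not real cluster output.
	sample := `# TYPE tekton_pipelines_controller_pipelinerun_count counter
tekton_pipelines_controller_pipelinerun_count{status="success"} 42`
	v, ok := parseCounter(sample, "tekton_pipelines_controller_pipelinerun_count")
	fmt.Println(v, ok)
}
```

Polling this value on an interval (for example every few minutes) makes the spurious growth described in step 4 easy to log and graph.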

Additional Info

  • Kubernetes version:
Kubernetes Version: v1.20.0+9689d22
  • Tekton Pipeline version:
$ tkn version
Client version: 0.13.1
Pipeline version: v0.22.0
Triggers version: v0.12.1
@gmeghnag gmeghnag added the kind/bug Categorizes issue or PR as related to a bug. label Nov 26, 2021
@gmeghnag gmeghnag changed the title tekton_pipelines_controller_pipelinerun_count metric counter increases without having any pipeline executed Nov 26, 2021
@guillaumerose
Contributor

Yes, I can confirm this behaviour.

If I install Tekton with the controller's resyncPeriod set to 10 seconds and then create a single pipeline, the counter tekton_pipelines_controller_pipelinerun_count{status="success"} increases by 1 every 10s.

Looking at the code, the duration histogram is also incorrect: older pipelines are counted many more times than newer ones.

The bug is by design, and I think the only way to remove it is to refactor the code so this counter is computed from the lister alone, not from the controller loop.
If we use the controller loop, we need a beforeCondition/afterCondition check to find out whether the pipeline was already counted. But that is not possible: if the controller restarts, we lose this information and the counter starts again at 0.
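The double-counting described above can be reduced to a minimal simulation (this is illustrative Go, not Tekton's actual reconciler code): if the metric is recorded on every reconcile pass, a completed PipelineRun is counted once per resync period, not once per execution.

```go
package main

import "fmt"

// pipelineRun is a stand-in for a PipelineRun whose Succeeded condition
// has reached a final state.
type pipelineRun struct {
	name string
	done bool
}

// runCount is a stand-in for the pipelinerun_count counter, keyed by status.
var runCount = map[string]int{}

// reconcile records the metric unconditionally, mirroring the buggy code
// path: a finished run is counted again on every resync-driven reconcile.
func reconcile(pr *pipelineRun) {
	if pr.done {
		runCount["success"]++
	}
}

func main() {
	pr := &pipelineRun{name: "build", done: true}
	// With a 10s resyncPeriod, one completed run is reconciled repeatedly.
	for i := 0; i < 6; i++ {
		reconcile(pr)
	}
	fmt.Println(runCount["success"]) // 6, although only one run executed
}
```

This also explains why older runs skew the duration histogram: a run that has been finished for longer sits through more resync cycles and is re-recorded more often.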

@dibyom dibyom added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Nov 29, 2021
@lbernick lbernick moved this to Todo in Pipelines V1 Jan 11, 2022
@khrm
Contributor

khrm commented Jan 12, 2022

@wlynch I fixed this in PR #4468.

@guillaumerose We generally use the Prometheus rate function in queries to avoid issues caused by instance restarts. Even with a lister loop, information will be lost across restarts, because PipelineRuns get deleted by the end user. So I think the before/after condition check is sufficient to resolve this.
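The transition check described here can be sketched as follows (a simplified illustration of the idea, not the actual code in #4468): the counter is recorded only when the Succeeded condition transitions to a final state, so later resync-driven reconciles of an already-finished run are not re-counted.

```go
package main

import "fmt"

// condition is a stand-in for a PipelineRun's Succeeded condition.
type condition struct {
	finished  bool // the condition has reached a final state
	succeeded bool
}

// runCount is a stand-in for the pipelinerun_count counter, keyed by status.
var runCount = map[string]int{}

// recordIfTransition counts the run only on the not-finished -> finished
// edge, comparing the condition before and after reconciling.
func recordIfTransition(before, after condition) {
	if !before.finished && after.finished {
		if after.succeeded {
			runCount["success"]++
		} else {
			runCount["failed"]++
		}
	}
}

func main() {
	running := condition{}
	done := condition{finished: true, succeeded: true}

	recordIfTransition(running, done) // the run completes: counted once
	for i := 0; i < 5; i++ {
		recordIfTransition(done, done) // later resyncs: no transition, not counted
	}
	fmt.Println(runCount["success"]) // 1
}
```

As noted above, a restart still loses the in-memory counter, but a Prometheus rate() query over the scraped series tolerates counter resets, so the per-restart reset is acceptable in practice.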

@lbernick lbernick moved this from Todo to In Progress in Pipelines V1 Feb 22, 2022
@tekton-robot
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 8, 2022
@lbernick
Member

lbernick commented May 9, 2022

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2022
Repository owner moved this from In Progress to Done in Pipelines V1 May 25, 2022