adds WatchList Latency to APIResponsivenessPrometheus #2764

p0lyn0mial · 2024-07-08T14:24:16Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

WatchList latency is gathered at the 50th, 90th, and 99th percentiles of request duration for watch-list requests, broken down by group, resource, and scope.

The new metric (kubernetes/kubernetes#120490) allows for comparing watch-list requests with standard list requests and measuring performance of the new requests in general.
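For illustration, here is a minimal sketch of how the query template from this PR's diff could be expanded for the three quantiles. The two constants are copied from the diff; `buildWatchListQueries` and the `"5m"` window are hypothetical and not part of the change. The template has three format verbs, so the sketch assumes the middle one is filled with the metric name.

```go
package main

import "fmt"

const (
	// Both constants are copied from the diff in this PR.
	watchListLatencyMetricName = "apiserver_watch_list_duration_seconds"
	watchListLatencyQuery      = "histogram_quantile(%.2f, sum(rate(%v_bucket{}[%v])) by (group, version, resource, scope, le))"
)

// buildWatchListQueries is a hypothetical helper (not from the PR). It expands
// the template once per quantile, assuming the verbs are filled in the order
// quantile, metric name, query window.
func buildWatchListQueries(window string) []string {
	quantiles := []float64{0.50, 0.90, 0.99}
	queries := make([]string, 0, len(quantiles))
	for _, q := range quantiles {
		queries = append(queries, fmt.Sprintf(watchListLatencyQuery, q, watchListLatencyMetricName, window))
	}
	return queries
}

func main() {
	// "5m" is an arbitrary example window, not the value the measurement uses.
	for _, q := range buildWatchListQueries("5m") {
		fmt.Println(q)
	}
}
```

Each resulting query asks Prometheus for one quantile of the histogram, aggregated over the labels listed in the `by (...)` clause.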

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

wojtek-t

Small comments; overall it looks the way I wanted it to.

wojtek-t · 2024-07-26T12:03:47Z

clusterloader2/pkg/measurement/common/slos/api_responsiveness_prometheus.go

+	watchListLatencyMetricName = "apiserver_watch_list_duration_seconds"
+	// watchListLatencyQuery placeholders must be replaced with (1) quantile (2) query window size
+	watchListLatencyQuery = "histogram_quantile(%.2f, sum(rate(%v_bucket{}[%v])) by (group, version, resource, scope, le))"
+


Looking into test results:
https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/perf-tests/2764/pull-perf-tests-clusterloader2/1810319004401668096/artifacts/APIResponsivenessPrometheus_simple_load_2024-07-08T14:50:01Z.json

I see only one entry for watchlist (pod list on namespace scope).

Is that expected? What is issuing this request?

Good question.
Do we have metrics so that I could execute the Prometheus query manually?
Could it be that the run was using a server with the watchlist feature enabled?

> could it be that the run was using a server with enabled watchlist feature ?

It looks like that was the case: the most recent run has no watchlist entries (the feature was turned off on the server some time ago).

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/perf-tests/2764/pull-perf-tests-clusterloader2/1818234814688399360/artifacts/APIResponsivenessPrometheus_simple_load_2024-07-30T11:06:18Z.json

So I was wondering why we've seen only one such entry, but I guess it's because it's disabled in client-go by default (and was enabled only in KCM). So that makes sense.

wojtek-t · 2024-07-26T12:04:06Z

clusterloader2/pkg/measurement/common/slos/api_responsiveness_prometheus_test.go

 }

 func (ex *fakeQueryExecutor) Query(query string, _ time.Time) ([]*model.Sample, error) {
+


nit: remove empty line

p0lyn0mial · 2024-07-30T10:38:32Z

/test perf-tests-clusterloader2

k8s-ci-robot · 2024-07-30T10:38:35Z

@p0lyn0mial: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test pull-perf-tests-benchmark-kube-dns
/test pull-perf-tests-clusterloader2
/test pull-perf-tests-clusterloader2-e2e-gce-scale-performance-manual
/test pull-perf-tests-clusterloader2-kubemark
/test pull-perf-tests-util-images
/test pull-perf-tests-verify-all-python
/test pull-perf-tests-verify-dashboard
/test pull-perf-tests-verify-lint
/test pull-perf-tests-verify-test

The following commands are available to trigger optional jobs:

/test pull-perf-tests-100-adhoc
/test pull-scheduler-perf
/test soak-tests-capz-windows-2019

Use /test all to run the following jobs that were automatically triggered:

pull-perf-tests-clusterloader2
pull-perf-tests-clusterloader2-kubemark
pull-perf-tests-verify-all-python
pull-perf-tests-verify-lint
pull-perf-tests-verify-test

In response to this:

/test perf-tests-clusterloader2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

p0lyn0mial · 2024-07-30T10:38:53Z

/test pull-perf-tests-clusterloader2

wojtek-t · 2024-07-30T14:06:13Z

/lgtm
/approve

k8s-ci-robot · 2024-07-30T14:06:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~clusterloader2/OWNERS~~ [wojtek-t]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 8, 2024

k8s-ci-robot requested review from mborsz and wojtek-t July 8, 2024 14:24

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 8, 2024

p0lyn0mial mentioned this pull request Jul 8, 2024

introduce WatchListLatencyPrometheus measurement #2315

Closed

wojtek-t reviewed Jul 26, 2024

wojtek-t self-assigned this Jul 26, 2024

p0lyn0mial added 2 commits July 30, 2024 13:01

api_responsiveness_prometheus_test: fakeQueryExecutor gets samples per metric

8e98b80

api_responsiveness_prometheus: adds WatchList Latency to APIResponsivenessPrometheus

bead663

p0lyn0mial force-pushed the upstream-api-responsiveness-watch-list branch from 71544a3 to bead663 Compare July 30, 2024 11:33

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 30, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2024

k8s-ci-robot merged commit c76b37d into kubernetes:master Jul 30, 2024
7 checks passed

p0lyn0mial mentioned this pull request Aug 20, 2024

config: add a new measurement to watchlist job type #2316

Closed
