
KEP-3077: contextual logging: promotion to beta in 1.30 #4219

Merged (3 commits) Feb 7, 2024

Conversation

@pohly (Contributor) commented Sep 18, 2023

  • One-line PR description: update to reflect changes in the ecosystem

  • Issue link: contextual logging #3077

  • Other comments: some of the PRs linked here are not merged yet because support for slog is still new; they are expected to be merged before 1.29

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Sep 18, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 18, 2023
@@ -870,7 +912,7 @@ logging.

#### Beta

- All of kube-scheduler (in-tree) and CSI external-provisioner (out-of-tree) converted
- [All of kube-controller-manager](https://github.com/kubernetes/kubernetes/pull/119250) and some [parts of kube-scheduler](https://github.com/kubernetes/kubernetes/pull/115588) converted (in-tree), conversion of out-of-tree components possible, whether they use pflag ([external-provisioner](https://github.com/kubernetes-csi/external-provisioner/pull/639)) or plain Go flags ([node-driver-registrar](https://github.com/kubernetes-csi/node-driver-registrar/pull/259))
@pohly (Contributor, Author) commented Sep 18, 2023

We ended up focusing on kube-controller-manager instead of kube-scheduler because that work was easier to split up. Conversion of external components is technically feasible, but was delayed by lack of developer time. Contributors are willing to take on that part now and have already started (1, 2).

Regarding the next points (cannot comment on it in GitHub):

> Gathered feedback from developers and surveys

There has been steady interest in this, as seen from questions on Slack and good attendance of the KubeCon maintainer track sessions about logging. We've not done a survey, and I am not sure what exactly I would ask in it. May I perhaps remove that part?

> New APIs in k8s.io/klog/v2 no longer marked as experimental

EDIT: Done in klog update for 1.29.

A Member commented:

Just sharing some experience from using contextual logging with Cluster API.

In Cluster API we use component-base/logs to set up the log flags and the text/JSON loggers. Then we pass the logger from klog.Background() into controller-runtime.

controller-runtime ensures that this logger (plus a few additional k/v pairs) is available in all contexts passed into our controllers and webhooks. In the controllers/webhooks we use go-logr/logr.FromContext/NewContext to pass around the logger (controller-runtime has a thin wrapper around these funcs).
In Cluster API contextual logging is always enabled. Because we're using the CR FromContext/NewContext wrappers there is also no way to turn it off.

So we use large parts of the implementation and concepts of upstream Kubernetes contextual logging and align as much as we can to upstream conventions.

We have been using this in Cluster API for a while and it has worked pretty well for us, without any issues.
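
For illustration, here is a minimal sketch of that setup with a typical controller-runtime reconciler. The names (MachineReconciler, reconcileMachine) are placeholders, not Cluster API's actual code:

```go
package main

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log"

	"k8s.io/klog/v2"
)

// MachineReconciler is a hypothetical reconciler, used only to illustrate the
// pattern described above.
type MachineReconciler struct{}

func (r *MachineReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// log.FromContext is controller-runtime's thin wrapper around logr.FromContext.
	// It returns the logger seeded in main plus any key/value pairs added upstream.
	logger := log.FromContext(ctx, "machine", req.NamespacedName)

	// log.IntoContext wraps logr.NewContext: callees receive the enriched logger.
	ctx = log.IntoContext(ctx, logger)

	logger.Info("Reconciling")
	reconcileMachine(ctx)
	return ctrl.Result{}, nil
}

func reconcileMachine(ctx context.Context) {
	// Anywhere downstream, the logger (with the "machine" key) comes back out of the context.
	log.FromContext(ctx).Info("doing the actual work")
}

func main() {
	// Pass klog's background logger into controller-runtime, as described above.
	ctrl.SetLogger(klog.Background())
}
```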

A Contributor commented:

Do you have any estimates around how much work is left in kube-scheduler? Also what about kubelet and kube-apiserver and cloud-controller-manager? I'd like to see explicit information about each at beta.

@pohly (Contributor, Author):

I'm tracking that in the description of #3077.

Shall I copy a snapshot of that information into the KEP?

kube-scheduler and kube-controller-manager are done and only need to be touched again when we make API changes elsewhere, for example in component-base or client-go.

kubelet has around 2000 calls which need to be updated, kube-proxy 400, kube-apiserver 660.

I've not checked cmd/cloud-controller-manager, but it seems to be small.

A Contributor commented:

That's fine, I didn't look there. No need to put that information in the KEP in that case.

@dashpole (Contributor):
/assign @logicalhan

@pohly pohly changed the title KEP-3077: contextual logging: bump to beta KEP-3077: contextual logging: slog update for 1.29 Oct 5, 2023
@pohly (Contributor, Author) commented Oct 5, 2023

This PR is no longer targeting beta in 1.29. Let's try instead for 1.30.

Not sure whether it needs to be merged before the freeze. We can also hold it until some of the pending slog PRs are merged and then merge it as a documentation update.

@pohly (Contributor, Author) commented Nov 13, 2023

Let's refocus on promotion to beta, with on by default, in 1.30. In this comment I am going to check the usefulness and performance overhead of enabling contextual logging based on current master (= 5ce0bd9, shortly before v1.29.0-alpha.3).

First, use-cases. kube-controller-manager and kube-scheduler are converted to contextual logging and use WithName and WithValues to generate more useful logs.

kube-scheduler:

I1113 08:43:37.029524 87144 default_binder.go:53] "Attempting to bind pod to node" logger="Bind.DefaultBinder" pod="kube-system/coredns-69cbfb9798-ms4pq" node="127.0.0.1"

The immediate benefit is that the operation and plugin are visible in the logger key. pod and node already get logged as parameters in individual log calls in kube-scheduler code. Once contextual logging is supported by more packages outside of kube-scheduler (for example, client-go), those values will also be visible there. Once the feature is GA, log calls can be simplified.
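
As a rough sketch of how such a named logger comes about (simplified, not the actual scheduler framework code; runBindPlugin and bind are made-up names):

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// The framework derives a named logger for the extension point and the plugin
// before calling into the plugin.
func runBindPlugin(ctx context.Context) {
	logger := klog.LoggerWithName(klog.FromContext(ctx), "Bind")
	logger = klog.LoggerWithName(logger, "DefaultBinder")
	ctx = klog.NewContext(ctx, logger)
	bind(ctx)
}

func bind(ctx context.Context) {
	// The plugin still passes pod and node explicitly per call; the chained names
	// show up automatically as logger="Bind.DefaultBinder" in the output.
	klog.FromContext(ctx).Info("Attempting to bind pod to node",
		"pod", "kube-system/coredns-69cbfb9798-ms4pq", "node", "127.0.0.1")
}

func main() {
	runBindPlugin(context.Background())
}
```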

kube-controller-manager:

I1113 08:43:29.284360 87141 graph_builder.go:285] "garbage controller monitor not synced: no monitors" logger="garbage-collector-controller"

The logger="garbage-collector-controller" was added by the kube-controller-manager core when instantiating that controller and appears in all of its log entries, as long as the code being called supports contextual logging.
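
A minimal sketch of that instantiation pattern, with made-up function names rather than the real kube-controller-manager code:

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// startGarbageCollectorController is a hypothetical stand-in for the code that
// instantiates a controller in kube-controller-manager.
func startGarbageCollectorController(ctx context.Context) {
	// WithName is called only once, when the controller is instantiated, so the
	// cost is negligible. Every log entry emitted via this context then carries
	// logger="garbage-collector-controller".
	logger := klog.LoggerWithName(klog.FromContext(ctx), "garbage-collector-controller")
	ctx = klog.NewContext(ctx, logger)
	monitorSync(ctx)
}

func monitorSync(ctx context.Context) {
	// Any callee that supports contextual logging inherits the named logger.
	klog.FromContext(ctx).Info("garbage controller monitor not synced: no monitors")
}

func main() {
	startGarbageCollectorController(context.Background())
}
```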

Second, performance. I am using test/integration/scheduler_perf to run go test -bench comparisons with benchstat:

go test -timeout=0 -count=5 '-bench=BenchmarkPerfScheduling'  -benchtime=1ns -run=xxx ./test/integration/scheduler_perf -args -perf-scheduling-label-filter= -v=3 -feature-gate=ContextualLogging=false

That benchmark is completely CPU-bound because log output gets buffered in memory by go test, so any impact of additional code is very visible. When we used WithName too much, results got worse, so we toned down the instrumentation a bit to avoid performance regressions.

Most test cases in test/integration/scheduler_perf are relatively stable. I used -count=10 initially, but running everything that often took most of a day, so I switched to -count=5. My test system is a Skylake workstation, with no UI and thus little background activity.

When doing that comparison with kube-scheduler code which supports contextual logging (i.e. uses FromContext) but has no WithName and no WithValues, there was no significant difference. Significant overhead only occurs when calling WithValues and NewContext for call chains which are very fast (for example, plugin calls which don't do anything), because both calls must allocate memory. The use of WithName in kube-controller-manager is no problem because it's only done once during controller instantiation.
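
To make the allocation cost concrete, here is a hedged sketch of the per-call pattern that causes it, plus one possible way to hoist the work out of a hot loop when the key/value pairs do not change per iteration (an illustration, not necessarily what kube-scheduler does):

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// Enriching per call: both LoggerWithValues and NewContext allocate, which shows
// up when the callee itself does almost nothing.
func callPluginPerCall(ctx context.Context, node string, plugin func(context.Context)) {
	logger := klog.LoggerWithValues(klog.FromContext(ctx), "node", node)
	plugin(klog.NewContext(ctx, logger))
}

// Possible mitigation: enrich the context once, outside the loop, then reuse it.
func callPluginsHoisted(ctx context.Context, node string, plugins []func(context.Context)) {
	ctx = klog.NewContext(ctx, klog.LoggerWithValues(klog.FromContext(ctx), "node", node))
	for _, plugin := range plugins {
		plugin(ctx)
	}
}

func main() {
	noop := func(ctx context.Context) { klog.FromContext(ctx).V(5).Info("fast plugin, nothing to do") }
	callPluginPerCall(context.Background(), "127.0.0.1", noop)
	callPluginsHoisted(context.Background(), "127.0.0.1", []func(context.Context){noop, noop})
}
```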

kube-scheduler is more interesting. The goal was to cause no performance regression at -v3 or lower. Higher log levels are for debugging and slowdowns become acceptable, in particular when the resulting log output is then more useful for debugging (as it is, in the example above).

At -v3, benchstat reports no significant difference in most test cases, so that goal has been achieved. Some even ran a bit faster (see benchstat-v3.log).

For higher log levels, let's look at -v5 and SchedulingBasic/5000Nodes (simple pods, the most common case) and SchedulingCSIPVs/5000Nodes (produces more output in the volume binding plugin). SchedulingBasic throughput is 27% lower. There's no impact on SchedulingCSIPVs. See benchstat-v5.log.

@pohly pohly changed the title KEP-3077: contextual logging: slog update for 1.29 KEP-3077: contextual logging: promotion to beta in 1.30 Nov 13, 2023
@logicalhan (Member):

> Let's refocus on promotion to beta, with on by default, in 1.30. […]

Thanks for doing this analysis!

@pohly (Contributor, Author) commented Nov 16, 2023

@logicalhan: are you now okay with promotion to beta in 1.30 as described here?

On chat we had also discussed doing some kind of scalability comparison. I asked about that in #sig-scalability on Slack and got zero responses - it looks like interest and bandwidth for such an experiment are low. Instead, my preference would be to promote the feature early in the 1.30 cycle, watch the normal scalability tests for a regression, then back it out again if there is one. That'll be simpler.

@logicalhan (Member):

> @logicalhan: are you now okay with promotion to beta in 1.30 as described here? […]

Yeah I'm okay with that plan. Thanks for doing the due diligence!

@pohly pohly mentioned this pull request Nov 20, 2023
@logicalhan (Member):

/milestone v1.30

@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Nov 20, 2023
@logicalhan logicalhan added the lead-opted-in Denotes that an issue has been opted in to a release label Nov 20, 2023
pohly added a commit to pohly/kubernetes that referenced this pull request Jan 4, 2024
As discussed in
kubernetes/enhancements#4219 (comment)
benchstat reports no significant difference in most test cases for
scheduler_perf for -v3. At -v5, there is some slowdown, but that is justified
because the output becomes more useful for debugging.

Enabling the feature by default ensures that log output becomes better
regardless of whether users know and remember to enable it.
@logicalhan (Member) left a comment:

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 18, 2024
@@ -4,3 +4,5 @@
kep-number: 3077
alpha:
approver: "@ehashman"
beta:
approver: "@logicalhan"
@pohly (Contributor, Author):

@johnbelamaric: should (may?) I list you here?

I picked @logicalhan as a replacement for ehashman because both are in SIG Instrumentation, but now I noticed that this should be a PRR reviewer.

@logicalhan added LGTM on behalf of the SIG. This PR is now ready for PRR approval.

/assign @johnbelamaric

@soltysh (Contributor) left a comment:

#prr-shadow
Several questions from PRR shadow, before pulling in PRR approver.

```go
logger.Info("Done", "pod", klog.KObj(pod))
```

Starting with beta, the feature gate will be enabled and code can be written
A Contributor commented:

Does this require another rewrite during beta, or will we leave it as is for now, in case someone decides to turn the feature off and thus falls back to the previous mechanism, which as described above requires that duplication?

@pohly (Contributor, Author):

I need to update this section. We decided to keep the duplication as long as the feature can still be turned off, which is the opposite of what I was proposing here.
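
To make that duplication concrete, here is a sketch of what a log call looks like while the feature can still be turned off (an assumed example, not a quote from the KEP; syncPod is a made-up function):

```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

// The caller may have already attached "pod" via klog.LoggerWithValues, but as
// long as the ContextualLogging feature gate can be disabled, klog.FromContext
// may return the global logger without those values. The key/value pair is
// therefore still passed explicitly in every call - that is the duplication
// being kept for now.
func syncPod(ctx context.Context, pod *v1.Pod) {
	logger := klog.FromContext(ctx)
	logger.Info("Done", "pod", klog.KObj(pod))
}

func main() {
	pod := &v1.Pod{}
	pod.Name, pod.Namespace = "coredns-69cbfb9798-ms4pq", "kube-system"
	syncPod(context.Background(), pod)
}
```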

@@ -1037,6 +1079,11 @@ Revert commits that changed log calls.

## Implementation History

* Kubernetes 1.24: initial alpha
A Contributor commented:

I can't comment on unmodified text, so a few questions here. They concern Feature Enablement and Rollback and Version Skew Strategy; both gain in importance when the feature is enabled by default.

  1. What will happen if a user starts to use new flags/options which start logging to JSON and rolls the component back? Are there any new flags for setting up different log output at all?
  2. You mentioned the Performance overhead risk in the risks section; how can a cluster administrator notice that? Was this measured as part of this effort to point out which components are affected more and which less with that impact? You mentioned a flat 2%, but in the risk section you called out that there will be places where this might be more or less; that's why I'm asking about per-component data.
  3. Pod scheduling (= the "startup latency of schedulable stateless pods" SLI) might become slightly worse - have you measured it by how much? This is similar to the above question.
  4. Failure modes: in the risks section you described the Uninitialized logger failure; is there a way we can prevent it from happening in CI?
  5. In "What steps should be taken if SLOs are not being met to determine the problem?": I doubt the cluster operator can revert commits; you need to define this more in terms of a cluster admin rolling back to a previous working version.

@pohly (Contributor, Author):

> What will happen if a user starts to use new flags/options which start logging to JSON and rolls the component back?

JSON output is not part of this feature. That's structured logging.

> Are there any new flags for setting up different log output at all?

This feature here has no separate command line flags or config options.

> Was this measured as part of this effort to point out which components are affected more and which less with that impact?

We only have performance benchmarks for kube-scheduler. See #4219 (comment) for an evaluation of that.

> have you measured it by how much? This is similar to the above question.

I observed no relevant difference in the kube-scheduler benchmark.

> Failure modes: in the risks section you described the Uninitialized logger failure; is there a way we can prevent it from happening in CI?

This is mostly a non-issue because the Go API was designed such that all code which asks for a logr.Logger gets one. A Logger value is never uninitialized. A pointer might be nil, but this is rarely used.

To catch mistakes in the CI, the modified code must be executed.
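
As an illustration of why this is a non-issue in practice (a sketch, not production code):

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// klog.FromContext never returns an invalid Logger: if nothing was stored in
// the context (or contextual logging is disabled), it falls back to klog's
// global logger, so the call below is always safe.
func handle(ctx context.Context) {
	logger := klog.FromContext(ctx)
	logger.Info("this works even with a bare context.Background()")
}

func main() {
	handle(context.Background())
}
```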

> I doubt the cluster operator can revert commits; you need to define this more in terms of a cluster admin rolling back to a previous working version.

True. Let me double-check what I wrote there.

* Kubernetes 1.27: parts of kube-controller-manager converted
* Kubernetes 1.28: kube-controller-manager converted completely, relationship with log/slog in Go 1.21 clarified

A Contributor commented:

What about 1.30?

@pohly (Contributor, Author):

We finished the kube-scheduler in 1.29. Let me add that.

For 1.30, we are focusing on client-go. We had a volunteer for kubelet and @tallclair wanted to do something with the apiserver, but it's unclear whether both will happen.

It shouldn't be in "Implementation History" anyway, should it? It's not history, it's the future...

@pohly (Contributor, Author):

@soltysh: I added a "Status and next steps" section. I hope that also addresses your question about "estimates around how much work is left" - can you take another look?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 24, 2024
We need to be more conservative than originally planned and continue to produce
the same informative log output as before when contextual logging is disabled.

To help reviewers understand the scope of the work, a new "Status and next
steps" section summarizes where we are with the conversion.
@soltysh (Contributor) left a comment:

#prr-shadow
prr lgtm
/assign @johnbelamaric

@johnbelamaric (Member):

👍

/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, logicalhan, pohly, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2024
@pohly (Contributor, Author) commented Feb 7, 2024

@soltysh (Contributor) commented Feb 7, 2024

Re-applying lgtm based on @logicalhan's comment #4219 (review)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit 97465c4 into kubernetes:master Feb 7, 2024
4 checks passed
k8s-publishing-bot pushed a commit to kubernetes/component-base that referenced this pull request Feb 27, 2024 (same commit message as above; Kubernetes-commit: 9c6b42af46ed9f541fefa65a8a1e175ecd0a7c34)

Jeffwan pushed a commit to Jeffwan/kubernetes that referenced this pull request Mar 6, 2024

dinhxuanvu pushed a commit to dinhxuanvu/kubernetes that referenced this pull request Mar 28, 2024

ah8ad3 pushed a commit to ah8ad3/kubernetes that referenced this pull request Apr 6, 2024