-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-3077: contextual logging: promotion to beta in 1.30 #4219
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,3 +4,5 @@ | |
kep-number: 3077 | ||
alpha: | ||
approver: "@ehashman" | ||
beta: | ||
approver: "@johnbelamaric" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,6 +37,7 @@ | |
- [Unit testing](#unit-testing) | ||
- [Injecting common value, logger passed through existing ctx parameter or new parameter](#injecting-common-value-logger-passed-through-existing-ctx-parameter-or-new-parameter) | ||
- [Resulting output](#resulting-output) | ||
- [Integration with log/slog](#integration-with-logslog) | ||
- [Test Plan](#test-plan) | ||
- [Graduation Criteria](#graduation-criteria) | ||
- [Alpha](#alpha) | ||
|
@@ -52,12 +53,14 @@ | |
- [Scalability](#scalability) | ||
- [Troubleshooting](#troubleshooting) | ||
- [Implementation History](#implementation-history) | ||
- [Status and next steps](#status-and-next-steps) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
- [Per-component logger](#per-component-logger) | ||
- [Propagating a logger to init code](#propagating-a-logger-to-init-code) | ||
- [Panic when FromContext is called before setting a logger](#panic-when-fromcontext-is-called-before-setting-a-logger) | ||
- [Clean separation of contextual logging and traditional klog logging](#clean-separation-of-contextual-logging-and-traditional-klog-logging) | ||
- [Use log/slog instead of klog+logr](#use-logslog-instead-of-kloglogr) | ||
<!-- /toc --> | ||
|
||
## Release Signoff Checklist | ||
|
@@ -507,6 +510,36 @@ The logcheck static code analysis tool will warn about code in Kubernetes which | |
calls the underlying functions directly. Once the feature gate is no longer needed, | ||
a global search/replace can remove the usage of these wrapper functions again. | ||
|
||
Because the feature gate is off during alpha, log calls have to repeat | ||
important key/value pairs even if those also got passed to `WithValues`: | ||
|
||
``` | ||
logger := logger.WithValues("pod", klog.KObj(pod)) | ||
... | ||
logger.Info("Processing", "pod", klog.KObj(pod)) | ||
... | ||
logger.Info("Done", "pod", klog.KObj(pod)) | ||
``` | ||
|
||
Starting with GA, the feature will always be enabled and code can be written | ||
without such duplication: | ||
|
||
``` | ||
logger := logger.WithValues("pod", klog.KObj(pod)) | ||
... | ||
logger.Info("Processing") | ||
... | ||
logger.Info("Done") | ||
``` | ||
|
||
Documentation of APIs has to make it clear which values will always be included | ||
in log entries and thus don't need to be repeated. If in doubt, repeating them | ||
is okay: the text format will filter out duplicates if log call parameters | ||
overlap with `WithValues` parameters. For performance reasons it will not do | ||
that for duplicates between different `WithValues` calls. In JSON, repeating | ||
keys increases log volume size because there is no de-duplication, but the | ||
semantic is the same ("most recent wins"). | ||
|
||
### Text format | ||
|
||
The formatting and verbosity code will be moved into `internal` packages where | ||
|
@@ -839,6 +872,15 @@ I1026 16:21:00.461886 801139 scheduler.go:464] "Status after running PostFilter | |
I1026 16:21:00.461918 801139 factory.go:209] "Unable to schedule pod; no fit; waiting" pod="default/my-csi-app-inline-volume" err="0/1 nodes are available: 1 node(s) did not have enough free storage." | ||
``` | ||
|
||
### Integration with log/slog | ||
|
||
[`log/slog`](https://pkg.go.dev/log/slog) got added in Go | ||
1.21. Interoperability with slog is [provided by | ||
logr](https://github.com/go-logr/logr/pull/222). Applications which use slog | ||
can route log output from Kubernetes packages into their `slog.Handler` and | ||
vice versa, as demonstrated with [`component-base/logs` | ||
examples](https://github.com/kubernetes/kubernetes/pull/120696). | ||
|
||
### Test Plan | ||
|
||
The new code will be covered by unit tests that execute as part of | ||
|
@@ -870,7 +912,7 @@ logging. | |
|
||
#### Beta | ||
|
||
- All of kube-scheduler (in-tree) and CSI external-provisioner (out-of-tree) converted | ||
- [All of kube-controller-manager](https://github.com/kubernetes/kubernetes/pull/119250) and some [parts of kube-scheduler](https://github.com/kubernetes/kubernetes/pull/115588) converted (in-tree), conversion of out-of-tree components possible, whether they use pflag ([external-provisioner](https://github.com/kubernetes-csi/external-provisioner/pull/639)] or plain Go flags ([node-driver-registrar](https://github.com/kubernetes-csi/node-driver-registrar/pull/259)) | ||
- Gathered feedback from developers and surveys | ||
- New APIs in `k8s.io/klog/v2` no longer marked as experimental | ||
|
||
|
@@ -1033,10 +1075,131 @@ None besides bugs that could cause a program to panic (null logger). | |
|
||
###### What steps should be taken if SLOs are not being met to determine the problem? | ||
|
||
Revert commits that changed log calls. | ||
A cluster operator can disable the feature via the feature gate. | ||
|
||
Kubernetes developers can revert individual commits that changed log calls once | ||
it has been determined that they introduce too much overhead. | ||
|
||
## Implementation History | ||
|
||
* Kubernetes 1.24: initial alpha | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can't comment on not modified text, so a few questions.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
JSON output is not part of this feature. That's structured logging.
This feature here has no separate command line flags or config options.
We only have performance benchmarks for kube-scheduler. See #4219 (comment) for an evaluation of that.
I observed no relevant difference in the kube-scheduler benchmark.
This is mostly a non-issue because the Go API was designed such that all code which asks for a To catch mistakes in the CI, the modified code must be executed.
True. Let me double-check what I wrote there. |
||
* Kubernetes 1.27: parts of kube-controller-manager converted | ||
* Kubernetes 1.28: kube-controller-manager converted completely, relationship | ||
with log/slog in Go 1.21 clarified | ||
* Kubernetes 1.29: kube-scheduler converted completely | ||
|
||
## Status and next steps | ||
|
||
As of Kubernetes 1.29.1, kube-controller-manager and kube-scheduler have been | ||
converted. The logcheck tool can be used to count remaining log calls that need | ||
to be converted: | ||
|
||
``` | ||
go install sigs.k8s.io/logtools/logcheck@latest | ||
|
||
echo "Component | Non-Structured Logging | Non-Contextual Logging " && echo "------ | ------- | -------" && for i in $(find pkg/* cmd/* staging/src/k8s.io/* -maxdepth 0 -type d | sort); do echo "$i | $(cd $i; ${GOPATH}/bin/logcheck -check-structured -check-deprecations=false 2>&1 ./... | wc -l ) | $(cd $i; ${GOPATH}/bin/logcheck -check-structured -check-deprecations=false -check-contextual ./... 2>&1 | wc -l )"; done | ||
``` | ||
|
||
Note that this also counts calls where it was decided to not convert them. The | ||
actual check with golangci-lint ignores those because of a `//nolint:logcheck` | ||
suppression comment. | ||
|
||
Component | Non-Structured Logging | Non-Contextual Logging | ||
------ | ------- | ------- | ||
cmd/clicheck | 0 | 0 | ||
cmd/cloud-controller-manager | 6 | 8 | ||
cmd/dependencycheck | 0 | 0 | ||
cmd/dependencyverifier | 0 | 0 | ||
cmd/fieldnamedocscheck | 1 | 1 | ||
cmd/gendocs | 0 | 0 | ||
cmd/genkubedocs | 0 | 0 | ||
cmd/genman | 0 | 0 | ||
cmd/genswaggertypedocs | 2 | 2 | ||
cmd/genutils | 0 | 0 | ||
cmd/genyaml | 0 | 0 | ||
cmd/gotemplate | 0 | 0 | ||
cmd/importverifier | 0 | 0 | ||
cmd/kubeadm | 264 | 463 | ||
cmd/kube-apiserver | 6 | 7 | ||
cmd/kube-controller-manager | 0 | 0 | ||
cmd/kubectl | 0 | 0 | ||
cmd/kubectl-convert | 0 | 0 | ||
cmd/kubelet | 0 | 52 | ||
cmd/kubemark | 1 | 1 | ||
cmd/kube-proxy | 0 | 42 | ||
cmd/kube-scheduler | 0 | 0 | ||
cmd/preferredimports | 0 | 0 | ||
cmd/prune-junit-xml | 0 | 0 | ||
cmd/yamlfmt | 0 | 0 | ||
pkg/api | 0 | 0 | ||
pkg/apis | 0 | 0 | ||
pkg/auth | 1 | 1 | ||
pkg/capabilities | 0 | 0 | ||
pkg/client | 0 | 0 | ||
pkg/cloudprovider | 0 | 0 | ||
pkg/cluster | 0 | 0 | ||
pkg/controller | 0 | 3 | ||
pkg/controlplane | 53 | 69 | ||
pkg/credentialprovider | 48 | 77 | ||
pkg/features | 0 | 0 | ||
pkg/fieldpath | 0 | 0 | ||
pkg/generated | 0 | 0 | ||
pkg/kubeapiserver | 4 | 4 | ||
pkg/kubectl | 1 | 2 | ||
pkg/kubelet | 2 | 1983 | ||
pkg/kubemark | 7 | 7 | ||
pkg/printers | 0 | 0 | ||
pkg/probe | 7 | 24 | ||
pkg/proxy | 0 | 360 | ||
pkg/quota | 0 | 0 | ||
pkg/registry | 46 | 99 | ||
pkg/routes | 2 | 2 | ||
pkg/scheduler | 0 | 0 | ||
pkg/security | 0 | 0 | ||
pkg/securitycontext | 0 | 0 | ||
pkg/serviceaccount | 25 | 44 | ||
pkg/util | 20 | 57 | ||
pkg/volume | 704 | 1110 | ||
pkg/windows | 1 | 1 | ||
staging/src/k8s.io/api | 0 | 0 | ||
staging/src/k8s.io/apiextensions-apiserver | 58 | 89 | ||
staging/src/k8s.io/apimachinery | 80 | 125 | ||
staging/src/k8s.io/apiserver | 285 | 655 | ||
staging/src/k8s.io/client-go | 163 | 283 | ||
staging/src/k8s.io/cli-runtime | 1 | 2 | ||
staging/src/k8s.io/cloud-provider | 122 | 162 | ||
staging/src/k8s.io/cluster-bootstrap | 2 | 4 | ||
staging/src/k8s.io/code-generator | 108 | 155 | ||
staging/src/k8s.io/component-base | 33 | 64 | ||
staging/src/k8s.io/component-helpers | 2 | 4 | ||
staging/src/k8s.io/controller-manager | 10 | 10 | ||
staging/src/k8s.io/cri-api | 0 | 0 | ||
staging/src/k8s.io/csi-translation-lib | 3 | 4 | ||
staging/src/k8s.io/dynamic-resource-allocation | 0 | 0 | ||
staging/src/k8s.io/endpointslice | 0 | 0 | ||
staging/src/k8s.io/kms | 0 | 0 | ||
staging/src/k8s.io/kube-aggregator | 45 | 62 | ||
staging/src/k8s.io/kube-controller-manager | 0 | 0 | ||
staging/src/k8s.io/kubectl | 96 | 160 | ||
staging/src/k8s.io/kubelet | 0 | 32 | ||
staging/src/k8s.io/kube-proxy | 0 | 0 | ||
staging/src/k8s.io/kube-scheduler | 0 | 0 | ||
staging/src/k8s.io/legacy-cloud-providers | 1281 | 2015 | ||
staging/src/k8s.io/metrics | 0 | 0 | ||
staging/src/k8s.io/mount-utils | 55 | 95 | ||
staging/src/k8s.io/pod-security-admission | 0 | 1 | ||
staging/src/k8s.io/sample-apiserver | 0 | 0 | ||
staging/src/k8s.io/sample-cli-plugin | 0 | 0 | ||
staging/src/k8s.io/sample-controller | 0 | 0 | ||
|
||
For Kubernetes 1.30, the focus is on client-go. APIs need to be extended | ||
carefully without breaking existing code so that a context can be provided for | ||
log calls. In some cases, this also makes a context available to code which | ||
currently uses `context.TODO` as a stop-gap measure. Currently there are over | ||
300 of those in `staging/src/k8s.io/client-go`. Whenever new APIs get | ||
introduced, components which were already converted to contextual logging get | ||
updated to use those. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about 1.30? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We finished the kube-scheduler in 1.29. Let me add that. For 1.30, we are focusing on client-go. We had a volunteer for kubelet and @tallclair wanted to do something with the apiserver, but it's unclear whether both will happen. It shouldn't be in "Implementation History" anyway, should it? It's not history, it's the future... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @soltysh: I added a "Status and next steps" section. I hope that also addresses your question about "estimates around how much work is left" - can you take another look? |
||
## Drawbacks | ||
|
||
Supporting contextual logging is a key design decision that has implications | ||
|
@@ -1098,3 +1261,10 @@ would have removed all legacy code from Kubernetes. However, that transition | |
would have been complicated and forced all consumers of Kubernetes code to | ||
adjust their code. Therefore the scope of the KEP was reduced from "remove | ||
dependency on klog" to "remove dependency on global logger in klog". | ||
|
||
### Use log/slog instead of klog+logr | ||
|
||
This isn't viable because `slog` doesn't provide a mechanism to pass a logger | ||
through a context. Therefore it would not be possible to support contextual | ||
logging in packages like client-go where adding an explicit logger parameter | ||
would be a major API break. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ended up focusing on kube-controller-manager instead of kube-scheduler because that work was easier to split up. Conversion of external components is technically feasible, but was delayed by lack of developer time. Contributors are willing to take on that part now and have already started (1, 2).
Regarding the next points (cannot comment on it in GitHub):
There has been steady interest in this, as seen from questions on Slack and good attendance of the KubeCon maintainer track sessions about logging. We've not done a survey, and I am not sure what exactly I would ask in it. May I perhaps remove that part?
EDIT: Done in klog update for 1.29.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just sharing some experience from using contextual logging with Cluster API.
In Cluster API we use component-base/logs to setup the log flags and the text/JSON loggers. Then we pass in the logger from
klog.Background()
into controller-runtime.controller-runtime ensures that this logger (+ a few additional k/v pairs) is available in all contexts passed in our controllers and webhooks. In the controllers/webhooks we use ~ go-logr/logr.FromContext/NewContext to pass around the logger (controller-runtime has a thin wrapper around these funcs).
In Cluster API contextual logging is always enabled. Because we're using the CR FromContext/NewContext wrappers there is also no way to turn it off.
So we use large parts of the implementation and concepts of upstream Kubernetes contextual logging and align as much as we can to upstream conventions.
We are using this since a while in Cluster API and for us it worked pretty well without any issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any estimates around how much work is left in kube-scheduler? Also what about kubelet and kube-apiserver and cloud-controller-manager? I'd like to see explicit information about each at beta.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tracking that in the description of #3077.
Shall I copy a snapshot of that information into the KEP?
kube-scheduler and kube-controller-manager are done and only need to be touch again when we make API changes elsewhere, for example in component-base or client-go.
kubelet has around 2000 calls which need to be updated, kube-proxy 400, kube-apiserver 660.
I've not checked cmd/cloud-controller-manager, but it seems to be small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, I didn't look there. No need to put that information in the KEP in that case.