KEP-3077: contextual logging: promotion to beta in 1.30 #4219
Conversation
```diff
@@ -870,7 +912,7 @@ logging.

 #### Beta

-- All of kube-scheduler (in-tree) and CSI external-provisioner (out-of-tree) converted
+- [All of kube-controller-manager](https://github.com/kubernetes/kubernetes/pull/119250) and some [parts of kube-scheduler](https://github.com/kubernetes/kubernetes/pull/115588) converted (in-tree), conversion of out-of-tree components possible, whether they use pflag ([external-provisioner](https://github.com/kubernetes-csi/external-provisioner/pull/639)) or plain Go flags ([node-driver-registrar](https://github.com/kubernetes-csi/node-driver-registrar/pull/259))
```
We ended up focusing on kube-controller-manager instead of kube-scheduler because that work was easier to split up. Conversion of external components is technically feasible, but was delayed by lack of developer time. Contributors are willing to take on that part now and have already started (1, 2).
Regarding the next points (cannot comment on them in GitHub):

> Gathered feedback from developers and surveys

There has been steady interest in this, as seen from questions on Slack and good attendance of the KubeCon maintainer track sessions about logging. We've not done a survey, and I am not sure what exactly I would ask in it. May I perhaps remove that part?

> New APIs in `k8s.io/klog/v2` no longer marked as experimental

EDIT: Done in klog update for 1.29.
Just sharing some experience from using contextual logging with Cluster API.

In Cluster API we use component-base/logs to set up the log flags and the text/JSON loggers. Then we pass the logger from klog.Background() into controller-runtime. controller-runtime ensures that this logger (plus a few additional k/v pairs) is available in all contexts passed to our controllers and webhooks. In the controllers/webhooks we use go-logr/logr.FromContext/NewContext to pass around the logger (controller-runtime has a thin wrapper around these funcs).

In Cluster API contextual logging is always enabled. Because we're using the CR FromContext/NewContext wrappers, there is also no way to turn it off. So we use large parts of the implementation and concepts of upstream Kubernetes contextual logging and align as much as we can with upstream conventions.

We have been using this for a while in Cluster API, and it has worked pretty well for us without any issues.
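For readers who have not used that pattern, here is a minimal sketch of the setup described above; the reconciler name and body are illustrative, not Cluster API's actual code:

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
)

type podReconciler struct{}

// Reconcile retrieves the logger that controller-runtime stored in ctx.
// ctrl.LoggerFrom is the thin wrapper around go-logr/logr.FromContext
// mentioned above; the returned logger already carries the additional
// key/value pairs added by controller-runtime.
func (r *podReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := ctrl.LoggerFrom(ctx)
	logger.Info("Reconciling", "request", req)
	return ctrl.Result{}, nil
}

func main() {
	// Hand klog's logger to controller-runtime. From here on, the logger is
	// available in the context passed to all controllers and webhooks.
	ctrl.SetLogger(klog.Background())

	// ... create a manager, register podReconciler with it, start the manager ...
}
```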
Do you have any estimates around how much work is left in kube-scheduler? Also, what about kubelet, kube-apiserver, and cloud-controller-manager? I'd like to see explicit information about each at beta.
I'm tracking that in the description of #3077.
Shall I copy a snapshot of that information into the KEP?
kube-scheduler and kube-controller-manager are done and only need to be touched again when we make API changes elsewhere, for example in component-base or client-go.
kubelet has around 2000 calls which need to be updated, kube-proxy 400, kube-apiserver 660.
I've not checked cmd/cloud-controller-manager, but it seems to be small.
That's fine, I didn't look there. No need to put that information in the KEP in that case.
/assign @logicalhan
This PR is no longer targeting beta in 1.29. Let's try instead for 1.30. Not sure whether it needs to be merged before the freeze. We can also hold it until some of the pending slog PRs are merged and then merge it as a documentation update.
Let's refocus on promotion to beta, with on by default, in 1.30. In this comment I am going to check the usefulness and performance overhead of enabling contextual logging based on current master (= 5ce0bd9, shortly before v1.29.0-alpha.3).

First, use-cases. kube-controller-manager and kube-scheduler are converted to contextual logging. In kube-scheduler, the immediate benefit is that the operation and plugin are visible in each log entry; kube-controller-manager output gains similar context about the controller and the object it is working on.

Second, performance. I am using the scheduler_perf benchmark. That benchmark is completely CPU-bound, as log output gets buffered in memory. When comparing the current kube-scheduler code, which supports contextual logging (i.e. retrieves its logger from the context), against the previous code, the goal was to cause no performance regression at -v3 or lower. Higher log levels are for debugging, and slowdowns become acceptable there, in particular when the resulting log output is more useful for debugging (as it is in the example above). At -v3, benchstat reports no significant difference in most test cases, so that goal has been achieved. Some test cases even ran a bit faster (see benchstat-v3.log). At higher log levels such as -v5, there is some slowdown.
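For illustration, a minimal sketch of what a converted call site looks like (a hypothetical function, not actual kube-scheduler code); the per-call overhead is the context lookup plus any WithValues calls, which is why the low verbosity levels were the focus of the benchmark:

```go
package scheduler

import (
	"context"

	"k8s.io/klog/v2"
)

// scheduleOne is a hypothetical example of converted code: the logger comes
// from the context instead of the global klog call sites.
func scheduleOne(ctx context.Context, podName string) {
	logger := klog.FromContext(ctx)

	// Values attached here appear in all entries logged through this logger,
	// including in callees that receive the updated ctx.
	logger = klog.LoggerWithValues(logger, "pod", podName)
	ctx = klog.NewContext(ctx, logger)

	// At -v3 this line is skipped by the verbosity check, so the run-time
	// cost stays low; at -v5 it is emitted, with the "pod" key included.
	logger.V(5).Info("Attempting to schedule pod")

	_ = ctx // real code would pass ctx on to helpers
}
```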
Thanks for doing this analysis!
@logicalhan: are you now okay with promotion to beta in 1.30 as described here? On chat we had also discussed doing some kind of scalability comparison. I asked about that in #sig-scalability on Slack and got zero responses; it looks like interest in and bandwidth for such an experiment are low. Instead, my preference would be to promote the feature early in the 1.30 cycle, watch the normal scalability tests for a regression, then back it out again if there is one. That'll be simpler.
Yeah, I'm okay with that plan. Thanks for doing the due diligence!
/milestone v1.30
As discussed in kubernetes/enhancements#4219 (comment), benchstat reports no significant difference in most test cases for scheduler_perf at -v3. At -v5, there is some slowdown, but that is justified because the output becomes more useful for debugging. Enabling the feature by default ensures that log output becomes better regardless of whether users know about the feature and remember to enable it.
/lgtm
/approve
```diff
@@ -4,3 +4,5 @@
 kep-number: 3077
 alpha:
   approver: "@ehashman"
+beta:
+  approver: "@logicalhan"
```
@johnbelamaric: should (may?) I list you here? I picked @logicalhan as the replacement for ehashman because both are in SIG Instrumentation, but now I noticed that this should be a PRR reviewer.
@logicalhan added LGTM on behalf of the SIG. This PR is now ready for PRR approval.
/assign @johnbelamaric
#prr-shadow
Several questions from PRR shadow, before pulling in PRR approver.
logger.Info("Done", "pod", klog.KObj(pod)) | ||
``` | ||
|
||
Starting with beta, the feature gate will be enabled and code can be written |
Does this require another rewrite during beta, or do we let it be at this point in time, just in case someone decides to turn the feature off and falls back to the previous mechanism, which as described above requires that duplication?
I need to update this section. We decided to keep the duplication as long as the feature can still be turned off, which is the opposite of what I was proposing here.
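A sketch of why keeping the duplication works at call sites (assuming the documented klog behavior): `klog.FromContext` falls back to the global klog logger when the context carries no logger or when contextual logging is disabled, so the same call site keeps producing the traditional output when the feature is turned off:

```go
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

// doneWithPod works in both modes: with the ContextualLogging feature gate
// off, klog.FromContext ignores ctx and returns the global klog logger, so
// output still reaches the traditional destination with the same text.
func doneWithPod(ctx context.Context, pod *v1.Pod) {
	logger := klog.FromContext(ctx)
	logger.Info("Done", "pod", klog.KObj(pod))
}
```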
```diff
@@ -1037,6 +1079,11 @@ Revert commits that changed log calls.

 ## Implementation History

+* Kubernetes 1.24: initial alpha
```
I can't comment on unmodified text, so a few questions about the "Feature Enablement and Rollback" and "Version Skew Strategy" questions; both of these gain in importance when the feature is enabled by default.

- What will happen if a user starts to use new flags/options which start logging to JSON and rolls the component back? Are there any new flags for setting up different log output at all?
- You mentioned the "Performance overhead" risk in the risk section; how can a cluster administrator notice that? Was this measured as part of this effort, to point out which components are affected more and which less? You mentioned a flat 2%, but in the risk section you called out that there will be places where this might be more or less; that's why I'm asking about per-component data.
- Regarding "Pod scheduling (= 'startup latency of schedulable stateless pods' SLI) might become slightly worse": have you measured by how much? This is similar to the above question.
- Failure modes: in the risks section you described "Uninitialized logger"; is there a way we can prevent that failure from happening in CI?
- In "What steps should be taken if SLOs are not being met to determine the problem?", I doubt the cluster operator can revert commits; you need to define this more in terms of a cluster admin rolling back to the previous working version.
> What will happen if a user starts to use new flags/options which start logging to json and rolls the component back?

JSON output is not part of this feature. That's structured logging.

> Are there any new flags for setting up different log output at all?

This feature here has no separate command line flags or config options.

> Was this measured as part of this effort to point out which components are affected more and which less with that impact?

We only have performance benchmarks for kube-scheduler. See #4219 (comment) for an evaluation of that.

> have you measured it by how much? This is similar to the above question.

I observed no relevant difference in the kube-scheduler benchmark.
> Failure modes, in the risks section you described "Uninitialized logger", is there a way we can prevent that failure from happening in CI?

This is mostly a non-issue because the Go API was designed such that all code which asks for a `logr.Logger` gets one. A `Logger` value is never uninitialized. A pointer might be nil, but this is rarely used. To catch mistakes in the CI, the modified code must be executed.
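To illustrate that point with a sketch (not code from the KEP): APIs accept a `logr.Logger` by value, so callers always have to supply one, and tests that want silence can pass `logr.Discard()`:

```go
package example

import "github.com/go-logr/logr"

// process takes its logger by value. There is no "uninitialized" state to
// check for; even the zero value is safe to call in recent logr releases
// (it simply discards output).
func process(logger logr.Logger) {
	logger.Info("processing")
}

func demo() {
	var zero logr.Logger
	process(zero)           // no panic, output is dropped
	process(logr.Discard()) // explicit no-op logger, e.g. in unit tests
}
```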
> I doubt the cluster operator can revert commits, you need to define this more in terms of a cluster admin rollback to previous working version.

True. Let me double-check what I wrote there.
```diff
+* Kubernetes 1.27: parts of kube-controller-manager converted
+* Kubernetes 1.28: kube-controller-manager converted completely, relationship
+  with log/slog in Go 1.21 clarified
```
What about 1.30?
We finished the kube-scheduler conversion in 1.29. Let me add that.
For 1.30, we are focusing on client-go. We had a volunteer for kubelet, and @tallclair wanted to do something with the apiserver, but it's unclear whether both will happen.
It shouldn't be in "Implementation History" anyway, should it? It's not history, it's the future...
@soltysh: I added a "Status and next steps" section. I hope that also addresses your question about "estimates around how much work is left" - can you take another look?
We need to be more conservative than originally planned and continue to produce the same informative log output as before when contextual logging is disabled. To help reviewers understand the scope of the work, a new "Status and next steps" section summarizes where we are with the conversion.
#prr-shadow
prr lgtm
/assign @johnbelamaric
👍 /approve
[APPROVALNOTIFIER] This PR is APPROVED.

This pull-request has been approved by: johnbelamaric, logicalhan, pohly, soltysh. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing `/approve` in a comment.
@logicalhan or @johnbelamaric: can you add back the LGTM? The only changes since the LGTM from @logicalhan are these: https://github.com/kubernetes/enhancements/compare/66b45aa076bbf64a93cabf00d4e81c8b8fc086b5..b83cdb208f6fff7457d6a480ecd5ecb46f8a24a3
Re-applying lgtm based on @logicalhan's comment #4219 (review)

/lgtm
One-line PR description: update to reflect changes in the ecosystem
Issue link: contextual logging #3077
Other comments: some of the PRs linked here are not merged yet because support for slog is still new; they are expected to be merged before 1.29