
⚠️ fix redundant values in log entries #2074

Closed

Conversation

zbindenren
Contributor

@zbindenren zbindenren commented Nov 28, 2022

With the current log configuration we get the following reconcile log entries:

level=INFO msg="creating certificate request" controller=certificate controllerGroup=example.com controllerKind=Certificate Certificate="{certificate-sample appl-k8s-e2etests1-e1}" namespace=appl-k8s-e2etests1-e1 name=certificate-sample reconcileID=8e0379c3-ec61-42a7-9bb1-09b19f57bfb6 rev=1

As you can see, we have an entry Certificate="{certificate-sample appl-k8s-e2etests1-e1}", but this information is redundant, because we already have:

controllerKind=Certificate namespace=appl-k8s-e2etests1-e1 name=certificate-sample

If you want to search for this entry, it is probably easier and more flexible to filter by controllerKind, namespace and name.

The commit in this PR changes the log entry to:

level=INFO msg="creating certificate request" controller=certificate controllerGroup=example.com controllerKind=Certificate namespace=appl-k8s-e2etests1-e1 name=certificate-sample reconcileID=8e0379c3-ec61-42a7-9bb1-09b19f57bfb6 rev=1
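
For context, a minimal sketch of a hypothetical reconciler that would emit a line like the one above; controller-runtime pre-populates the logger stored in the context with the controller and request key/value pairs, so the reconciler itself only adds the message and rev:

    import (
        "context"

        ctrl "sigs.k8s.io/controller-runtime"
    )

    // CertificateReconciler is hypothetical; only the logging call matters here.
    type CertificateReconciler struct{}

    func (r *CertificateReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
        // The logger from the context already carries controller, controllerGroup,
        // controllerKind, reconcileID and the object reference keys under discussion.
        log := ctrl.LoggerFrom(ctx)
        log.Info("creating certificate request", "rev", 1)
        return ctrl.Result{}, nil
    }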

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zbindenren
Once this PR has been reviewed and has the lgtm label, please assign shawn-hurley for approval by writing /assign @shawn-hurley in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Hi @zbindenren. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 28, 2022
@alvaroaleman
Member

/cc @sbueringer

@vincepri
Member

vincepri commented Dec 1, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 1, 2022
@@ -319,7 +318,6 @@ func (blder *Builder) doController(r reconcile.Reconciler) error {
 	log := log
 	if req != nil {
 		log = log.WithValues(
-			gvk.Kind, klog.KRef(req.Namespace, req.Name),
 			"namespace", req.Namespace, "name", req.Name,
@vincepri
Member

Suggested change:
-			"namespace", req.Namespace, "name", req.Name,

We should probably remove this line instead?

Contributor Author

@vincepri, thanks for taking the time to review this.

If I understand correctly, your proposal would result in something like this:

level=INFO msg="creating certificate request" controller=certificate controllerGroup=example.com controllerKind=Certificate Certificate="{certificate-sample appl-k8s-e2etests1-e1}" reconcileID=8e0379c3-ec61-42a7-9bb1-09b19f57bfb6 rev=1

We would also have to remove the line that adds controllerKind; otherwise we would still have redundant information in the log line. This would produce:

level=INFO msg="creating certificate request" controller=certificate controllerGroup=example.com Certificate="{certificate-sample appl-k8s-e2etests1-e1}" reconcileID=8e0379c3-ec61-42a7-9bb1-09b19f57bfb6 rev=1

In comparison, my change produces:

level=INFO msg="creating certificate request" controller=certificate controllerGroup=example.com controllerKind=Certificate namespace=appl-k8s-e2etests1-e1 name=certificate-sample reconcileID=8e0379c3-ec61-42a7-9bb1-09b19f57bfb6 rev=1

If you are using indexing software like Splunk or Elasticsearch, it would be harder to write queries for your proposal, since the Kind (Certificate in the case above) is now a key in the structured log. If you are reconciling different kinds, we would have a key that changes between log entries.
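
To make the variants concrete, a sketch based on the WithValues call from the diff above (same variable names as in that snippet):

    // 1. Current behavior: kind-keyed reference plus flat namespace/name keys.
    log = log.WithValues(
        gvk.Kind, klog.KRef(req.Namespace, req.Name), // e.g. Certificate="{...}"
        "namespace", req.Namespace, "name", req.Name,
    )

    // 2. @vincepri's suggestion: keep only the kind-keyed reference.
    log = log.WithValues(
        gvk.Kind, klog.KRef(req.Namespace, req.Name),
    )

    // 3. This PR: keep only the stable namespace/name keys.
    log = log.WithValues(
        "namespace", req.Namespace, "name", req.Name,
    )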

@@ -319,7 +318,6 @@ func (blder *Builder) doController(r reconcile.Reconciler) error {
 	log := log
 	if req != nil {
 		log = log.WithValues(
-			gvk.Kind, klog.KRef(req.Namespace, req.Name),
@alvaroaleman alvaroaleman Dec 1, 2022
Member

This is intentional, btw: we wanted to avoid breaking existing users who rely on these fields, while letting things just work for people who rely on the klog field naming. Is there any issue with the redundancy? Most log shippers allow you to drop fields if you are concerned about the size.

This is a breaking change, btw, since it might break log indexing.

Contributor Author

I agree that if we consider log output part of controller-runtime's API, this is a breaking change. But if we can agree that a breaking change is allowed in order to fix a 'bug', then let me elaborate why I think this is a bug:

Using a value as a key is the main issue I have with the above log entry; I consider it an anti-pattern in structured logging. Let's take HTTP requests as an analogy. To log HTTP requests with status codes, we would probably log something like:

request_path="/api/v1/user/list" request_code="404" ...

and usually not:

/api/v1/user/list="404"

The above code would be OK in my opinion if we rewrote it to something like:

log = log.WithValues(
    "kindReference", klog.KRef(req.Namespace, req.Name),
)

But since we would be changing it anyway, we could also remove it altogether.

Additionally, it is probably easier to create search queries with the fields controllerKind, namespace and name than with the klog field name Certificate, which changes for each kind.

Member

I agree with the general sentiment of not putting values into keys, but it has the advantage that you can search for logs related to a given object type + name across controllers: #1826 (comment)

We basically had the same discussion we are having here on the issue, before this was introduced.

@zbindenren zbindenren Dec 5, 2022
Contributor Author

I assume you refer to this quote:

By having namespace/name for all of them nested we can simply filter with cluster.name = "" across logs of multiple controllers. Otherwise it would be very hard to take the namespace and name from every controller correlate them to the kind they belong to and then filter across controllers.

The query suggested by @sbueringer is still possible with:

controllerKind=Cluster name=cluster-name namespace=namespace

And for just one controller:

controller=controller controllerKind=Cluster name=cluster-name namespace=namespace

So to summarize pros ✔️ and cons ❌:

If we merge this PR:

  • ✔️ we get smaller log entries without redundant information
  • ✔️ log entries are easier to read (no nested objects and less noise)
  • ✔️ stop using a value as key in structured logging
  • ✔️ same filter possibilities in Splunk, Loki, or Elasticsearch
  • ❌ breaking change

Member

@zbindenren I agree with your reasoning in principle, but the problem is that we want to do this the same way upstream does, so log queries will work for both.

Member

@akutz and @sbueringer were the ones who cared about the previous format; let's give them a chance to voice their opinion.

Member

Sorry, @pohly, not Andrew.

@sbueringer sbueringer Jan 13, 2023
Member

Sorry for the very late response; I have to get my notifications a bit more under control.

As stated above, the current format is intentional, as I described in #1826:

Log "<kind>": "<namespace>/<name>" instead of "namespace" and "name"

Initially we also used the kind with a lower-case first character, but later on we realized that we can't do this correctly for arbitrary kinds.

So yes, we are not 100% consistent with upstream, but we are as close as we can be.

Independent of consistency, my reason for wanting the current format is the following:

In ClusterAPI we have various controllers, and they are all reconciling related objects.
With the current logging we end up with logs like this (keys re-ordered to make them easier to understand):
MachineSet controller:

{"Cluster":{"name":"quick-start-h0uf1w","namespace":"quick-start-u7zz7r"}, "MachineDeployment":{"name":"quick-start-h0uf1w-md-0","namespace":"quick-start-u7zz7r"}, "MachineSet":{"name":"quick-start-h0uf1w-md-0-6c856bb876","namespace":"quick-start-u7zz7r"},"ts":1673599028821.286,"caller":"machineset/machineset_controller.go:402","msg":"MachineSet is scaling up to 1 replicas by creating 1 machines","v":0,"controller":"machineset","controllerGroup":"cluster.x-k8s.io","controllerKind":"MachineSet","namespace":"quick-start-u7zz7r","name":"quick-start-h0uf1w-md-0-6c856bb876","reconcileID":"7f5cf231-61cb-4cb6-8d28-ac02d709d1ff","replicas":1,"machineCount":0}

Cluster controller:

{"Cluster":{"name":"clusterclass-changes-pbb08r","namespace":"clusterclass-changes-xo727h"},"ts":1673599028749.3906,"caller":"cluster/cluster_controller.go:187","msg":"Waiting for the topology to be generated","v":0,"controller":"cluster","controllerGroup":"cluster.x-k8s.io","controllerKind":"Cluster","namespace":"clusterclass-changes-xo727h","name":"clusterclass-changes-pbb08r","reconcileID":"bd128734-0120-4035-964b-e36d0624351a"}

So essentially across controllers we are adding key/value pairs of the objects involved, like {"Cluster":{"name":"clusterclass-changes-pbb08r","namespace":"clusterclass-changes-xo727h"}, "MachineDeployment":{"name":"quick-start-h0uf1w-md-0","namespace":"quick-start-u7zz7r"}...

This makes it very easy to query logs across controllers by just querying for log entries with Cluster.name == <cluster-name>.

If we just wanted to query the logs of a single controller, your format would work as well, but for cross-referencing it makes the queries very complicated in my opinion.
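
As a rough sketch of what this looks like in code (object names taken from the example logs above; klog.KRef is the same helper used in the builder diff):

    import (
        "github.com/go-logr/logr"
        "k8s.io/klog/v2"
    )

    // withRelatedObjects attaches the objects involved in a reconcile as
    // kind-keyed references, so a query like Cluster.name == "<cluster-name>"
    // matches entries from every controller that touches that Cluster.
    func withRelatedObjects(log logr.Logger) logr.Logger {
        return log.WithValues(
            "Cluster", klog.KRef("quick-start-u7zz7r", "quick-start-h0uf1w"),
            "MachineSet", klog.KRef("quick-start-u7zz7r", "quick-start-h0uf1w-md-0-6c856bb876"),
        )
    }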

Regarding using values as keys

I don't know where that comes from or what the background is. Is it the same concern as cardinality with labels in metrics, i.e. that if you use values as log keys you can end up with very high cardinality?

I'm not sure that this is true in this case, because the number of kinds we are reconciling shouldn't be excessive enough to become a problem.

(also described here: #1826 (comment))

Regarding redundancy and readability

I agree that if folks are reading logs via kubectl, less redundancy would indeed help. Via Kibana / Grafana / ... I think it doesn't make a difference.

P.S. Just to make sure we're on the same page: we are discussing what the default / out-of-the-box behavior should be. It is always possible to use different k/v pairs by setting LogConstructor.
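
For illustration, a sketch of doing exactly that via the builder; the certsv1 package and the setup function are hypothetical, while WithLogConstructor is the builder hook for setting the controller's LogConstructor:

    import (
        "github.com/go-logr/logr"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/reconcile"

        certsv1 "example.com/api/v1" // hypothetical API package for the Certificate kind
    )

    func setup(mgr ctrl.Manager, r reconcile.Reconciler) error {
        return ctrl.NewControllerManagedBy(mgr).
            For(&certsv1.Certificate{}).
            WithLogConstructor(func(req *reconcile.Request) logr.Logger {
                log := mgr.GetLogger().WithValues("controller", "certificate")
                if req != nil {
                    // Flat keys only; no kind-keyed object reference.
                    log = log.WithValues("namespace", req.Namespace, "name", req.Name)
                }
                return log
            }).
            Complete(r)
    }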

Contributor Author

Thanks for your answer.

Regarding your reason for wanting the current format
Your query could also be written without the redundant key, like:

controllerKind=Cluster name=<cluster-name>

The above query would show the logs for multiple controllers. If you want to limit it to one controller, you would need the following:

controller=machineset controllerKind=Cluster name=<cluster-name>

or

controller=cluster controllerKind=Cluster name=<cluster-name>

To me this doesn't look too complicated. Or am I misunderstanding something?

@sbueringer sbueringer Jan 24, 2023
Member

controllerKind is different for each controller, so I can't use controllerKind=Cluster to filter for logs related to a Cluster across controllers.

controllerKind=Cluster name=<cluster-name> would only match the logs from the Cluster controller, but not the logs from the MachineSet controller.

Essentially, in the MachineSet controller the controllerKind is MachineSet, and name and namespace also refer to the MachineSet. In the Cluster controller the controllerKind is Cluster.

@zbindenren zbindenren changed the title 🐛 fix redundant values in log entries ⚠️ : fix redundant values in log entries Dec 5, 2022
@zbindenren zbindenren changed the title ⚠️ : fix redundant values in log entries ⚠️ fix redundant values in log entries Dec 5, 2022
@pohly

pohly commented Jan 24, 2023

Most controllers are not using structured logging (namespace, job, resourcequota, replicaset, daemon and more).

Not sure whether it is relevant here, but in case that you haven't seen it: progress has been made with converting the in-tree controllers to structured, contextual logging. See the table in kubernetes/enhancements#3077. PRs are currently under review for 1.27.

@k8s-ci-robot
Contributor

@zbindenren: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-controller-runtime-test
Commit: bcd1121
Details: link
Required: true
Rerun command: /test pull-controller-runtime-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 17, 2023
@zbindenren zbindenren closed this May 17, 2023
Labels
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • lifecycle/stale - Denotes an issue or PR has remained open with no activity and has become stale.
  • ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
  • size/XS - Denotes a PR that changes 0-9 lines, ignoring generated files.