Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cronjob controller: reconsider log verbosity #115574

Closed
ffilippopoulos opened this issue Feb 7, 2023 · 14 comments · Fixed by #122293
Closed

cronjob controller: reconsider log verbosity #115574

ffilippopoulos opened this issue Feb 7, 2023 · 14 comments · Fixed by #122293
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ffilippopoulos
Copy link

What happened?

We are running Kubernetes 1.25.5 and trying to configure kube-controller-manager to be less verbose when logging.
We run kube-controller-manager as a static pod created by Kubelet and use the following command where we pass --v=0:

spec:
  containers:
  - command:
    - kube-controller-manager
    - --feature-gates=EphemeralContainers=true,TopologyAwareHints=true
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/config/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/config/controller-manager.conf
    - --cluster-cidr=10.6.0.0/16
    - --configure-cloud-routes=false
    - --kubeconfig=/etc/kubernetes/config/controller-manager.conf
    - --leader-elect=true
    - --node-monitor-grace-period=120s
    - --root-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-private-key-file=/etc/kubernetes/ssl/signing-key.pem
    - --use-service-account-credentials
    - --v=0
    image: registry.k8s.io/kube-controller-manager:v1.25.5
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: kube-controller-manager

The --v=0 flag seems to passed fine to the final command line of kube-controller-manager:

root     2464651  285  1.6 1939088 1085624 ?     Ssl  05:59 544:37 kube-controller-manager --feature-gates=EphemeralContainers=true,TopologyAwareHints=true --allocate-node-cidrs=true --authentication-kubeconfig=/etc/kubernetes/config/controller-manager.conf --authorization-kubeconfig=/etc/kubernetes/config/controller-manager.conf --cluster-cidr=10.6.0.0/16 --configure-cloud-routes=false --kubeconfig=/etc/kubernetes/config/controller-manager.conf --leader-elect=true --node-monitor-grace-period=120s --root-ca-file=/etc/kubernetes/ssl/ca.pem --service-account-private-key-file=/etc/kubernetes/ssl/signing-key.pem --use-service-account-credentials --v=0

Even though we get tons of the following logs periodically:

kube-controller-manager-master-2.dev.merit.uw.systems kube-controller-manager I0202 06:52:05.808355       1 event.go:294] "Event occurred" object="energy/gba-fabricator-job" fieldPath="" kind="CronJob" apiVersion="batch/v1" type="Normal" reason="JobAlreadyActive" message="Not starting job because prior execution is running and concurrency policy is Forbid"

which come from https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cronjob/cronjob_controllerv2.go#L575 where is logged as klog.V(4).InfoS
So arguably this line should not appear on our controller logs.

What did you expect to happen?

Setting flag --v=0 should be enough to lower log verbosity level according to: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md

How can we reproduce it (as minimally and precisely as possible)?

Scheduling a long running job at small intervals should be enough to reproduce our exact case and get the above mentioned logs.

Anything else we need to know?

No response

Kubernetes version

$ kubectl version --short
Client Version: v1.25.5
Kustomize Version: v4.5.7
Server Version: v1.25.5

Cloud provider

On premises

OS version

$ cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3374.2.3
VERSION_ID=3374.2.3
BUILD_ID=2023-01-20-1751
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3374.2.3 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3374.2.3:*:*:*:*:*:*:*"

$ uname -a
Linux worker-54.dev.merit.uw.systems 5.15.86-flatcar #1 SMP Fri Jan 20 17:26:55 -00 2023 x86_64 Intel(R) Xeon(R) CPU E3-1585L v5 @ 3.00GHz GenuineIntel GNU/Linux

Install tools

Self managed Kubernetes cluster.

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@ffilippopoulos ffilippopoulos added the kind/bug Categorizes issue or PR as related to a bug. label Feb 7, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 7, 2023
@ffilippopoulos
Copy link
Author

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 7, 2023
@logicalhan
Copy link
Member

/assign @pohly
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2023
@RayAsh37
Copy link

RayAsh37 commented Feb 10, 2023

@ffilippopoulos you are referring to

klog.V(4).InfoS("Not starting job because prior execution is still running and concurrency policy is Forbid", "cronjob", klog.KRef(cronJob.GetNamespace(), cronJob.GetName()))

when you refer to
where is logged as klog.V(4).InfoS So arguably this line should not appear on our controller logs.
but the message you are getting is from

jm.recorder.Eventf(cronJob, corev1.EventTypeNormal, "JobAlreadyActive", "Not starting job because prior execution is running and concurrency policy is Forbid")

I could be wrong here

@pohly
Copy link
Contributor

pohly commented Feb 10, 2023

@RayAsh37: well spotted.

klog.V(verbosity).InfoS("Event occurred", "object", klog.KRef(e.InvolvedObject.Namespace, e.InvolvedObject.Name), "fieldPath", e.InvolvedObject.FieldPath, "kind", e.InvolvedObject.Kind, "apiVersion", e.InvolvedObject.APIVersion, "type", e.Type, "reason", e.Reason, "message", e.Message)
prints the log message with a verbosity of 0 as determined by https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/cronjob/cronjob_controllerv2.go#L131

Whether events should be logged at that verbosity I don't know. The klog.V(4).InfoS calls before each jm.recorder.Eventf also seem redundant.

/unassign
/retitle cronjob controller: reconsider log verbosity

@k8s-ci-robot k8s-ci-robot changed the title kube-controller-manager ignores --v flag to set logs verbosity level cronjob controller: reconsider log verbosity Feb 10, 2023
@ffilippopoulos
Copy link
Author

Ok, I assumed that I am seeing the message from klog as I wasn't aware that jm.recorder.Eventf will also result in printing log messages. So there is no way to avoid getting this for now?

@pohly
Copy link
Contributor

pohly commented Feb 13, 2023

Not without modifying the source code of the cronjob controller.

@mengjiao-liu
Copy link
Member

mengjiao-liu commented Sep 15, 2023

	// Start event processing pipeline.
	jm.broadcaster.StartStructuredLogging(0)

Is the verbosity 0 as the log level that events are logged too low as suggested by sig instrumentation?
Users cannot ignore event messages by lowering the log level. It causes information noise.

I think we can increase this value or make it changeable.
for example: kubelet sets this event verbosity to 3
What do you think? @kow3ns @janetkuo @soltysh @smarterclayton @pohly

@pohly
Copy link
Contributor

pohly commented Sep 15, 2023

Makes sense to me.

@janetkuo
Copy link
Member

I traced it down to this change #91712 and most controllers are doing StartStructuredLogging(0). I'm fine with increasing the log level to be >0, following the recommendation from SIG instrumentation. Ideally it should be done for all controllers.

@pohly
Copy link
Contributor

pohly commented Sep 25, 2023

I don't think that SIG instrumentation has recommendations for this particular case. Also note that #91712 didn't change the verbosity of these events, they were at "info" (v=0) level already before in kube-controller-manager.

It's under the purvey of the individual code owners how important these events are, except that here we probably want consistency inside kube-controller-manager, so this probably should go to SIG Apps.

FWIW, logging at level 3 makes more sense to me than the current level 0.

@pohly
Copy link
Contributor

pohly commented Sep 25, 2023

/remove-sig instrumentation
/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Sep 25, 2023
@soltysh
Copy link
Contributor

soltysh commented Nov 10, 2023

Looking at this issue more in depth there are 2 things that I'd like to see being resolved:

  1. Increasing the global level for broadcaster's logging to 3, such that this is the new base line. See here for more details.
  2. Making sure the context is properly injected into the broadcaster, this will allow the -v flag value to be used also in that broadcaster, rather than the above global value. Again see here for explanation.

In other words, looking at cronjob controller we need:

  1. this line needs to look like:
jm.broadcaster.StartStructuredLogging(3)
  1. this line need to look like:
eventBroadcaster := record.NewBroadcaster(record.WithContext(ctx))

This should be applied consistently to all controllers. @mengjiao-liu since you're bringing this over to sig-apps call, are you interested in working on this?

@mengjiao-liu
Copy link
Member

mengjiao-liu commented Nov 13, 2023

/assign

Thanks for the reply. I will work on this.

@soltysh
Copy link
Contributor

soltysh commented Nov 14, 2023

@mengjiao-liu once you have a PR up and ready for review, please ping me directly on slack to get a faster response. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

8 participants