
DelegatingLogger reports incorrect source lines when zap.AddCaller() option is used #1737

Closed · dobsonj opened this issue Dec 3, 2021 · 5 comments · Fixed by #1975
Labels: lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)

@dobsonj (Member) commented on Dec 3, 2021

I noticed an issue with logging in controller-runtime when source line logging is enabled via the zap.AddCaller() option.

For context, I was debugging local-storage-operator, with the following options passed into SetLogger:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/main.go#L79-L87

	opts := zap.Options{
		Development: true,
		ZapOpts:     []zaplog.Option{zaplog.AddCaller()},
	}

	opts.BindFlags(flag.CommandLine)
	flag.Parse()

	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

During startup, it prints log messages like this, attributing them to controller/controller.go:221

2021-12-02T22:17:24.408Z        INFO    controller-runtime.manager.controller.localvolume       controller/controller.go:221    Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}
2021-12-02T22:17:24.408Z        INFO    controller-runtime.manager.controller.localvolume       controller/controller.go:221    Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}

The problem is that line 221 is not the line where the log call is actually made:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go#L221

Line 221 is in the parent function of the code that actually emits the message; the real call site is here:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go#L165

So I would expect the log messages to report controller/controller.go:165 instead of controller/controller.go:221 when the AddCaller option is present. This is one example, but the problem applies to other log messages as well.

The logr.Logger object ends up with callerSkip set to 2, so I added a debug.PrintStack() call to zap's AddCallerSkip() to find out where it is given a non-zero argument.
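The temporary instrumentation looked roughly like this (a sketch only: it assumes zap's optionFunc helper and needs runtime/debug imported in that file; the exact upstream body may differ):

	// vendor/go.uber.org/zap/options.go -- temporary debugging change only.
	// Requires "runtime/debug" to be imported in that file.
	func AddCallerSkip(skip int) Option {
		if skip != 0 {
			debug.PrintStack() // dump the stack of whoever adds caller skip
		}
		return optionFunc(func(log *Logger) {
			log.callerSkip += skip
		})
	}

It fired at two call sites that pass a non-zero skip: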

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go#L234

goroutine 1 [running]:
runtime/debug.Stack(0x1ed08f0, 0xc0002fddd0, 0xc000590d00)
        /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
        /usr/lib/golang/src/runtime/debug/stack.go:16 +0x25
go.uber.org/zap.AddCallerSkip(0x1, 0x1ec45e0, 0xc000010020)
        /go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25
sigs.k8s.io/controller-runtime/pkg/log/zap.NewRaw(0xc00020bcf8, 0x1, 0x1, 0xc0004a8320)
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:234 +0x106
sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0)
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x3f
main.main()
        /go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/github.com/go-logr/zapr/zapr.go#L182

goroutine 1 [running]:
runtime/debug.Stack(0x6, 0x8, 0xc000339920)
        /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9f
runtime/debug.PrintStack()
        /usr/lib/golang/src/runtime/debug/stack.go:16 +0x25
go.uber.org/zap.AddCallerSkip(0x1, 0xc00000ff50, 0x1ec5120)
        /go/src/github.com/openshift/local-storage-operator/vendor/go.uber.org/zap/options.go:110 +0x25
github.com/go-logr/zapr.newLoggerWithExtraSkip(0xc000339920, 0x1, 0x1, 0xc000339920)
        /go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:172 +0x2f
github.com/go-logr/zapr.NewLogger(...)
        /go/src/github.com/openshift/local-storage-operator/vendor/github.com/go-logr/zapr/zapr.go:182
sigs.k8s.io/controller-runtime/pkg/log/zap.New(0xc00020bcf8, 0x1, 0x1, 0x1, 0x0)
        /go/src/github.com/openshift/local-storage-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/zap/zap.go:43 +0x56
main.main()
        /go/src/github.com/openshift/local-storage-operator/main.go:93 +0x3f4

As a quick test, I tried reducing callerSkip to 1 by passing in zaplog.AddCallerSkip(-1):

	ZapOpts:     []zaplog.Option{zaplog.AddCaller(), zaplog.AddCallerSkip(-1)},

And that resulted in the correct line being logged:

2021-12-02T23:47:31.676Z        INFO    controller-runtime.manager.controller.localvolume  controller/controller.go:165     Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}
2021-12-02T23:47:31.676Z        INFO    controller-runtime.manager.controller.localvolume  controller/controller.go:165     Starting EventSource    {"reconciler group": "local.storage.openshift.io", "reconciler kind": "LocalVolume", "source": "kind source: /, Kind="}

As I understand it, callerSkip is being set to 2 on the assumption that we're using the wrapper functions here:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L127

If we use the DelegatingLogger functions, then callerSkip = 2 is correct.
But any time WithName() or WithValues() is called, it returns the underlying logr.Logger object:

https://github.com/openshift/local-storage-operator/blob/8fc42cc8b990907c88a6da551dc85b55c2dc4417/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go#L183

That means we are no longer going through the DelegatingLogger wrapper functions; we are calling the go-logr methods directly, without the extra wrapper frame. In that case callerSkip = 2 is one frame too many, so the parent function gets logged instead of the line containing the log statement.
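To make the two call paths concrete, here is a minimal reproducer sketch (my own example, not controller-runtime code; ctrl.Log is controller-runtime's package-level DelegatingLogger, and the comments simply restate the behaviour described above):

	package main

	import (
		"flag"

		zaplog "go.uber.org/zap"
		ctrl "sigs.k8s.io/controller-runtime"
		"sigs.k8s.io/controller-runtime/pkg/log/zap"
	)

	func main() {
		opts := zap.Options{
			Development: true,
			ZapOpts:     []zaplog.Option{zaplog.AddCaller()},
		}
		opts.BindFlags(flag.CommandLine)
		flag.Parse()
		ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

		// Path 1: this call goes through the DelegatingLogger wrapper, so the
		// extra wrapper frame matches callerSkip = 2 and this line is reported.
		ctrl.Log.Info("via the DelegatingLogger wrapper")

		// Path 2: WithName() returns the underlying logr.Logger, the wrapper
		// frame disappears, and callerSkip = 2 points one frame too high.
		named := ctrl.Log.WithName("example")
		named.Info("via the underlying logr.Logger")
	}

Note that with the zaplog.AddCallerSkip(-1) workaround above, the second path reports the right line, but calls that do go through the DelegatingLogger wrapper would presumably then be attributed one frame too low (inside deleg.go).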

@wu0407 commented on Dec 31, 2021

Same issue with kubebuilder 3.2.0.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 31, 2022
@dobsonj (Member, Author) commented on Mar 31, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 31, 2022
@alculquicondor (Contributor)

cc @camilamacedo86

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 8, 2022