
ManagerOptions#CertDir default is confusing #900

Open
thephw opened this issue Apr 15, 2020 · 13 comments
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@thephw

thephw commented Apr 15, 2020

Problem Description

The default for the CertDir configuration option is presently nonsensical. I suspect this is because it is out of date with other changes to the workflow a developer is expected to follow to configure TLS. PR #300 removed the cert provisioners that created local credentials at {TempDir}/k8s-webhook-server/serving-certs/tls.key and {TempDir}/k8s-webhook-server/serving-certs/tls.crt. This default was likely missed in that refactor because of unexpectedly tight coupling to the prior implementation. Either way, it confuses new developers trying to run the examples.

CertDir has the comment

	// CertDir is the directory that contains the server key and certificate.
	// if not set, webhook server would look up the server key and certificate in
	// {TempDir}/k8s-webhook-server/serving-certs. The server key and certificate
	// must be named tls.key and tls.crt, respectively.
	CertDir string
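For reference, a minimal sketch of how this documented default resolves to concrete file paths. The helper names `resolveCertDir` and `servingCertPaths` are hypothetical and not part of controller-runtime; only the `filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs")` expression is taken from the defaulting code quoted later in this thread.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCertDir is a hypothetical helper mirroring the documented default:
// an empty CertDir falls back to {TempDir}/k8s-webhook-server/serving-certs.
func resolveCertDir(certDir string) string {
	if certDir != "" {
		return certDir
	}
	return filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs")
}

// servingCertPaths returns the two files the webhook server expects inside
// the cert directory: tls.crt (certificate) and tls.key (private key).
func servingCertPaths(certDir string) (certFile, keyFile string) {
	dir := resolveCertDir(certDir)
	return filepath.Join(dir, "tls.crt"), filepath.Join(dir, "tls.key")
}

func main() {
	crt, key := servingCertPaths("")
	// On macOS this is typically under /var/folders/.../T/, not /tmp.
	fmt.Println(crt)
	fmt.Println(key)
}
```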

When running the example, the logs show:

unable to run manager	{"error": "open /var/folders/kc/7wscczc57v15s84cy1nx0d3h0000gn/T/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}

Developer Experience

A new developer is likely to run through the examples in the repository, which is what I was doing. The CertDir default served as something of a red herring, masking the problem for a while: the default value is so specific that it seemed like something else was broken and the examples should work as written. In the current implementation, however, they seem to require additional configuration.

Possible Solution

I would love some feedback on the right changes, but my inclination is to:

  1. Remove the outdated default for CertDir
  2. Update the comment on the type definition
  3. Add a helpful log message when CertDir is empty
  4. Update the examples to set up and reference local certs using mkcert
  5. (maybe?) Change the signature for manager.New to reflect that the option is not optional
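Proposals 1 and 3 could look roughly like the following sketch, assuming the manager refuses to start without an explicit CertDir. `validateCertDir` and `errNoCertDir` are hypothetical names, and the message text is only a suggestion.

```go
package main

import (
	"errors"
	"log"
)

// errNoCertDir is a hypothetical sentinel for proposal 1: there is no implicit
// {TempDir}-based default, so an empty CertDir is reported instead of guessed.
var errNoCertDir = errors.New("CertDir is not set: point it at a directory containing tls.crt and tls.key")

// validateCertDir sketches proposal 3: log a helpful message when CertDir is
// empty and return the sentinel so the caller can abort startup.
func validateCertDir(certDir string) error {
	if certDir != "" {
		return nil
	}
	log.Printf("webhook server: %v", errNoCertDir)
	return errNoCertDir
}

func main() {
	if err := validateCertDir(""); err != nil {
		log.Println("manager startup would stop here")
	}
}
```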

Additional Context

$ kc version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
$ go version
go version go1.13.8 darwin/amd64

Related Prior Contributions

@mengqiy authored PR #300, which was reviewed by @droot and @DirectXMan12; all of them likely have better context on the past, present, and future state of this code. I would love to have y'all's input, and thank you for your contributions ❤️

@DirectXMan12
Contributor

Can I ask which OS you're using here? On most Linux distros, $TMPDIR is just /tmp (we shouldn't rely on this, though).

If we're going to change this, we need to do it carefully, because it would probably break a lot of folks. It's not really a problem on most Linux installs, because $TMPDIR is /tmp. That's not to say we shouldn't change it to something a bit more sensible, though.

At the very least, updating the comment seems reasonable, and wrapping the error with a helpful message such as "unable to read certs -- did you put the certs in Manager.CertDir: %w" would probably be good too.
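A sketch of that wrapping, assuming the certificate is read with `os.ReadFile`; `loadServingCert` and `isMissingCertErr` are hypothetical helpers, not controller-runtime functions. Using `%w` rather than `%v` keeps the underlying `*PathError` inspectable with `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// loadServingCert is a hypothetical stand-in for the point where the webhook
// server reads its certificate from CertDir.
func loadServingCert(certDir string) ([]byte, error) {
	data, err := os.ReadFile(filepath.Join(certDir, "tls.crt"))
	if err != nil {
		// %w wraps the original error instead of flattening it to text.
		return nil, fmt.Errorf("unable to read certs -- did you put the certs in Manager.CertDir: %w", err)
	}
	return data, nil
}

// isMissingCertErr reports whether the wrapped cause is "file does not exist".
func isMissingCertErr(err error) bool {
	return errors.Is(err, os.ErrNotExist)
}

func main() {
	_, err := loadServingCert(filepath.Join(os.TempDir(), "no-such-cert-dir"))
	fmt.Println(err)
	fmt.Println(isMissingCertErr(err))
}
```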

> Change the signature for manager.New to reflect that option is not optional

It should be -- a good general pattern for stuff that runs in containers is to choose a constant, reasonable default. Since you control the fs, you can just always mount there. The main problem here is that $TMPDIR isn't actually constant on all systems.
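A sketch of the "constant, reasonable default" pattern described above. The path in `containerCertDir` is a hypothetical choice; the point is that it does not depend on `$TMPDIR`, so a Secret can always be mounted there.

```go
package main

import "fmt"

// containerCertDir is a hypothetical fixed default. Because the image author
// controls the container filesystem, a Secret can always be mounted here,
// regardless of what $TMPDIR is on any given host.
const containerCertDir = "/tmp/k8s-webhook-server/serving-certs"

// certDirFor prefers an explicit setting and otherwise uses the constant.
func certDirFor(explicit string) string {
	if explicit != "" {
		return explicit
	}
	return containerCertDir
}

func main() {
	fmt.Println(certDirFor(""))
	fmt.Println(certDirFor("/etc/webhook/certs"))
}
```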

@DirectXMan12
Contributor

/kind bug
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Apr 20, 2020
@DirectXMan12 DirectXMan12 added this to the Next milestone Apr 20, 2020
@DirectXMan12
Contributor

/good-first-issue

on the docs and improved error message

@k8s-ci-robot
Contributor

@DirectXMan12:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

on the docs and improved error message

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Apr 20, 2020
@thephw
Author

thephw commented Apr 21, 2020

I was running the code locally on macOS Catalina 10.15.4 (19E287). Good point on the reasonable default. What would be the new reasonable default in the current implementation?

@thephw
Author

thephw commented Apr 21, 2020

This would probably also be helpful:

$ mktemp
/var/folders/4g/3dkqhpkd4csgw5nqc6wzcb900000gn/T/tmp.V2TLDwmC

@mengqiy
Member

mengqiy commented Apr 24, 2020

s.CertDir = filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs")
This is where the defaulting happens.
It relies on os.TempDir to get the temp dir on different OS.
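To see why the resulting path differs between machines: per the Go documentation, `os.TempDir` on Unix returns `$TMPDIR` if it is non-empty and `/tmp` otherwise. The hypothetical helper `previewDefaultCertDir` temporarily sets `$TMPDIR` and recomputes the default with the same expression as the defaulting line above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// previewDefaultCertDir is hypothetical: it sets $TMPDIR, recomputes the
// default the same way the defaulting code does, then restores $TMPDIR.
// Only meaningful on Unix, where os.TempDir consults $TMPDIR.
func previewDefaultCertDir(tmpdir string) string {
	old, had := os.LookupEnv("TMPDIR")
	defer func() {
		if had {
			os.Setenv("TMPDIR", old)
		} else {
			os.Unsetenv("TMPDIR")
		}
	}()
	os.Setenv("TMPDIR", tmpdir)
	return filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs")
}

func main() {
	// A macOS-style per-user value, then an empty one (the usual Linux case).
	fmt.Println(previewDefaultCertDir("/var/folders/xx/T/"))
	fmt.Println(previewDefaultCertDir(""))
}
```

On Unix the first call yields `/var/folders/xx/T/k8s-webhook-server/serving-certs` and the second `/tmp/k8s-webhook-server/serving-certs`, which matches the two error messages reported in this issue.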

IIRC CertDir is still an optional field, since if you don't need to run a webhook server at all, you don't need to set it.

I'm leaning more toward updating the comment to point out that the default directory is determined by os.TempDir.
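One possible wording for the updated comment, shown on a trimmed, hypothetical `Options` struct rather than the real manager options type:

```go
package main

import "fmt"

// Options is a trimmed, hypothetical stand-in for the manager options struct,
// showing only the field whose doc comment would change.
type Options struct {
	// CertDir is the directory that contains the server key and certificate.
	// If not set, the webhook server looks for them in
	// filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs").
	// On Unix, os.TempDir returns $TMPDIR (or /tmp when it is empty), so on
	// macOS this is usually a per-user directory under /var/folders.
	// Nothing creates the key or certificate automatically; they must be
	// provided and named tls.key and tls.crt, respectively.
	// CertDir only matters if the manager runs a webhook server.
	CertDir string
}

func main() {
	opts := Options{CertDir: "/etc/webhook/certs"}
	fmt.Println(opts.CertDir)
}
```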

@DirectXMan12
Copy link
Contributor

@mengqiy I think the main problem is that this path might be autogenerated and hard to get at, e.g. on OSX. That said, I don't think we can fix this easily without serious breakage, except maybe by detecting OSX and special-casing it for the moment.
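A rough sketch of the OS-detection idea, assuming macOS would use a fixed `/tmp` base while other platforms keep the `os.TempDir` behavior. `defaultCertDirFor` is hypothetical and this is not current behavior; it takes the OS name as a parameter so both branches can be exercised.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// defaultCertDirFor is a hypothetical special case: on macOS ("darwin"),
// where $TMPDIR is a generated per-user path, use /tmp as the base;
// everywhere else keep the existing os.TempDir-based default.
func defaultCertDirFor(goos string) string {
	base := os.TempDir()
	if goos == "darwin" {
		base = "/tmp"
	}
	return filepath.Join(base, "k8s-webhook-server", "serving-certs")
}

func main() {
	fmt.Println(defaultCertDirFor(runtime.GOOS))
}
```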

@thephw
Copy link
Author

thephw commented May 14, 2020

Now that we're getting around to integration testing with kind, I am getting the same error on Linux with the default. @DirectXMan12 can you point me to the code that sets up the default certs at /tmp/k8s-webhook-server/serving-certs?

{"level":"error","ts":1589463131.8074567,"logger":"build-webhooks.entrypoint","msg":"Unable to run manager","error":"open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/layers/heroku_go/shim/go-path/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngithub.com/onspaceship/booster/pkg/webhooks.Start\n\t/workspace/pkg/webhooks/webhooks.go:81"}

@cshivashankar
Copy link

How can this be overcome? It would be good to have something that works for a first-time implementation, or at least a bit more information in the README.
As a workaround, should I generate certs validated by the API server and create a separate service for this, or can the cert be generated in code?
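For local development, one possible workaround is to generate a self-signed pair in code and write it where the webhook server looks. The sketch below uses only the Go standard library; the function names, the DNS name `webhook-service.default.svc`, and the one-year validity are assumptions. For the API server to actually call the webhook, the same certificate would also have to be supplied as the `caBundle` of the webhook configuration, which this sketch does not do.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"time"
)

// writeSelfSignedCert (hypothetical) writes tls.crt and tls.key into certDir
// for the given DNS names, using a self-signed ECDSA P-256 certificate.
func writeSelfSignedCert(certDir string, dnsNames []string) error {
	if len(dnsNames) == 0 {
		return fmt.Errorf("at least one DNS name is required")
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: dnsNames[0]},
		DNSNames:              dnsNames,
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true, // self-signed, so it can double as its own caBundle
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(certDir, 0o700); err != nil {
		return err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	if err := os.WriteFile(filepath.Join(certDir, "tls.crt"), certPEM, 0o644); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(certDir, "tls.key"), keyPEM, 0o600)
}

// verifyPair checks that the written files load as a TLS key pair.
func verifyPair(certDir string) error {
	_, err := tls.LoadX509KeyPair(filepath.Join(certDir, "tls.crt"), filepath.Join(certDir, "tls.key"))
	return err
}

func main() {
	// Write into the default location the webhook server reads from.
	dir := filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs")
	if err := writeSelfSignedCert(dir, []string{"webhook-service.default.svc"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("wrote certs to", dir, "load error:", verifyPair(dir))
}
```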

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 14, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 13, 2020
@DirectXMan12
Copy link
Contributor

/remove-lifecycle rotten
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Nov 19, 2020