Connection refused by tls: bad certificate #65

Closed
micnncim opened this issue Aug 2, 2021 · 4 comments

Comments

@micnncim

micnncim commented Aug 2, 2021

Environment

Installed HNC with GKE Config Sync (v1.8.1):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.4", GitCommit:"224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba", GitTreeState:"clean", BuildDate:"2019-12-11T12:47:40Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.19-gke.1701", GitCommit:"d7cecefb99b58e8968f59b59d76448eb1e6ea403", GitTreeState:"clean", BuildDate:"2021-06-23T21:51:59Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: ...
spec:
  clusterName: ...
  hierarchyController:
    enabled: true
  git:
    # ...
  sourceFormat: unstructured
  patches:
  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hnc-controller-manager
      namespace: hnc-system
    spec:
      template:
        spec:
          containers:
          - name: manager
            args:
            - --webhook-server-port=9443
            - --metrics-addr=:8080
            - --max-reconciles=10
            - --apiserver-qps-throttle=50
            - --enable-internal-cert-management
            - --cert-restart-on-secret-refresh
            - --excluded-namespace=kube-system
            - --excluded-namespace=kube-public
            - --excluded-namespace=kube-node-lease
            - --excluded-namespace=hnc-system
            - --unpropagated-annotation=configmanagement.gke.io/declared-config
            - --unpropagated-annotation=configmanagement.gke.io/managed
            - --unpropagated-annotation=configmanagement.gke.io/token
            - --unpropagated-annotation=configmanagement.gke.io/cluster-selector
            - --unpropagated-annotation=configmanagement.gke.io/cluster-name
$ kubectl get deploy hnc-controller-manager -o jsonpath="{.spec.template.spec.containers[*].image}" -n hnc-system
gcr.io/config-management-release/hnc-manager:hnc-v0.8.0-hc.2 gcr.io/config-management-release/kube-rbac-proxy:v0.5.0

Issue

I ran into an issue where the certs for the ValidatingWebhooks are invalid:

$ kubectl logs -f hnc-controller-manager-6f8b9bc89b-mw4v4 -c manager -n hnc-system
// ...
hnc-controller-manager-6f8b9bc89b-mw4v4 manager {"level":"info","ts":1627895321.8811357,"logger":"reconcilers.HNCConfiguration","msg":"Setting HNCConfiguration name","name":"config"}
hnc-controller-manager-6f8b9bc89b-mw4v4 manager {"level":"info","ts":1627895322.1202245,"logger":"reconcilers.HNCConfiguration","msg":"Creating the default HNCConfiguration object"}
hnc-controller-manager-6f8b9bc89b-mw4v4 manager {"level":"error","ts":1627895322.1395543,"logger":"reconcilers.HNCConfiguration","msg":"Could not create HNCConfiguration object","error":"Internal error occurred: failed calling webhook \"hncconfigurations.hnc.x-k8s.io\": Post https://hnc-webhook-service.hnc-system.svc:443/validate-hnc-x-k8s-io-v1alpha2-hncconfigurations?timeout=10s: ssh: rejected: connect failed (Connection refused)"}
hnc-controller-manager-6f8b9bc89b-mw4v4 manager {"level":"error","ts":1627895322.1396565,"logger":"reconcilers.HNCConfiguration","msg":"Couldn't write singleton","error":"Internal error occurred: failed calling webhook \"hncconfigurations.hnc.x-k8s.io\": Post https://hnc-webhook-service.hnc-system.svc:443/validate-hnc-x-k8s-io-v1alpha2-hncconfigurations?timeout=10s: ssh: rejected: connect failed (Connection refused)"}
hnc-controller-manager-6f8b9bc89b-mw4v4 manager {"level":"error","ts":1627895322.139727,"logger":"controller-runtime.manager.controller.hncconfiguration","msg":"Reconciler error","reconciler group":"hnc.x-k8s.io","reconciler kind":"HNCConfiguration","name":"config","namespace":"","error":"Internal error occurred: failed calling webhook \"hncconfigurations.hnc.x-k8s.io\": Post https://hnc-webhook-service.hnc-system.svc:443/validate-hnc-x-k8s-io-v1alpha2-hncconfigurations?timeout=10s: ssh: rejected: connect failed (Connection refused)"}
hnc-controller-manager-6f8b9bc89b-mw4v4 manager {"level":"error","ts":1627895348.9775093,"msg":"Failed to export to Stackdriver: [rpc error: code = PermissionDenied desc = Permission monitoring.metricDescriptors.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.metricDescriptors.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.metricDescriptors.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.metricDescriptors.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.metricDescriptors.create denied (or the resource may not exist).; rpc error: code = PermissionDenied desc = Permission monitoring.timeSeries.create denied (or the resource may not exist).]"}

$ kubectl logs -f hnc-controller-manager-6f8b9bc89b-mw4v4 -c kube-rbac-proxy -n hnc-system
I0802 09:03:05.742587       1 main.go:186] Valid token audiences:
I0802 09:03:05.742694       1 main.go:232] Generating self signed cert as no cert is provided
I0802 09:03:06.059998       1 main.go:281] Starting TCP socket on 0.0.0.0:8443
I0802 09:03:06.060330       1 main.go:288] Listening securely on 0.0.0.0:8443
2021/08/02 09:03:21 http: TLS handshake error from 10.33.0.14:44856: remote error: tls: bad certificate
2021/08/02 09:03:36 http: TLS handshake error from 10.33.0.14:45096: remote error: tls: bad certificate
2021/08/02 09:03:51 http: TLS handshake error from 10.33.0.14:45302: remote error: tls: bad certificate

Because of this, I can't create any kind of custom resource, either through Config Sync or from my local machine.

I tried the following things but nothing worked:

  • Disabled HNC, deleted all the CRDs, and then reinstalled HNC
  • Deleted the Secret hnc-webhook-server-cert and then restarted the Deployment hnc-controller-manager
  • Used gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0 instead of gcr.io/config-management-release/kube-rbac-proxy:v0.5.0
  • Added the --novalidation flag

I also compared the ValidatingWebhookConfiguration against the Secret hnc-webhook-server-cert and confirmed that the certs looked correct.

$ kubectl get validatingwebhookconfigurations hnc-validating-webhook-configuration -o yaml
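
For readers who want to repeat this comparison, here is a minimal sketch of one way to do it. It assumes the Secret stores its CA under a `ca.crt` key, which this thread does not confirm, and the helper name `same_b64` is hypothetical. The kubectl calls are left as comments because they need a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two base64 blobs are identical
# once whitespace and line wrapping are ignored. Empty input never matches.
same_b64() {
  a=$(printf '%s' "$1" | tr -d '[:space:]')
  b=$(printf '%s' "$2" | tr -d '[:space:]')
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage against a live cluster (not run here).
# ASSUMPTION: the Secret key is named ca.crt; adjust if yours differs.
# ca_hook=$(kubectl get validatingwebhookconfigurations hnc-validating-webhook-configuration \
#   -o jsonpath='{.webhooks[0].clientConfig.caBundle}')
# ca_secret=$(kubectl get secret hnc-webhook-server-cert -n hnc-system \
#   -o jsonpath='{.data.ca\.crt}')
# if same_b64 "$ca_hook" "$ca_secret"; then echo "CA matches"; else echo "CA differs"; fi
```

A match here only says the webhook configuration trusts the CA in the Secret. It says nothing about whether the API server can actually reach the webhook port.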

I checked the following and other issues but I can't find a solution.

@adrianludwin
Contributor

Hmm, did you manually add that patches: field in the ConfigManagement object? That generally shouldn't be required for things to work correctly. Nothing jumps out at me as being wrong in there, but I'm not sure why you've got it.

kube-rbac-proxy only protects the metric server (and we're removing it entirely in the next release) so it shouldn't get in the way of regular webhook traffic. So while I'm not sure what the "bad TLS certificate" error there is, it's almost certainly not causing the problems with the webhooks, which actually show a completely different error:

ssh: rejected: connect failed (Connection refused)"}

I don't know what ssh is doing in there - HNC doesn't call SSH. I think public GKE clusters used SSH tunnelling until GKE 1.17, but any cluster since 1.18 should no longer be using them. Are you using an older cluster, or have you modified any of the default firewall rules that GKE creates?

Can you get any other webhooks to work (e.g. Gatekeeper or Policy Controller)? Or is it only HNC that's failing?

@micnncim
Author

micnncim commented Aug 5, 2021

Thank you for your answer. We've confirmed that the root cause of HNC not working was the webhook port change in Config Sync 1.8.0.

https://cloud.google.com/anthos-config-management/docs/release-notes#June_24_2021

The Hierarchy Controller admission webhook serving port has switched from 9443 to 10250.

Our patch setting --webhook-server-port=9443 was necessary for older versions, but it stopped working once the default port changed. HNC is now working as expected.
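
A rough sketch of how a stale port override like this could be spotted is shown below. The function names are hypothetical, and the expected value 10250 is taken from the release note quoted above. The thread does not say whether the patch was removed or updated, so this only reports a mismatch.

```shell
#!/bin/sh
# Hypothetical helper: read any text on stdin and print the value of the
# first --webhook-server-port flag found, or nothing if the flag is absent.
webhook_port_from_text() {
  grep -o -- '--webhook-server-port=[0-9][0-9]*' | head -n 1 | cut -d= -f2
}

# Hypothetical check: compare the configured port (from stdin) with the
# expected port given as the first argument. Returns 1 on a mismatch.
check_webhook_port() {
  expected=$1
  actual=$(webhook_port_from_text)
  if [ -z "$actual" ]; then
    echo "flag not set"
  elif [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    echo "mismatch: args set $actual, expected $expected"
    return 1
  fi
}

# Usage against a live cluster (not run here):
# kubectl get deploy hnc-controller-manager -n hnc-system \
#   -o jsonpath='{.spec.template.spec.containers[*].args}' | check_webhook_port 10250
```

With the args from the ConfigManagement patch in this issue, the check would report that the args set 9443 while 10250 is expected.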

@adrianludwin
Contributor

Ah, I missed that. Glad to hear you found the cause.

/close

@k8s-ci-robot
Contributor

@adrianludwin: Closing this issue.

