HNC : logs "http: TLS handshake error from x:x remote error: tls: bad certificate" #1255
Hello,

The manager container from the hnc-controller-manager deployment continuously shows lots of logs like this:

2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:25105: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:44592: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:49653: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:36010: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:6684: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:45771: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:37314: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:8601: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:15705: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:50125: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:11056: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:59676: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:17264: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.102.0:24978: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:28812: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.98.0:15634: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:7157: remote error: tls: bad certificate
2020/11/05 15:45:55 http: TLS handshake error from 10.233.92.0:41785: remote error: tls: bad certificate

Does this indicate an actual issue? If not, how can we disable these handshake attempts, or stop logging them?
Are you able to modify any HNC workloads? Did you modify the YAML files in any way?

I think we can expect errors like this for a short period of time when HNC is *first* installed, before the certificates have been created and distributed. But they should stop within the first 30s and should never recur.
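A quick way to check whether the errors persist past startup (a sketch, assuming the default install; the deployment name and namespace are taken from later in this thread):

$ kubectl logs deploy/hnc-controller-manager -c manager -n hnc-system --since=5m | grep 'TLS handshake error'

If this still returns lines well after the first 30s, something is wrong with the certificates.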
- I did not modify the hnc-manager.yaml all-in-one file
- The container image name is gcr.io/k8s-staging-multitenancy/hnc-manager:v0.6.0
- This log is still displayed continuously after 22h running:

$ kubectl get deploy/hnc-controller-manager -o wide -n hnc-system
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
hnc-controller-manager 1/1 1 1 22h manager,kube-rbac-proxy gcr.io/k8s-staging-multitenancy/hnc-manager:v0.6.0,gcr.io/kubebuilder/kube-rbac-proxy:v0.4.0 control-plane=controller-manager

$ kubectl logs --tail 12 deploy/hnc-controller-manager -c manager -n hnc-system
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:10357: remote error: tls: bad certificate
{"level":"info","ts":1604671335.157892,"logger":"cert-rotation","msg":"CRD subnamespaceanchors.hnc.x-k8s.io is being deleted"}
{"level":"info","ts":1604671335.254194,"logger":"cert-rotation","msg":"CRD hierarchyconfigurations.hnc.x-k8s.io is being deleted"}
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:38357: remote error: tls: bad certificate
{"level":"info","ts":1604671335.2594817,"logger":"cert-rotation","msg":"ensuring CA cert on ValidatingWebhookConfiguration"}
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:40448: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:25077: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.102.0:18810: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.98.0:56772: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.92.0:52681: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.98.0:50827: remote error: tls: bad certificate
2020/11/06 14:02:15 http: TLS handshake error from 10.233.92.0:29824: remote error: tls: bad certificate

Also, I cannot set a parent for a namespace:

$ kubectl hns --version
kubectl-hns version v0.6.0
$ kubectl hns tree webs
Error reading hierarchy for webs: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HierarchyConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
$ kubectl get ns sbr01
NAME STATUS AGE
sbr01 Active 14d
$ kubectl hns set sbr01 --parent webs
Error reading hierarchy for sbr01: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HierarchyConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
$ kubectl get hncconfiguration
Error from server: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HNCConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority

Maybe the errors above are not the same issue, but in any case it does not work, and the messages mention certificates.

Serge
Ahh, I think you need to upgrade to the new version of kubectl-hns - those logs show a problem with hnc.x-k8s.io/v1alpha1, which was the API version used by 0.5. But 0.6 uses v1alpha2 and requires the latest kubectl-hns. See if that helps?
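A quick sanity check that the plugin and the manager are on matching versions (a sketch; the deployment name and namespace come from the default all-in-one install):

$ kubectl hns --version
$ kubectl get deploy hnc-controller-manager -n hnc-system -o jsonpath='{.spec.template.spec.containers[*].image}'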
That's what I suspected at first, but if you read my output, there is the check:

$ kubectl hns --version
kubectl-hns version v0.6.0

I had some difficulties during the upgrade from v0.5.0 to v0.6.0 (messages about remaining CRDs), so I deleted all resources from the v0.5.0 manifests, then deleted the hnc-system namespace, then deployed v0.6.0 from the new all-in-one manifest.

However, I still get references from the old API:

$ kubectl api-resources | grep -i hnc
hierarchyconfigurations hnc.x-k8s.io true HierarchyConfiguration
hncconfigurations hnc.x-k8s.io false HNCConfiguration
subnamespaceanchors subns hnc.x-k8s.io true SubnamespaceAnchor

$ kubectl api-versions | grep -i hnc
hnc.x-k8s.io/v1alpha2

$ kubectl get crd -o wide | grep hnc
hierarchyconfigurations.hnc.x-k8s.io 2020-10-01T09:23:18Z
hncconfigurations.hnc.x-k8s.io 2020-10-01T09:23:18Z
subnamespaceanchors.hnc.x-k8s.io 2020-10-01T09:23:18Z
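For what it's worth, one way to see exactly which API versions a CRD serves (a sketch using standard kubectl jsonpath; the CRD name is taken from the output above):

$ kubectl get crd hierarchyconfigurations.hnc.x-k8s.io -o jsonpath='{.spec.versions[*].name}'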
I'm not sure what you mean by "references from the old API"? All those outputs are valid for the new API as well - nothing there mentions v1alpha1.

Just to confirm - did you delete your validating webhook config?

$ kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io hnc-validating-webhook-configuration

And then try recreating that? And once you do, try restarting the HNC pod as well (kubectl delete pods --all -n hnc-system).

Failing that... what version/distribution of K8s are you using?
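Putting those steps together, a sketch of the full reset sequence (hnc-manager.yaml is a placeholder for wherever you saved the v0.6.0 all-in-one manifest):

$ kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io hnc-validating-webhook-configuration
$ kubectl apply -f hnc-manager.yaml        # recreate the webhook config
$ kubectl delete pods --all -n hnc-system  # restart HNC so it re-issues its certs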
Also, can you give any more insight into the problem you had upgrading? What were the "messages about remaining CRDs"?

/cc @yiqigao217
It looks like you were upgrading when the certs were not there, so the conversion webhooks cannot work either. Before your upgrade, did the validating webhooks work for you in v0.5?
@yiqigao217: yes, the validating webhooks worked with HNC v0.5.0.

@adrianludwin: I have deleted validatingwebhookconfigurations.admissionregistration.k8s.io/hnc-validating-webhook-configuration and re-created hnc-system with v0.6.0. Here is the situation now:

$ kubectl hns tree webs
Error reading hierarchy for webs: conversion webhook for hnc.x-k8s.io/v1alpha1, Kind=HierarchyConfiguration failed: Post "https://hnc-webhook-service.hnc-system.svc:443/convert?timeout=30s": x509: certificate signed by unknown authority
$ kubectl create namespace level01
$ kubectl create namespace level02
$ kubectl hns tree level01
level01
$ kubectl hns set level02 --parent level01
Setting the parent of level02 to level01
Could not update the hierarchical configuration of level02.
Reason: create not allowed while custom resource definition is terminating

$ kubectl get customresourcedefinition,validatingwebhookconfiguration -o wide | grep hnc
customresourcedefinition.apiextensions.k8s.io/hierarchyconfigurations.hnc.x-k8s.io 2020-10-01T09:23:18Z
customresourcedefinition.apiextensions.k8s.io/hncconfigurations.hnc.x-k8s.io 2020-10-01T09:23:18Z
customresourcedefinition.apiextensions.k8s.io/subnamespaceanchors.hnc.x-k8s.io 2020-10-01T09:23:18Z
validatingwebhookconfiguration.admissionregistration.k8s.io/hnc-validating-webhook-configuration 5 2d

$ kubectl logs --tail 6 deploy/hnc-controller-manager -c manager -n hnc-system
{"level":"info","ts":1605197748.854099,"logger":"cert-rotation","msg":"CRD hierarchyconfigurations.hnc.x-k8s.io is being deleted"}
{"level":"info","ts":1605197748.8579118,"logger":"cert-rotation","msg":"ensuring CA cert on ValidatingWebhookConfiguration"}
2020/11/12 16:15:48 http: TLS handshake error from 10.233.102.0:62683: remote error: tls: bad certificate
2020/11/12 16:15:48 http: TLS handshake error from 10.233.92.0:22964: remote error: tls: bad certificate
2020/11/12 16:15:48 http: TLS handshake error from 10.233.102.0:11259: remote error: tls: bad certificate
2020/11/12 16:15:48 http: TLS handshake error from 10.233.98.0:37281: remote error: tls: bad certificate
Ahh, looks like your CRDs are in a bad state. "create not allowed while custom resource definition is terminating" is a K8s error, not an HNC error.

I'd try fully deleting HNC again, but this time, make sure that the CRDs have been deleted. If they haven't, it's likely because there are some CRs that have finalizers on them. If that's the case, you can manually remove the finalizers.

I've seen this most often on subnamespaces. Run "kubectl get subns --all-namespaces" to see which ones still exist (after you've deleted the CRDs), and then "kubectl edit subns <name> -n <parent-name>" to edit one. Then you can just delete the "hnc.x-k8s.io" entry in the metadata.finalizers list, and the object will be deleted. The CRD itself can't be deleted until all objects of a given type are deleted first.
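If editing each object by hand is tedious, the same cleanup can be scripted. A sketch with placeholder names - note it clears the whole finalizer list, which is only reasonable here because hnc.x-k8s.io is the only finalizer on these objects:

$ kubectl patch subns <name> -n <parent-name> --type=merge -p '{"metadata":{"finalizers":[]}}'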
Solved.

Details: after deleting the HNC controller, I manually removed the finalizers on the remaining custom resource definitions, deleted them, and re-installed. Now the HNC controller v0.6.0 is re-created, and the logs look much better.

Problem is solved. Thanks @adrianludwin
Ugh, sorry you ran into so much trouble. I've filed #1270 to fix the warnings. I'm not sure what caused the problems in the first place, but once you delete the deployment, it's not surprising that the CRD conversion webhooks fail. It's usually best to delete the CRs before the deployment, because the manager is what typically removes the finalizers - but if we get into a bad enough state, it might stop doing the right thing. Please let me know if you see anything like this again.
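To make that ordering concrete, a sketch of a clean teardown (resource short names are from earlier in this thread; hnc-manager.yaml is a placeholder for the all-in-one manifest you deployed from):

$ kubectl delete subns,hierarchyconfigurations --all -n <namespace>  # repeat per affected namespace, while the manager is still running
$ kubectl delete hncconfiguration --all                              # cluster-scoped
$ kubectl delete -f hnc-manager.yaml                                 # then remove the deployment and CRDs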
I can confirm this; the issue went away after upgrading to v0.8.0 (from v0.7.0), but I had to delete all resources and recreate them again. Update: I think I spoke too fast, it has started throwing errors again.
@vikas027 what was the prior version of HNC, was it v0.6.0 or v0.7.0? And had HNC been working despite the errors, or was it broken? Only v0.6.0 had the CRD conversion webhooks in it (they were removed in v0.7.0), so if you saw this problem in v0.7.0, I'm leaning more towards it being a K8s issue than an HNC issue.