
Help needed - tls secret took long to discover(ingress version: 1.10, kubernetes version: 1.20) #1448

Closed
kongdewen opened this issue Mar 11, 2021 · 17 comments · Fixed by #1654
Labels
bug An issue reporting a potential bug

Comments

@kongdewen

We are seeing a weird problem after upgrading to the latest ingress controller.

When the ingress controller pods start, it takes a couple of minutes for them to actually find the TLS secrets for the Ingresses. At first they report that the secret is missing or of an invalid type (we did recently have to switch the secret type to kubernetes.io/tls to prepare for the upgrade), until some time later things come back to normal:

error sample:

W0311 14:30:31.734196       7 controller.go:1983] Error trying to get the secret  for Ingress xxx: secret doesn't exist or of an unsupported type
W0311 14:30:31.734212       7 controller.go:1983] Error trying to get the secret tls-wildcard-xxx for Ingress xxx: secret doesn't exist or of an unsupported type

secret sample:

 kubectl -n xxx describe secret/tls-wildcard-xxx
Name:         tls-wildcard-xxx
Namespace:    xxx
Labels:     
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
tls.crt:  3420 bytes
tls.key:  1704 bytes

ingress sample:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: "xxx"
  namespace: "xxx"
  annotations:
    ingress.kubernetes.io/rewrite-target: "/"
    nginx.org/redirect-to-https: "true"
    kubernetes.io/ingress.class: "nginx"
  labels:
    app: "xxx"
    track: "prod"
spec:
  tls:
    - hosts:
        - "xxx.default"
    - hosts:
        - "xxx"
      secretName: tls-wildcard-xxx
  rules:
    - host: "xxx.default"
      http:
        paths:
          - path: /
            backend:
              serviceName: xxx
              servicePort: xx
    - host: xxx
      http:
        paths:
          - backend:
              serviceName: xxx
              servicePort: xx
            path: /

Any ideas what might have happened?

@pleshakov
Contributor

Hi @kongdewen

This seems like a timing issue in the IC - it processed Ingresses before it had processed the referenced Secrets.

How many Ingress and Secret resources does the Ingress Controller handle in your cluster?

@pleshakov pleshakov added the in review Gathering information label Mar 11, 2021
@kongdewen
Author

Hey @pleshakov, for this particular ingress.class there are 183 Ingresses; the total number of TLS secrets in the cluster is 46.

@pleshakov
Contributor

@kongdewen
Thanks for the details. We were able to reproduce it.

It is a bug. I will update this issue once I have an ETA for the fix.

@pleshakov pleshakov added bug An issue reporting a potential bug and removed in review Gathering information labels Mar 13, 2021
@pleshakov
Contributor

Hi @kongdewen

I wonder if you noticed any client traffic disruption during the upgrade.

Until the IC fully generates the config for all resources in the cluster, its readiness probe will fail. This is to prevent NGINX from starting to accept client requests when its config is only partially generated.

That feature should mitigate your problem: although during startup the IC can report warnings about missing secrets, once all resources are processed, the IC clears those warnings. After that, the IC becomes ready and its readiness probe succeeds.

However, we had a bug in that feature: in certain scenarios the IC becomes ready almost immediately. We just fixed that bug in #1457; it will be part of the upcoming 1.11.0 (end of this month).

For the warnings bug, we will address it in 1.12.0.

@kongdewen
Author

@pleshakov Thank you. Yes, we did see disruption when we tested the upgrade (connection errors). We didn't go any further after we saw the service interruption. In our testing, what helped was to increase initialDelaySeconds on the readiness probe, but we didn't want to put a large number there, so we are keeping our prod on 1.9.0 for now. Looking forward to 1.12.0.
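A sketch of that stop-gap as it would look in the IC Deployment (the probe path and port here are assumptions about the manifest, and 60 is an illustrative value, not a recommendation):

```yaml
readinessProbe:
  httpGet:
    path: /nginx-ready   # assumed readiness endpoint in our manifest
    port: 8081           # assumed readiness port
  initialDelaySeconds: 60  # illustrative; long enough for all secrets to be processed
  periodSeconds: 5
```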

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Pull requests/issues with no activity label May 16, 2021
@Aohzan

Aohzan commented May 20, 2021

Hello,
We have the same issue when we start the cluster (we stop the preproduction cluster during the night); each morning we get this error:

controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type

We updated to the latest chart version (0.9.1 with NGINX IC 1.11.1), but we still have the problem.
We have to kill the nginx pods to resolve the issue, and we still see the error in the logs.

Does anyone have an idea?
Thanks

@pleshakov pleshakov removed the stale Pull requests/issues with no activity label May 20, 2021
@pleshakov
Contributor

Hi @Aohzan

This bug with the warnings hasn't been fixed yet. The warnings appear during IC startup, but once the IC has fully processed all Ingress and related resources in the cluster, it will not emit any more warnings. Note that until all resources are processed, the IC pod is not in the Ready state, which means it will not receive any incoming traffic from kube-proxy.

I wonder if you noticed any traffic disruption or just the warnings?

@Aohzan

Aohzan commented May 20, 2021

Yes, for the warning, I see it will be solved in the next release.
But I still have the other issue: nginx returns an error, and I have to kill the pods to get it working again.

@pleshakov
Contributor

But I still have the other issue: nginx returns an error, and I have to kill the pods to get it working again.

What is the error that you see?

@Aohzan

Aohzan commented May 21, 2021

Full logs:

I0520 05:56:14.556214       1 main.go:258] Starting NGINX Ingress controller Version=1.11.1 GitCommit=3274536
W0520 05:56:14.581821       1 main.go:297] The '-use-ingress-class-only' flag will be deprecated and has no effect on versions of kubernetes >= 1.18.0. Processing ONLY resources that have the 'ingressClassName' field in Ingress equal to the class.
W0520 05:56:14.586979       1 warnings.go:70] networking.k8s.io/v1beta1 IngressClass is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 IngressClassList
I0520 05:56:14.610602       1 syslog_listener.go:28] Starting latency metrics server listening on: /var/lib/nginx/nginx-syslog.sock
I0520 05:56:14.611005       1 listener.go:51] Starting Prometheus listener on: :9113/metrics
I0520 05:56:14.611527       1 leaderelection.go:243] attempting to acquire leader lease mynamespace/myingress-leader-election...
W0520 05:56:14.614261       1 warnings.go:70] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0520 05:56:14.617130       1 warnings.go:70] networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
W0520 05:56:15.647165       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
I0520 05:56:15.647206       1 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"mynamespace", Name:"myingress", UID:"9c169189-4296-4cae-b3d7-7322b837ed6d", APIVersion:"v1", ResourceVersion:"20760190", FieldPath:""}): type: 'Normal' reason: 'Updated' Configuration from c-mu
W0520 05:56:15.786503       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.786545       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.786570       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
I0520 05:56:15.786572       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Conf
W0520 05:56:15.786595       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.786615       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.927361       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.927403       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.927425       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.927445       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
W0520 05:56:15.927471       1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
I0520 05:56:16.067648       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Conf
I0520 05:56:16.067682       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Conf
I0520 05:56:16.067694       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Conf
I0520 05:56:16.067703       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Conf
I0520 05:56:16.067736       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Conf
I0520 05:56:16.205948       1 event.go:282] Event(v1.ObjectReference{Kind:"Secret", Namespace:"mynamespace", Name:"myingress-default-server-tls", UID:"caf9580e-f91e-493f-9889-37b261f7beb9", APIVersion:"v1", ResourceVersion:"20760188", FieldPath:""}): type: 'Normal' reason: 'Updated' the spe
I0520 05:56:16.348338       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration fo
I0520 05:56:16.348365       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration fo
I0520 05:56:16.348372       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration fo
I0520 05:56:16.348377       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration fo
I0520 05:56:16.348385       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mynamespace", Name:"myingress", UID:"78c67e71-e4e2-4488-869c-5bf19b605d1f", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"18463348", FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration fo

@evilezh

evilezh commented May 21, 2021

Hi, I am experiencing a similar issue.
Our configuration has a master Ingress and minions. On controller start we see a lot of:

I0521 10:53:29.993523 1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"namespace", Name:"ing-1", UID:"ca67a061-43eb-45b7-9172-6219d0e46e12", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"225845246", FieldPath:""}): type: 'Warning' reason: 'AddedOrUpdatedWithWarning' Configuration for namespace/ing-1 was added or updated ; with warning(s): TLS secret secret-1 is invalid: secret doesn't exist or of an unsupported type
and
W0521 10:53:29.993566 1 controller.go:2022] Error trying to get the secret secret-1 for Ingress ing-1: secret doesn't exist or of an unsupported type

It seems that if the pod does not pick up the secrets within a few minutes, it is reasonable to kill it until it does. We have ~80 secrets in the namespace.

I'm using version 1.11.1. I tried rolling back as far as 1.9 (as mentioned before), but 1.9 did not help either.

@pleshakov
Contributor

Hi @Aohzan
Thanks for sharing the log. The errors should be intermittent: once the IC processes all secrets, it will successfully generate the config for the Ingress resources. Until that happens, the pod will be in the NotReady state, so if it is exposed via a LoadBalancer/NodePort service, it will not receive any traffic. I would not recommend killing the pod.

@pleshakov
Contributor

Hi @evilezh

The pod is in the NotReady state until it has processed all the Secrets and Ingresses. During that process, you will see intermittent errors; once processing is finished, there should be no more errors. I would not recommend killing the pod.

To see how much time it takes for an IC pod to get ready, you can run the IC with -v=3 and then check at what point it logs "NGINX is ready".
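For example, one way to spot the ready marker once the pod's log has been saved locally (the sample log lines below, including their file:line prefixes, are fabricated for illustration):

```shell
# In a live cluster you would first capture the log, e.g.:
#   kubectl -n nginx-ingress logs <ic-pod> > /tmp/ic.log
# Here we write a fabricated sample log for illustration.
cat > /tmp/ic.log <<'EOF'
I0520 05:56:14.556214  1 main.go:258] Starting NGINX Ingress controller Version=1.11.1
W0520 05:56:15.647165  1 controller.go:2022] Error trying to get the secret mytlssecret for Ingress myingress: secret doesn't exist or of an unsupported type
I0520 05:56:18.102001  1 manager.go:111] NGINX is ready
EOF

# Print the startup and readiness lines; the gap between their
# timestamps is the warm-up time during which the pod stays NotReady.
grep -m1 'Starting NGINX Ingress controller' /tmp/ic.log
grep -m1 'NGINX is ready' /tmp/ic.log
```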

@evilezh

evilezh commented May 22, 2021

Hi @pleshakov, if that were the case, we wouldn't have any issues.
The problem is that the pod comes into service and the apps stop working. On the first launch nothing happened for 5 hours, so we went into rollbacks etc. We then figured out that killing the pod, if it does not come up within a few minutes, actually helps.

By 'pod does not come up' I mean: if it fails to load the secrets. I will do a bit more research.

I also exec'd into the controller container to check the configuration. It was OK, except that the listen directive was for port 80 only (I guess because of the missing TLS).

@evilezh

evilezh commented May 22, 2021

I read a little bit of the code :) A few thoughts on the controller's start sequence:
In our scenario with master and minion Ingresses, there seem to be a lot of reloads on startup.

It would make sense, on startup, to first load the secrets, then load all the Ingresses, build the configuration, and only then start nginx, following up with updates after that.

Currently those errors are misleading. If I hadn't looked at the code, I would think that nginx can't find the secrets in k8s.

@pleshakov
Contributor

The PR #1654 fixes the problem with the warnings.

The related problem of slow configuration generation when the IC starts is now tracked in #1655.

This issue was closed.