linkerd pods fail to start when following the get started guide #10643

Closed
jakub-moravec opened this issue Mar 29, 2023 · 5 comments

@jakub-moravec

What is the issue?

All pods end up in CrashLoopBackOff status.

$ kubectl -n linkerd get po
NAME                                      READY   STATUS             RESTARTS        AGE
linkerd-destination-7565d9f4ff-rz6xj      0/4     CrashLoopBackOff   18 (3m3s ago)   17m
linkerd-proxy-injector-5cfdfd8d77-q5wx8   0/2     CrashLoopBackOff   9 (2m15s ago)   17m
linkerd-proxy-injector-56cb5cfd95-k4psh   0/2     CrashLoopBackOff   9 (104s ago)    17m
linkerd-destination-59f67cb879-xhdnn      0/4     CrashLoopBackOff   22 (102s ago)   17m
linkerd-identity-64bc5c7dd9-t5lqh         0/2     CrashLoopBackOff   8 (96s ago)     17m
linkerd-identity-64dc6f886c-vsv97         0/2     CrashLoopBackOff   8 (98s ago)     17m

linkerd check returns
pod/linkerd-destination-7565d9f4ff-rz6xj container sp-validator is not ready

How can it be reproduced?

Follow the getting started guide in the environment described below, up to the linkerd check step.

Logs, error output, etc

identity pod

$ kubectl logs linkerd-identity-64bc5c7dd9-t5lqh -n linkerd
Defaulted container "identity" out of: identity, linkerd-proxy, linkerd-init (init)
Error from server: Get "https://test02-jmoravec-linkerd:10250/containerLogs/linkerd/linkerd-identity-64bc5c7dd9-t5lqh/identity": dial tcp: lookup test02-jmoravec-linkerd on 192.168.50.2:53: no such host
$ kubectl describe pod linkerd-identity-64bc5c7dd9-t5lqh -n linkerd
Warning  Unhealthy  33s (x235 over 20m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
$ kubectl describe pod linkerd-destination-59f67cb879-xhdnn -n linkerd
Warning  FailedPostStartHook  3m22s (x6 over 21m)  kubelet  PostStartHook failed
$ kubectl describe pod linkerd-proxy-injector-5cfdfd8d77-q5wx8 -n linkerd
 Warning  Unhealthy  4m30s (x64 over 22m)  kubelet  Readiness probe failed: Get "http://10.1.232.155:9995/ready": dial tcp 10.1.232.155:9995: connect: connection refused

output of linkerd check -o short

Linkerd core checks
===================

linkerd-existence
\ pod/linkerd-destination-7565d9f4ff-rz6xj container sp-validator is not ready
^C

Environment

Docker: 23.0.1
Microk8s: v1.26.1
CentOS Stream release 8

Possible solution

No response

Additional context

I tried increasing the liveness and readiness probe timeouts, as recommended in #8235.
I tried setting runAsRoot to true for the proxyInit, as recommended in #7283.

Would you like to work on fixing this bug?

None

@alpeb
Member

alpeb commented Mar 30, 2023

Is there any other information from the kubectl describe commands besides the single lines you posted?
Also, please try running linkerd check --pre before attempting to install linkerd to verify all the prerequisites are fulfilled.

@jakub-moravec
Author

Here's the full output of kubectl describe. I ran the pre-installation check, and it succeeded.

linkerd-proxy-injector-5cfdfd8d77-q5wx8.txt
linkerd-destination-59f67cb879-xhdnn.txt
linkerd-identity-64bc5c7dd9-t5lqh_describe.txt

@hawkw
Contributor

hawkw commented Apr 5, 2023

Interesting, it looks like the destination pod's proxy is getting "connection refused" errors trying to talk to the (local) policy controller in that pod:

  linkerd-proxy:
    Container ID:  containerd://f0c1e82aa5bc36d9cd010a2eadb94fb3434f312d954c597a203e0d3657b16aa3
    Image:         cr.l5d.io/linkerd/proxy:stable-2.12.4
    Image ID:      cr.l5d.io/linkerd/proxy@sha256:9d277c72488a214bb467f90b9f32ee7eb4cb4b12f2ad0827f486e98469095666
    Ports:         4143/TCP, 4191/TCP
    Host Ports:    0/TCP, 0/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Message:     watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   145.665928s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   146.166724s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   146.667541s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   147.169189s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   147.671039s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   148.172455s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   148.674112s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   149.175909s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   149.681104s]  WARN ThreadId(01) policy:watch{port=9997}:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
[   150.012506s]  WARN ThreadId(01) linkerd_app: Waiting for identity to be initialized...
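
To see at a glance which address the proxy keeps failing to reach, the warnings in an excerpt like the one above can be tallied per endpoint. This is only a minimal sketch, assuming the proxy log text is piped in on standard input; the function name `count_refused` is made up for illustration and is not part of Linkerd or kubectl.

```shell
# Hypothetical helper (not a Linkerd or kubectl command):
# reads linkerd-proxy log text on stdin and prints one line per
# endpoint address with the number of "Connection refused" warnings.
count_refused() {
  grep 'Connection refused' \
    | grep -o 'endpoint{addr=[^}]*}' \
    | sed -e 's/^endpoint{addr=//' -e 's/}$//' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}'
}
```

It can be fed from a saved log file, or, where `kubectl logs` works, from something like `kubectl -n linkerd logs linkerd-destination-59f67cb879-xhdnn -c linkerd-proxy --previous | count_refused`. Note that `kubectl logs` failed with a DNS lookup error earlier in this report, so the `kubectl describe` output may be the only source available here.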

@hawkw
Contributor

hawkw commented Apr 6, 2023

Hi @jakub-moravec, are you using a CNI plugin in your cluster? And, if not, do you know whether there might be other network configurations that could disallow some TCP connections within a pod?

@stale

stale bot commented Jul 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 9, 2023
@stale stale bot closed this as completed Jul 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023